Auto-configured consumable resources in SGE?
I am using a tool called starcluster (http://star.mit.edu/cluster) to boot up an SGE-configured cluster in the Amazon cloud. The problem is that it does not seem to be configured with any preset consumable resources except for slots, and I cannot seem to request slots directly with qsub -l slots=X. Each time I boot a cluster I may ask for a different type of EC2 node, so the fact that the slots resource is preconfigured is very convenient. I can request a certain number of slots using the preconfigured parallel environment, but the problem is that it was set up for MPI, so requesting slots through that parallel environment sometimes grants a job slots that are spread across several compute nodes. Is there a way to either 1) create a parallel environment that takes advantage of the existing, preconfigured per-host slots settings (HOST=X) that starcluster sets up, so that I can request slots on a single node, or 2) use some kind of resource that SGE is automatically aware of? Running qhost makes me think that even though NCPU and MEMTOT are not defined anywhere I can see, SGE is somehow aware of those resources. Is there a setting that makes them requestable without explicitly defining how much of each is available? Thanks for your time!
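To make the problem concrete, this is roughly the situation today (myjob.sh is just a placeholder script name; the MPI parallel environment is the orte PE that appears in the all.q configuration below):

# slots is not directly requestable as a resource:
qsub -l slots=2 myjob.sh
# the preconfigured MPI PE accepts slot requests, but the 2 slots
# may be granted on two different compute nodes:
qsub -pe orte 2 myjob.sh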
qhost output:

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
master                  linux-x64       2  0.01    7.3G  167.4M     0.0     0.0
node001                 linux-x64       2  0.01    7.3G  139.6M     0.0     0.0
qconf -mc output:

#name               shortcut   type      relop requestable consumable default  urgency
#----------------------------------------------------------------------------------------
arch                a          RESTRING  ==    YES         NO         NONE     0
calendar            c          RESTRING  ==    YES         NO         NONE     0
cpu                 cpu        DOUBLE    >=    YES         NO         0        0
display_win_gui     dwg        BOOL      ==    YES         NO         0        0
h_core              h_core     MEMORY    <=    YES         NO         0        0
h_cpu               h_cpu      TIME      <=    YES         NO         0:0:0    0
h_data              h_data     MEMORY    <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY    <=    YES         NO         0        0
h_rss               h_rss      MEMORY    <=    YES         NO         0        0
h_rt                h_rt       TIME      <=    YES         NO         0:0:0    0
h_stack             h_stack    MEMORY    <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY    <=    YES         NO         0        0
hostname            h          HOST      ==    YES         NO         NONE     0
load_avg            la         DOUBLE    >=    NO          NO         0        0
load_long           ll         DOUBLE    >=    NO          NO         0        0
load_medium         lm         DOUBLE    >=    NO          NO         0        0
load_short          ls         DOUBLE    >=    NO          NO         0        0
m_core              core       INT       <=    YES         NO         0        0
m_socket            socket     INT       <=    YES         NO         0        0
m_topology          topo       RESTRING  ==    YES         NO         NONE     0
m_topology_inuse    utopo      RESTRING  ==    YES         NO         NONE     0
mem_free            mf         MEMORY    <=    YES         NO         0        0
mem_total           mt         MEMORY    <=    YES         NO         0        0
mem_used            mu         MEMORY    >=    YES         NO         0        0
min_cpu_interval    mci        TIME      <=    NO          NO         0:0:0    0
np_load_avg         nla        DOUBLE    >=    NO          NO         0        0
np_load_long        nll        DOUBLE    >=    NO          NO         0        0
np_load_medium      nlm        DOUBLE    >=    NO          NO         0        0
np_load_short       nls        DOUBLE    >=    NO          NO         0        0
num_proc            p          INT       ==    YES         NO         0        0
qname               q          RESTRING  ==    YES         NO         NONE     0
rerun               re         BOOL      ==    NO          NO         0        0
s_core              s_core     MEMORY    <=    YES         NO         0        0
s_cpu               s_cpu      TIME      <=    YES         NO         0:0:0    0
s_data              s_data     MEMORY    <=    YES         NO         0        0
s_fsize             s_fsize    MEMORY    <=    YES         NO         0        0
s_rss               s_rss      MEMORY    <=    YES         NO         0        0
s_rt                s_rt       TIME      <=    YES         NO         0:0:0    0
s_stack             s_stack    MEMORY    <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY    <=    YES         NO         0        0
seq_no              seq        INT       ==    NO          NO         0        0
slots               s          INT       <=    YES         YES        1        1000
swap_free           sf         MEMORY    <=    YES         NO         0        0
swap_rate           sr         MEMORY    >=    YES         NO         0        0
swap_rsvd           srsv       MEMORY    >=    YES         NO         0        0
qconf -me master output (shown for one of the hosts as an example):

hostname              master
load_scaling          NONE
complex_values        NONE
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
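Note that complex_values is NONE here. For context, the manual alternative to what I am asking for would be to define my own consumable complex and seed a per-host capacity in complex_values, which is exactly the per-instance-type bookkeeping I would like to avoid. A rough sketch, using a hypothetical ncpus complex:

# Hypothetical "ncpus" consumable; the values would have to change with the EC2 instance type
qconf -mc            # add a line such as:  ncpus  ncpus  INT  <=  YES  YES  0  0
qconf -me master     # set:  complex_values  ncpus=2
qconf -me node001    # set:  complex_values  ncpus=2
# jobs could then request it with something like:  qsub -l ncpus=2 myjob.sh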
qconf -msconf output:

algorithm                          default
schedule_interval                  0:0:15
maxujobs                           0
queue_sort_method                  load
job_load_adjustments               np_load_avg=0.50
load_adjustment_decay_time         0:7:30
load_formula                       np_load_avg
schedd_job_info                    false
flush_submit_sec                   0
flush_finish_sec                   0
params                             none
reprioritize_interval              0:0:0
halftime                           168
usage_weight_list                  cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor                5.000000
weight_user                        0.250000
weight_project                     0.250000
weight_department                  0.250000
weight_job                         0.250000
weight_tickets_functional          0
weight_tickets_share               0
share_override_tickets             TRUE
share_functional_shares            TRUE
max_functional_jobs_to_schedule    200
report_pjob_tickets                TRUE
max_pending_tasks_per_job          50
halflife_decay_list                none
policy_hierarchy                   OFS
weight_ticket                      0.010000
weight_waiting_time                0.000000
weight_deadline                    3600000.000000
weight_urgency                     0.100000
weight_priority                    1.000000
max_reservation                    0
default_duration                   INFINITY
qconf -mq all.q output:

qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make orte
rerun                 FALSE
slots                 1,[master=2],[node001=2]
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
The solution I found was to create a new parallel environment with the $pe_slots allocation rule (see man sge_pe). I set the number of slots available to that parallel environment to the maximum, because $pe_slots limits slot usage to what is available per node. Since starcluster sets up the per-host slots at cluster boot time, this seems to handle the problem nicely. You also need to add the new parallel environment to the queue. So, to make this dead simple:

qconf -ap by_node

and here are the contents after I edited the file:
pe_name            by_node
slots              9999999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
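A quick way to confirm the new parallel environment was registered (and, if you prefer to skip the interactive editor, qconf can also add the definition straight from a file; by_node.pe below is just an assumed file name holding the contents shown above):

qconf -sp by_node     # print the by_node PE as configured
qconf -spl            # list all parallel environments; by_node should now appear
qconf -Ap by_node.pe  # non-interactive alternative: add the PE from a file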
Also modify the queue (called all.q by starcluster) to add this new parallel environment to its list:

qconf -mq all.q

and change this line:
pe_list make orte
to this:
pe_list make orte by_node
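To double-check the queue change without reopening the editor, something along these lines should work (qconf -aattr is the usual non-interactive way to append to a list attribute, if you prefer that over editing by hand):

qconf -aattr queue pe_list by_node all.q   # append by_node to the queue's pe_list
qconf -sq all.q | grep pe_list             # expect: pe_list   make orte by_node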
I was worried that jobs spawned from a given job would be limited to a single node, but that does not seem to be the case. I have a cluster with two nodes, each of which has two slots.
I made a test file that looks like this:

#!/bin/bash

qsub -b y -pe by_node 2 -cwd sleep 100
sleep 100
and executed it like this:
qsub -V -pe by_node 2 test.sh
A moment later, qstat shows both jobs running, on different nodes:

job-ID  prior    name   user  state  submit/start at      queue           slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    25  0.55500  test   root  r      10/17/2012 21:42:57  all.q@master        2
    26  0.55500  sleep  root  r      10/17/2012 21:43:12  all.q@node001       2
I also tested submitting 3 jobs at once, each requesting that same number of slots on a single node, and only two run at a time, one per node. So this does seem to be set up correctly!
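For reference, that three-job test can be reproduced with a loop along these lines (test.sh is the script shown above):

# Submit three 2-slot, single-node jobs; with 2 nodes x 2 slots,
# only two should run at once and the third should wait in the queue
for i in 1 2 3; do
    qsub -V -pe by_node 2 test.sh
done
qstat   # the third job stays queued until a node frees up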