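QUEUE_NAME: queue name; PRIO: priority; NJOBS: total number of slots held by jobs in the queue (pending, running, and suspended); PEND: number of pending jobs; RUN: number of running jobs; SUSP: number of suspended jobs.
4. Common commands: bhosts. Shows job-related status for each node: bhosts hostname
5. Common commands: bjobs. Shows the status of submitted jobs: bjobs -r shows running jobs; bjobs -a shows running and recently finished jobs; bjobs -p ...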
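Check node usage: if the RUN column is 0, nobody is using that node; MAX is the number of processes (job slots) the node can run; a STATUS of closed means the node is unavailable; HOST_NAME is the node name. HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_welcome/lsf_kc_cmd_ref.html IBM's command...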
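MAX: maximum number of cores that can run at the same time; NJOBS: cores required by all running and pending jobs; RUN: cores occupied by jobs that have started running; SSUSP: cores used by jobs suspended by the system; USUSP: cores used by jobs suspended by the user; RSV: cores the system has reserved for you. lsload shows the load, memory usage, and so on for all nodes. bqueues shows queue information: -l shows detailed queue information; -u shows...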
Use LSF job packs to speed up the submission of a large number of jobs. With job packs, you can submit jobs by submitting a single file containing multiple job requests.
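A minimal sketch of generating such a file, assuming each line of a job pack holds one job request written as ordinary bsub options without the leading `bsub`. The helper `build_job_pack`, the queue name `normal`, the job names `pack_N`, and the `./simulate.sh` commands are all hypothetical and used only for illustration.

```python
def build_job_pack(commands, queue="normal"):
    """Return job pack text: one job request per line, without the word 'bsub'.

    'queue' and the 'pack_N' job names are illustrative assumptions.
    """
    lines = []
    for index, command in enumerate(commands, start=1):
        # Each line carries the options bsub would normally receive.
        lines.append(f"-q {queue} -J pack_{index} {command}")
    return "\n".join(lines) + "\n"


# Hypothetical commands; replace with the real job commands.
pack_text = build_job_pack(["./simulate.sh 1", "./simulate.sh 2"])
print(pack_text, end="")
```

The resulting text could be saved to a file (for example `jobs.pack`, a made-up name) and, on a cluster where job packs are enabled, submitted with something like `bsub -pack jobs.pack`. Check the local LSF documentation before relying on this.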
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
linux01    ok      -     16   0      0    0      0      0
linux02    ok      -     16   0      0    0      0      0
linux03    ok      -     16   0      0    0      0      0
linux01, linux02, and linux03 are the names of the compute nodes in this LSF cluster. Each compute node has its own computing resources: CPU, memory, disk, and so on.
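As a rough illustration of reading this kind of output, the sketch below parses bhosts-style text and lists hosts whose STATUS is ok and whose RUN column is 0. The sample text is the output quoted above; the helpers `parse_bhosts` and `idle_hosts` are hypothetical names, and the parser assumes plain whitespace-separated columns with a single header row.

```python
SAMPLE_BHOSTS = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
linux01 ok - 16 0 0 0 0 0
linux02 ok - 16 0 0 0 0 0
linux03 ok - 16 0 0 0 0 0
"""


def parse_bhosts(text):
    """Turn bhosts-style text into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def idle_hosts(rows):
    """Hosts that are available (STATUS ok) and have nothing running (RUN 0)."""
    return [
        row["HOST_NAME"]
        for row in rows
        if row["STATUS"] == "ok" and row["RUN"] == "0"
    ]


rows = parse_bhosts(SAMPLE_BHOSTS)
print(idle_hosts(rows))
```

In practice the text would come from running `bhosts` on the cluster rather than from a hard-coded string.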
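NJOBS: cores required by all running and pending jobs; RUN: cores occupied by jobs that have started running; SSUSP: cores used by jobs suspended by the system; USUSP: cores used by jobs suspended by the user; RSV: cores the system has reserved for you. lsload shows the load, memory usage, and so on for all nodes. bqueues shows queue information.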
If you just want to view the JOBID after submission, most of the time I will just use bhist or bhist -l to view the running jobs and details.
$ bhist
Summary of time in seconds spent in various states:
JOBID  USER  JOB_NAME  PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
8664 F14r3 ...
Jobs - 0 - JOB_UNDERRUN means that eadmin is triggered when a job runs for less than 3 minutes and ends in the EXIT state.
$ bsub -q qq exit 1
Job <1> is submitted to queue <qq>.
$ bsub -q qq exit 1
Job <2> is submitted to queue <qq>.
$ bsub -q qq exit 1 ...
PRIO is the queue's configured priority.
$ bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
priority    43    Open:Active  -    -     -     -     0      0     0    0
normal      30    Open:Active  -    -     -     -     0      0     0    0
idle        20    Open:Active  -    -     -     -     0      0     0    0
...
Jobs terminated with a system signal are returned by LSF as exit codes greater than 128 such that exit_code-128=signal_value. For example, exit code 133 means that the job was terminated with signal 5 (SIGTRAP on most systems, 133-128=5). A job with exit code 130 was terminated with...
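The arithmetic above can be sketched in a few lines of Python. The helper names `signal_from_exit_code` and `signal_name` are hypothetical; the name lookup uses the standard `signal` module, so the names it returns depend on the platform the script runs on.

```python
import signal


def signal_from_exit_code(exit_code):
    """Return the signal number encoded in an exit code, or None.

    Follows the rule exit_code - 128 = signal_value for codes above 128.
    """
    if exit_code <= 128:
        return None
    return exit_code - 128


def signal_name(signum):
    """Look up the platform's name for a signal number, or None if unknown."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None


for code in (1, 130, 133):
    signum = signal_from_exit_code(code)
    if signum is None:
        print(f"exit code {code}: not a signal-based exit code")
    else:
        print(f"exit code {code}: signal {signum} ({signal_name(signum)})")
```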