.\" Copyright (C) 1994-2018 Altair Engineering, Inc.
.\" For more information, contact Altair at www.altair.com.
.\"
.\" This file is part of the PBS Professional ("PBS Pro") software.
.\"
.\" Open Source License Information:
.\"
.\" PBS Pro is free software. You can redistribute it and/or modify it under the
.\" terms of the GNU Affero General Public License as published by the Free
.\" Software Foundation, either version 3 of the License, or (at your option) any
.\" later version.
.\"
.\" PBS Pro is distributed in the hope that it will be useful, but WITHOUT ANY
.\" WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
.\" FOR A PARTICULAR PURPOSE.
.\" See the GNU Affero General Public License for more details.
.\"
.\" You should have received a copy of the GNU Affero General Public License
.\" along with this program. If not, see .
.\"
.\" Commercial License Information:
.\"
.\" For a copy of the commercial license terms and conditions,
.\" go to: (http://www.pbspro.com/UserArea/agreement.html)
.\" or contact the Altair Legal Department.
.\"
.\" Altair’s dual-license business model allows companies, individuals, and
.\" organizations to create proprietary derivative works of PBS Pro and
.\" distribute them - whether embedded or bundled with other software -
.\" under a commercial license agreement.
.\"
.\" Use of Altair’s trademarks, including but not limited to "PBS™",
.\" "PBS Professional®", and "PBS Pro™" and Altair’s logos is subject to Altair's
.\" trademark licensing policies.
.\"
.TH pbs_mom 8B "8 November 2017" Local "PBS Professional"
.SH NAME
.B pbs_mom
- run the PBS job monitoring and execution daemon
.SH SYNOPSIS
.B pbs_mom
[-a ]
[-C ]
.RS 8
[-c ]
[-d ]
[-L ]
.br
[-M ]
[-N]
[-n ]
[-p|-r]
.br
[-R ]
[-S ]
.br
[-s script_options]
.RE
.B pbs_mom
--version
.SH DESCRIPTION
The
.B pbs_mom
command starts the PBS job monitoring and execution daemon, called
MoM.
The standard MoM starts jobs on the execution host, monitors and reports
resource usage, enforces resource usage limits, and notifies the
server when the job is finished. The MoM also runs any prologue
scripts before the job runs, and runs any epilogue scripts after the
job runs.
The MoM performs any communication with job tasks and with other MoMs.
The MoM on the first vnode on which a job is running manages
communication with the MoMs on the remaining vnodes on which the job
runs.
The MoM manages one or more vnodes. PBS may treat a host as
a set of virtual nodes, in which case one MoM
manages all of the host's vnodes. See the
.B PBS Professional Administrator's Guide.
.LP
.B Logging
.br
The MoM's log file is in PBS_HOME/mom_logs. The MoM writes events to its
log file, including an error message for each error it encounters. If it
cannot write to its log file, it writes to standard error.
The MoM writes its PBS
version and build information to the log file whenever it starts up or
the log file is rolled to a new file.
.LP
.B Required Permission
.br
The executable for
.B pbs_mom
is in PBS_EXEC/sbin, and can be run only by root on Linux or by Admin
on Windows.
.LP
.B Cpusets
.br
A machine that uses cpusets can have a "boot cpuset" defined by the
administrator. A boot cpuset contains one or more CPUs and memory
boards and is used to restrict the default placement of system
processes, including login. If defined, the boot cpuset contains
CPU 0.
Run parallel jobs exclusively within a cpuset for repeatability of
performance. HPE SGI states, "Using cpusets on an HPE SGI system improves
cache locality and memory access times and can substantially improve
an application's performance and runtime repeatability."
The CPUSET_CPU_EXCLUSIVE flag prevents CPU 0 from being used by
the MoM in the creation of job cpusets. This flag is set by default,
so this is the default behavior.
To find out which cpuset is assigned to a running job, use
.B qstat
-f
to see the
.I cpuset
field in the job's
.I altid
attribute.
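.LP
For example, the following is a minimal sketch of checking a job's
.I altid
attribute from a shell. The job ID 1234 is a hypothetical placeholder, and
filtering the output through
.B grep
is only one convenient way to find the attribute:
.RS 4
.nf
# Show the full status of job 1234 (hypothetical ID) and keep the altid line
qstat -f 1234 | grep altid
.fi
.RE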
.LP
.B "HPE SGI Machine Running Supported Versions of Performance Software - Message Passing Interface"
.br
The cpusets created for jobs are marked cpu-exclusive.
MoM does not use any CPU which was in use at startup.
A PBS job can run across multiple machines that run supported versions
of Performance Software - Message Passing Interface.
PBS can run using HPE SGI's MPI (MPT) over InfiniBand. See the
.B PBS Professional Administrator's Guide.
.LP
.B Effect on Jobs of Starting MoM
.br
When MoM is started or restarted, her default behavior is to leave
any running processes running, but to tell the PBS server to requeue
the jobs she manages. MoM tracks the process ID of jobs across
restarts.
To have all jobs killed and requeued, use the
.I -r
option when starting or restarting MoM.
To leave running processes running and not requeue
any jobs, use the
.I -p
option when starting or restarting MoM.
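.LP
As an illustrative sketch of the two restart modes described above, the
following commands start MoM with each option. The sketch assumes that MoM
has already been stopped, that the commands are run as root, and that
PBS_EXEC is set in the shell environment; substitute the actual
installation path if it is not.
.RS 4
.nf
# Start MoM and leave running jobs in place, without requeueing them
$PBS_EXEC/sbin/pbs_mom -p

# Or: start MoM, kill the jobs she was tracking, and requeue them
$PBS_EXEC/sbin/pbs_mom -r
.fi
.RE
The two options are mutually exclusive, so only one of these commands
applies to a given restart.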
.SH OPTIONS
.IP "-a " 10
Number of seconds before alarm timeout.
Whenever a resource request is processed, an alarm is set for the
given amount of time. If the request has not completed before
.I alarm timeout,
the OS generates an alarm signal and sends it to MoM.
Default: 10 seconds. Format: integer.
.IP "-C " 10
Specifies the path of the directory where MoM creates job-specific
subdirectories used to hold each job's restart files. MoM passes this
path to checkpoint and restart scripts. Overrides other checkpoint
path specification methods. Any directory specified with the
.I -C
option must be owned, readable, writable, and executable by root only
.I (rwx,---,---, or 0700),
to protect the security of the checkpoint files. See the
.I -d
option. Format: string.
.br
Default: PBS_HOME/spool/checkpoint.
.IP "-c " 10
Specifies an alternate configuration file, which MoM reads upon starting
in place of the default configuration file.
A relative file name is interpreted relative to
PBS_HOME/mom_priv. If the specified file cannot be opened,
.B pbs_mom
will abort. See the
.I -d
option.
MoM's normal operation, when the -c option is not given, is to attempt
to open the default configuration file PBS_HOME/mom_priv/config.
If this file is not present,
.B pbs_mom
will log the fact and continue.
.IP "-d " 10
Specifies the path of the
.I directory
to be used in place of PBS_HOME by
.B pbs_mom.
The default directory is given by $PBS_HOME. Format: string.
.IP "-L " 10
Specifies an absolute path and filename for the log file.
The default is a file named for the current date in PBS_HOME/mom_logs/.
See the
.I -d
option. Format: string.
.IP "-M " 10
Specifies the number of the port on which MoM will
listen for server requests and instructions. Overrides the
PBS_MOM_SERVICE_PORT setting in pbs.conf and the environment variable of the same name.
Default: 15002.
Format: integer port number.
.IP "-n " 10
Specifies the priority for the
.B pbs_mom
daemon. Format: integer.
.IP "-N" 10
Specifies that when starting, MoM should not detach from the
current session.
.IP "-p" 10
Specifies that when starting, MoM should allow any running jobs
to continue running, and not have them requeued. This option
can be used for single-host jobs only; multi-host jobs cannot
be preserved.
Cannot be used with the
.I -r
option.
MoM is not the parent of these jobs.
.RS
.IP "HPE SGI systems running Performance Software - Message Passing Interface" 5
The cpuset-enabled
.B pbs_mom
will, if given the
.I -p
flag, use the existing CPU and memory allocations for the /PBSPro
cpusets.
The default behavior is to remove these cpusets.
Should this fail, MoM will exit, asking to be restarted with the
.I -p
flag.
.LP
.RE
.IP "-r" 10
Specifies that when starting, MoM should requeue any rerunnable jobs and
kill any non-rerunnable jobs that
she was tracking, and mark the
jobs as terminated. Cannot be used with the
.I -p
option.
MoM is not the parent of these jobs.
Avoid using the
.I -r
option after a reboot, because process IDs of new, legitimate tasks
may match those MoM was previously tracking. If they match and MoM is
started with the
.I -r
option, MoM will kill the new tasks.
.IP "-R " 10
Specifies the number of the port on which MoM will listen for pings,
resource information requests, communication from other MoMs, etc.
Overrides the PBS_MANAGER_SERVICE_PORT setting in pbs.conf and the environment variable of the same name.
Default: 15003. Format: integer port number.
.IP "-S " 10
Specifies the port number on which
.B pbs_mom
initially contacts the server. Default: 15001. Format: integer port number.
.IP "-s