Introduction
The Process Manager is the component responsible for launching and
controlling processes in the ATLAS/TDAQ system. Since tdaq-01-07-00
it replaces the previous pmg package. The documentation describing the
Process Manager functionalities and features can be found in
doc/Design.doc. A detailed description of all the classes can be found
in doxygen.
General Changes
- Any client application has to explicitly link the libraries processManagerClient and processManagerCore;
- The header files of the ProcessManager server part are no longer
installed;
- Some constant definitions are moved to the defs.h header to make the code
cleaner;
- Removed the following tests, which were no longer used: LauncherTest, RMBridgeTest, SingletonQueryTest;
- ERS_WARNING macros removed everywhere (appropriate issues are
built);
- Removed the ctime system calls because they are not thread safe. They are replaced with boost classes;
- Libraries have been slightly reorganized. Five libraries now
exist: libprocessManagerClient.so,
libprocessManagerCore.so, libprocessManagerDaemon.so, libprocessManagerLauncher.so, libprocessManagerServer.so.
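The thread-safety problem with ctime comes from its return value, which points to a buffer shared by all threads. The notes say the calls were replaced with boost classes, and the exact classes are not named here. As a dependency-free illustration of the same idea, the sketch below uses the POSIX reentrant call gmtime_r with strftime instead of boost; the function name format_utc is hypothetical.

```cpp
#include <ctime>
#include <string>

// Format a UNIX timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
// gmtime_r (POSIX) writes into a caller-supplied struct, unlike ctime(),
// which returns a pointer to a buffer shared by all threads.
// NOTE: this is a stand-in for illustration, not the boost-based code
// used by the ProcessManager.
std::string format_utc(std::time_t when) {
    std::tm parts{};
    if (gmtime_r(&when, &parts) == nullptr) {
        return std::string();  // conversion failed
    }
    char buffer[32];
    const std::size_t n =
        std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", &parts);
    return std::string(buffer, n);
}
```

Each caller owns its own std::tm and character buffer, so concurrent calls do not interfere with each other.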
Server Changes and Fixes
- boost mutexes and threads replace the DFThread implementation;
- The Partition object is removed once the partition is no longer
present in IPC and all the processes started in that partition have exited;
- The Agent no longer publishes in IS for partitions in which no
processes are running;
- The partition directory containing FIFOs and manifests is now removed
once the Partition object itself has been deleted;
- Fixed a possible problem causing the RM resources to not be freed;
- Fixes to reduce the probability of a lookup returning a
handle for a process which has already exited (the
lookup->get_process->status chain is still recommended):
- If something goes wrong while starting a process the Application object is tagged as
invalid;
- The Application object is now removed once the report thread exits
(i.e., the associated process exited). Previously the Application
object was removed once a new Application was started;
- If the Application is in the REQUESTED state for a period greater
than a fixed timeout (default is 120 seconds) the Application is set
invalid;
- If the pmglauncher is not started after the defined timeout
(default is 5 seconds) only a warning is sent. This avoids receiving
unexpected callbacks;
- Better handling of the situation in which the report thread
cannot be started;
- Fixed a problem when calling select on the report FIFO;
- The directory where FIFOs and manifests are placed can now be passed
to the ProcessManager via an environment variable:
TDAQ_PMG_MANIFEST_AND_FIFOS_DIR;
- The manifest size has been increased to 32KB;
- The manifest is unmapped/unlinked if an exception occurs in
the Daemon::request_start() method;
- Fixed the method that checks whether the pmglauncher is alive when
running in super-user mode (this was causing the thread reading on the
report FIFO to exit unexpectedly);
- Fixed a misleading error message in case of a failure when starting a
process (see bug #24946 reported in Savannah: error messages stating
both ERROR and SUCCESS);
- Added methods to get a list of all the running processes (or only of
those within a partition);
- Better implementation of actions killing multiple processes (kill_partition, kill_all);
- The manifest content is now dumped in case of errors while trying to
start a process;
- The whole directory structure (i.e., /tmp/ProcessManager/<user
name>) is now searched when trying to find existing manifests and FIFOs
and reconnecting to already running launchers.
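The REQUESTED-state timeout in the list above can be pictured as a simple elapsed-time check. The following is a minimal sketch under stated assumptions: AppState, AppRecord and invalidate_if_stuck are hypothetical names, the real Application class is not shown in these notes, and only the 120-second default comes from the text.

```cpp
#include <chrono>

// Hypothetical, simplified view of an application record; the real
// Application class is not shown in these notes.
enum class AppState { Requested, Running, Exited };

struct AppRecord {
    AppState state = AppState::Requested;
    std::chrono::steady_clock::time_point requested_at{};
    bool valid = true;
};

// Mark the record invalid if it has stayed in Requested for longer than
// `timeout` (120 s is the default quoted in the notes).
// Returns true if this call invalidated the record.
inline bool invalidate_if_stuck(
    AppRecord& app,
    std::chrono::steady_clock::time_point now,
    std::chrono::seconds timeout = std::chrono::seconds(120)) {
    if (app.valid && app.state == AppState::Requested &&
        now - app.requested_at > timeout) {
        app.valid = false;
        return true;
    }
    return false;
}
```

A monotonic clock is used in the sketch so that wall-clock adjustments cannot trigger or delay the timeout; whether the server does the same is not stated in the notes.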
Client Interface Changes and Fixes
- The DFMutex mutex implementation has been replaced by the boost one;
- The Process::unlink() method can now be called by the Process
object even within the callback. In the previous implementation this
caused a deadlock. The Process::unlink() method un-registers the
callback, and it can be called once a callback is received because the
process exited;
- The Process object is removed internally (by a separate thread) once
the process is no longer running and is unlinked. So consider the
unlink as a delete: once you unlink, you should no longer use that
Process pointer;
- The Process::is_running() method has been replaced by exited(). This
is better since the method checks if the process is in one of the end
states;
- The p_state_t enum (containing all the process states) has been
removed. There is no need to have a second enum since it is already
defined in idl. The defs.h header file contains some useful typedefs;
- The structure describing the process status is now initialized
as soon as the process is started. Previously this was done when the
first callback arrived. This is much safer;
- The only Process class constructor takes a Proxy as its only
argument. In this way the Proxy must already exist when a new Process
is created;
- The ProcessDescriptionList class has been removed;
- The Singleton::start_finished method now builds the Process object
too;
- Proper exceptions are raised in case of error while sending
signals to processes;
- Added two more lookup methods to the Singleton class, allowing to
query all the agents published in IPC or only a list of them;
- Added askRunningProcesses methods to the Singleton class to get the
list of processes running on a host or within a partition;
- Moved some periodic messages from WARNING to DEBUG.
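The deadlock mentioned above for Process::unlink() is typical of code that invokes a callback while still holding the lock that unlink needs. One common way to avoid it is to copy the callback under the lock and invoke it after the lock is released. The sketch below shows that pattern only; it is not taken from the ProcessManager sources, it uses std::mutex rather than boost, and ProcessHandle with its link, unlink, linked and notify members is a hypothetical stand-in for the real Process class.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a client-side process handle; the real
// Process class and its callback signature are not shown in these notes.
class ProcessHandle {
public:
    using Callback = std::function<void(ProcessHandle&)>;

    // Register the callback to run on a state change.
    void link(Callback cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        callback_ = std::move(cb);
    }

    // Un-register the callback; safe to call from inside the callback
    // because notify() does not hold the mutex while invoking it.
    void unlink() {
        std::lock_guard<std::mutex> lock(mutex_);
        callback_ = nullptr;
    }

    bool linked() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return static_cast<bool>(callback_);
    }

    // Deliver a notification: copy the callback while holding the lock,
    // then release the lock before invoking the copy.
    void notify() {
        Callback cb;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            cb = callback_;
        }
        if (cb) {
            cb(*this);
        }
    }

private:
    mutable std::mutex mutex_;
    Callback callback_;
};
```

If notify() called callback_ directly while holding mutex_, a callback that calls unlink() would try to lock the same non-recursive mutex again, which is the kind of self-deadlock the notes describe.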
Launcher Changes and Fixes
- Fixed process out/err files permissions;
- Fixed problem on directory permission checking.
New Helpers
Usage:
pmg_dump_manifest -M manifest
Options/Arguments:
-M manifest manifest file
Description:
List the contents of the specified manifest file.
Usage:
pmg_list_on_host -H hostname
Options/Arguments:
-H hostname host name
Description:
Lists all the processes running on a host.
Usage:
pmg_list_partition [-H hostname] -p partition
Options/Arguments:
-H hostname Host name
-p partition Partition name of the processes to list
Description:
Lists all the processes running within a partition (if the host is
defined, only processes running on that host will be listed).
Usage:
pmg_start_app -h Hostname -b BinaryName -n ApplicationName
    -p Partition -w WorkingDirectory [-l LibraryPath]
    [-e ProcessEnvironment] [-d LogDirectory]
    [-i StandardInput] [-s SoftwareObject] [-x]
    [-t Timeout] [-a Arguments ...]
Options/Arguments:
-h Hostname Host the application should be started on
-b BinaryName Name of the binary to run
-n ApplicationName Name to associate with the process (can be different than the binary name)
-p Partition Name of the partition
-w WorkingDirectory Directory where the process will be executed
-l LibraryPath The path to append to LD_LIBRARY_PATH (use the standard ':' directory separator)
-e ProcessEnvironment Additional environment variables to pass to the process (i.e.: Value1=Key1#Value2=Key2)
-d LogDirectory Directory to write process logs (default is /tmp)
-i StandardInput The redirection of stdin (default is /dev/null)
-s SoftwareObject The swobject describing the process: empty string (default) -> DO not use RM; Not empty string -> USE RM
-x Whether to write out the log at all (default is FALSE)
-t Timeout Timeout for the pmg sync mechanism (default is "0": do not use the sync)
-a Arguments Arguments to pass to the process
Description:
Asks the ProcessManager server running on a certain host to start a defined
application.
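The -e option packs several environment entries into one string, separated by '#', with '=' inside each entry. To make the format concrete, here is a minimal sketch of splitting such a string into pairs. The function name split_environment is hypothetical, and skipping entries that contain no '=' is an assumption of the sketch, not documented behaviour of pmg_start_app.

```cpp
#include <string>
#include <utility>
#include <vector>

// Split a '#'-separated list of "left=right" entries, as in the -e
// example string. Each entry is cut at its first '='.
// Entries without '=' are skipped; that policy is an assumption.
std::vector<std::pair<std::string, std::string>>
split_environment(const std::string& spec) {
    std::vector<std::pair<std::string, std::string>> out;
    std::string::size_type start = 0;
    while (start <= spec.size()) {
        std::string::size_type end = spec.find('#', start);
        if (end == std::string::npos) {
            end = spec.size();
        }
        const std::string entry = spec.substr(start, end - start);
        const std::string::size_type eq = entry.find('=');
        if (eq != std::string::npos) {
            out.emplace_back(entry.substr(0, eq), entry.substr(eq + 1));
        }
        start = end + 1;
    }
    return out;
}
```

Because '#' is the entry separator, a value that itself contains '#' cannot be expressed in this form without some escaping rule, and the notes do not mention one.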
Usage:
pmg_kill_app -n ApplicationName -p Partition [-t KillSoftTimeout] [-h Hostname]
Options/Arguments:
-n ApplicationName The name of the application to kill
-p Partition Name of the partition
-t KillSoftTimeout Timeout to pass to the kill_soft method (default is 10 s)
-h Hostname Host the application is running on
Description:
Asks the ProcessManager server to terminate a defined application.
Notes
If you need to run multiple pmgservers in different IPC domains under
the same user, you have to run each pmgserver with a different value of
the TDAQ_PMG_MANIFEST_AND_FIFOS_DIR environment variable. This prevents
all the pmgservers from using the same area for FIFOs and manifests.
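As an illustration of how a program might pick up this variable, the sketch below reads TDAQ_PMG_MANIFEST_AND_FIFOS_DIR and falls back to a caller-supplied directory. It is not the pmgserver code: the function name manifest_and_fifos_dir is hypothetical, and treating an empty value as unset is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Return the directory named by TDAQ_PMG_MANIFEST_AND_FIFOS_DIR, or
// `fallback` when the variable is unset or empty.
// Treating an empty value as unset is an assumption of this sketch.
std::string manifest_and_fifos_dir(const std::string& fallback) {
    const char* value = std::getenv("TDAQ_PMG_MANIFEST_AND_FIFOS_DIR");
    if (value == nullptr || *value == '\0') {
        return fallback;
    }
    return std::string(value);
}
```

Two servers started with different values of the variable would resolve to different directories here, which is the separation the note asks for.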