This module represents the node manager, a service to control the formation of clusters of worker nodes based on their properties.
Classes

class mi::neuraylib::IWorker_node_descriptor
    This interface describes a worker node and its properties.
class mi::neuraylib::ICluster_descriptor
    This interface describes a cluster and its properties.
class mi::neuraylib::ICluster_property_callback
    Abstract interface for signaling changed cluster properties.
class mi::neuraylib::IWorker_node_property_callback
    Abstract interface for signaling changed worker node properties.
class mi::neuraylib::IClient_node_callback
    Abstract interface for signaling changed cluster members.
class mi::neuraylib::IWorker_node_callback
    Abstract interface for signaling changed cluster members.
class mi::neuraylib::IHead_node_callback
    Abstract interface for signaling a change of the cluster application head node.
class mi::neuraylib::IShutdown_node_managers_callback
    Abstract interface for signaling a request to shut down all clients and workers.
class mi::neuraylib::IShutdown_cluster_callback
    Abstract interface for signaling a request to shut down a cluster.
class mi::neuraylib::IWorker_process_started_callback
    Abstract interface for indicating that a worker process has been fully started.
class mi::neuraylib::INode_manager_cluster
    The interface to a cluster created and managed by the node manager.
class mi::neuraylib::ICluster_filter
    A filter used to decide whether a cluster is eligible to be joined.
class mi::neuraylib::IWorker_node_filter
    A filter used to decide whether a worker node is eligible to be included in a cluster.
class mi::neuraylib::INode_manager_client
    The node manager client allows starting or joining DiCE clusters built from worker nodes.
class mi::neuraylib::IChild_process_resolver
    A filter used to decide whether a command string to start a child process is eligible for execution.
class mi::neuraylib::INode_manager_worker
    The node manager worker class allows setting properties and announcing them to other nodes.
class mi::neuraylib::INode_manager_factory
    Factory to create node manager client and worker instances.
This module represents the node manager, a service to control the formation of clusters of worker nodes based on their properties.
The node manager is part of the DiCE library and can be used by any application integrating DiCE. In the following, a client is a DiCE-based application that wants to use additional worker nodes to offload work. The node manager allows such a client to allocate and manage those worker nodes.
To use the node manager, a node manager process must be running on each worker node to which client applications want to delegate work. This worker-side process can itself be built on the DiCE library, which offers an API to register worker-node properties at runtime and to change them dynamically. The node manager process running on a worker node can, for example, detect local capabilities, such as the number of available CPU cores, the number of GPUs, or the amount of physical memory, and set them as properties of the worker node. These and any other arbitrarily chosen properties are announced by DiCE to the client nodes.
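The property-registration flow described above can be sketched with a simplified, self-contained stand-in for the worker-side API; the `Worker_properties` class, its methods, and the property names and values below are illustrative assumptions, not the actual mi::neuraylib interfaces:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <thread>

// Illustrative stand-in for a worker node's property store; the real
// mi::neuraylib::INode_manager_worker interface differs in detail.
class Worker_properties {
public:
    // Registers or dynamically updates a property; in DiCE the change
    // would be announced to connected client nodes.
    void set_property(const std::string& key, const std::string& value) {
        m_properties[key] = value;
    }
    // Returns the property value, or an empty string if it is not set.
    std::string get_property(const std::string& key) const {
        auto it = m_properties.find(key);
        return it == m_properties.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> m_properties;
};

// Detect local capabilities and publish them as worker-node properties.
Worker_properties detect_and_publish() {
    Worker_properties props;
    props.set_property("cpu_cores",
        std::to_string(std::thread::hardware_concurrency()));
    props.set_property("gpu_count", "2");            // placeholder value
    props.set_property("physical_memory_gib", "64"); // placeholder value
    return props;
}
```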
On the client nodes, the node manager API can be used to control the formation of clusters of worker nodes and the joining of existing clusters. This can happen before the DiCE library is started, and also later, in order to add a cluster of worker nodes to a running application or to join an already running cluster.
The application running on the client nodes has full control over which cluster to join, or which worker nodes to select when forming a cluster. This is achieved by writing a custom filter class: DiCE presents each eligible cluster or worker node, along with the properties set by the node manager process on the worker nodes, to the filter, which returns true if the cluster or worker node in question should be chosen, and false otherwise. In addition, a client application can specify the minimum and maximum number of worker nodes that need to be in the cluster for cluster creation to succeed.
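A worker-node filter of the kind described above might look like the following sketch; the `Worker_node_descriptor` struct, the `Worker_node_filter` base class, and the method name `is_eligible` are simplified stand-ins, not the actual mi::neuraylib::IWorker_node_filter and mi::neuraylib::IWorker_node_descriptor interfaces:

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for a worker node descriptor exposing its properties.
struct Worker_node_descriptor {
    std::map<std::string, std::string> properties;
};

// Simplified stand-in for the filter interface: return true to include
// the worker node in the cluster, false to reject it.
struct Worker_node_filter {
    virtual ~Worker_node_filter() = default;
    virtual bool is_eligible(const Worker_node_descriptor& node) const = 0;
};

// Example policy: accept only worker nodes that report at least one GPU.
struct Gpu_filter : Worker_node_filter {
    bool is_eligible(const Worker_node_descriptor& node) const override {
        auto it = node.properties.find("gpu_count");
        return it != node.properties.end() && std::stoi(it->second) >= 1;
    }
};
```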
Each cluster created with the node manager API is associated with an automatically chosen multicast address, which can be passed to DiCE to form a DiCE cluster. In addition, a command string used to start child processes on the worker nodes is associated with the cluster.
A cluster can be shut down automatically when no client is using it anymore; this shutdown can be delayed by a timeout set by the client application. It is also possible to shut down a cluster immediately, even if client nodes are still using it or the timeout has not elapsed.
The node manager API allows a client node to form or join any number of clusters, simultaneously or at different times.
The node manager can operate in two network modes: UDP multicast and TCP networking with a discovery host. Multicast is the default. TCP networking can be used in network environments where switches or routers do not allow UDP multicasting, so that node manager instances cannot otherwise establish connections to each other. In TCP mode, a head node allows node manager instances to find each other. There can be only one head node, and it must be the first instance that is started. A node manager instance started in TCP mode whose configured discovery address is its own local IP address becomes the head node. All other nodes specify the head node's IP address as well and obtain the list of known nodes from it.
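The head-node selection rule above, where the instance whose configured discovery address matches its own local address becomes the head node, can be sketched as a single comparison; the function `is_head_node` and the address strings are illustrative assumptions, not part of the actual API:

```cpp
#include <cassert>
#include <string>

// In TCP mode, every node manager instance is given the discovery host's
// address. The instance whose local address equals that configured address
// acts as the head node; all others connect to it to obtain the node list.
bool is_head_node(const std::string& configured_head_address,
                  const std::string& local_address) {
    return configured_head_address == local_address;
}
```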
Keepalive PDU for the child process watchdog

struct keepalive {
    int type;            // PDU type identifier
    int sequence_number; // sequence number of this keepalive
};