DiCE API

The Distributed Computing Environment (DiCE) is a C++ application programming interface that enables parallel distributed computing in a cluster environment.

DiCE encompasses a distributed database and a distributed execution model designed to implement distributed applications and distributed parallel algorithms with very low latency requirements, such as interactive high-quality rendering and high-performance computing (HPC). The DiCE distributed database leverages the storage available in a possibly large cluster of hosts. The DiCE distributed execution model exploits the compute capability provided by all CPUs and GPUs present in the cluster.


The DiCE distributed database represents a repository for C++ objects called database elements. DiCE ensures the visibility and consistency of database elements stored in the database using a sophisticated transaction mechanism. The DiCE distributed database automatically supplies hosts in the cluster with the database elements they require for efficient and timely processing.

The parallel execution model relies on execution units called jobs. The built-in job scheduling mechanism efficiently distributes computing tasks to the CPU cores and GPUs available in the cluster. The job mechanism is tightly coupled with the distributed database and thus benefits directly from its functionality.

In order to take advantage of the DiCE functionality, an application programmer only needs to implement custom classes derived from mi::neuraylib::IElement, mi::neuraylib::IJob, and mi::neuraylib::IFragmented_job, and to set up a DiCE-based computing service on each host in the cluster. Inter-host communication within the cluster is then managed automatically by the distributed database and the job scheduling system of DiCE. An additional concept called the distributed cache (mi::neuraylib::IDistributed_cache) allows application programmers to explicitly share and manage user-defined data between hosts in the cluster.


Similar to SQL databases, the DiCE distributed database supports the concept of transactions. A transaction provides a consistent view of the distributed database for the duration of its lifetime: changes made within the transaction itself, as well as changes from transactions committed before this transaction was started, are visible within the transaction.

For this purpose, a transaction encapsulates:

  • storing new database elements in the database,
  • accessing database elements for reading (called access), and
  • accessing database elements for modifying them (called edit).

Note that the distributed database does not support SQL-based query mechanisms.

An open transaction can be finished either by committing it or by aborting it. If a transaction is aborted, the changes applied within it never become visible to any other transaction. If a transaction is committed, the changes applied within it become visible to all transactions started after the commit. In contrast to SQL databases, even two or more transactions with conflicting changes can all be committed. This is an important requirement, for instance, for renderers: one scene might be viewed from different perspectives. In such a case, the camera (which is represented by a database element) provides different views in different transactions that all need to be committed. The database provides means for synchronization (global locks), which application writers need to use to avoid inconsistent data. The above description implies that transactions cannot be nested.

The second main responsibility of a transaction is to distribute and execute tasks, i.e., jobs, in the cluster to enable distributed parallel computing (see mi::neuraylib::IJob, and mi::neuraylib::IFragmented_job).

Both the creation of new transactions and the termination of existing transactions are usually handled at the application level, i.e., by the application logic. A distributed algorithm usually receives its transaction from the application in order to be able to access the datasets required to execute its task. That is, a distributed algorithm never creates, commits, or aborts transactions itself.

An application can obtain a new DiCE-specific transaction (mi::neuraylib::IDice_transaction) by creating a transaction on a scope of the database.
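A minimal sketch of this, following the general neuray API pattern; it assumes an already started mi::neuraylib::INeuray instance named neuray, and the handle variable names are illustrative:

```cpp
// Sketch: obtain a DiCE transaction from the global scope of the database.
mi::base::Handle<mi::neuraylib::IDatabase> database(
    neuray->get_api_component<mi::neuraylib::IDatabase>());
mi::base::Handle<mi::neuraylib::IScope> scope(
    database->get_global_scope());
mi::base::Handle<mi::neuraylib::ITransaction> any_transaction(
    scope->create_transaction());
mi::base::Handle<mi::neuraylib::IDice_transaction> transaction(
    any_transaction->get_interface<mi::neuraylib::IDice_transaction>());

// ... access, edit, store database elements, or execute jobs ...

transaction->commit();   // or transaction->abort();
```

Every transaction obtained this way must eventually be finished with commit() or abort(), as described above.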


Database Elements

Database elements represent the basic entities that constitute the DiCE distributed database. Database elements encapsulate the data stored in the database. They can be accessed and edited by every host in the cluster, while the transaction mechanism ensures consistency (see above).

Database elements are associated with a unique identifier, a mi::neuraylib::Tag. Each database element stored in the distributed database can be identified by this unique tag. A tag can optionally be associated with a name. The database allows mapping a name to its tag and vice versa.
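Assuming a valid mi::neuraylib::IDice_transaction pointer t and an element previously stored under the (illustrative) name "example_camera", the mapping between names and tags might look like this:

```cpp
// Look up the tag associated with a name ...
mi::neuraylib::Tag tag = t->name_to_tag("example_camera");
if (tag.is_valid()) {
    // ... and map the tag back to its name.
    const char* name = t->tag_to_name(tag);
}
```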

Each database element must be derived from mi::neuraylib::IElement in order to be stored in the distributed database. Deriving from this class requires the implementation of a number of member functions, which allow the database to manage database accesses, manipulations, reference counting, and delegation to hosts in the cluster environment. The mi::neuraylib::IElement::copy() member function is required to support multiple versions of database elements in different scopes and transactions.

Accessing a tag using mi::neuraylib::IDice_transaction::access() retrieves the database element from the distributed database for reading only. If the database element is not present on the local host but on a remote host in the cluster, the database transfers the database element to the local host. Using mi::neuraylib::IDice_transaction::edit() returns a copy of the database element, which allows manipulating it. When the pointer or handle to the database element goes out of scope or is explicitly released, the update is stored in the distributed database and becomes visible to other hosts.
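A sketch of both access modes, assuming a hypothetical element class My_element, a valid mi::neuraylib::IDice_transaction pointer t, and a valid tag:

```cpp
// Read-only access: the returned interface is const.
{
    mi::base::Handle<const My_element> element(
        t->access<My_element>(tag));
    // ... read data from element ...
}

// Edit access: operates on a copy; the change is checked back into the
// database when the handle is released (here: at the end of the scope).
{
    mi::base::Handle<My_element> element(
        t->edit<My_element>(tag));
    // ... modify element ...
}
```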

It is important to note that database elements can be accessed by multiple transactions at the same time for reading. C++ allows mutable members and also allows casting away constness. Both should be avoided here to prevent side effects, because multiple transactions, possibly originating from different scopes, might access the database element at the same time.

Database Jobs

Database jobs generate database elements on demand as the result of a job's computation. Similar to database elements, database jobs are associated with a unique tag and are stored in the distributed database. In contrast to database elements, database jobs get executed when the tag is accessed via mi::neuraylib::IDice_transaction::access(), and they then provide a database element as the result of the job execution; that is, a job can be considered a kind of procedural database element. Subsequent accesses to the tag will not (necessarily) re-execute the job; instead, the database immediately returns the database element produced by the execute function.

A database job should be introduced if:

  • the creation of the resulting data is not trivial and takes some time to compute,
  • the creation of the data shall be done on demand, only when actually required, and
  • some task can be divided into smaller subtasks (individual jobs), which are independent from one another and can be distributed to the CPUs and GPUs in the cluster.
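Assuming a job class My_job derived from mi::neuraylib::IJob and a result element class My_result (both hypothetical), and a valid mi::neuraylib::IDice_transaction pointer t, storing and triggering a job could be sketched as follows; the exact store() overloads should be checked against the IDice_transaction documentation:

```cpp
// Store the job under a tag; the job is not executed yet.
mi::neuraylib::Tag job_tag = t->store(new My_job(/* ... */));

// Accessing the tag triggers the job's execution (if its result is not
// already available) and returns the element produced by the job.
mi::base::Handle<const My_result> result(
    t->access<My_result>(job_tag));
```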

In addition to triggering on-demand execution of a database job by accessing its associated tag, it is also possible to advise a database job for lazy execution. Calling advise() on a job's tag tells the database that this job should be scheduled for execution on some CPU in the cluster as soon as possible. Calling advise() does not block; only a later access to the job's result blocks, if and for as long as the result is not ready. Thus, advise() essentially enables jobs to exploit multiple CPUs or GPUs on one host or on many hosts in a cluster, i.e., it enables parallelism.
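For example, advising a set of job tags first and only then accessing the results lets the jobs execute concurrently; t, n, job_tags, and My_result are illustrative names:

```cpp
// Non-blocking: schedule all jobs for execution as soon as possible.
for (mi::Size i = 0; i < n; ++i)
    t->advise(job_tags[i]);

// Each access blocks only until that particular job's result is ready.
for (mi::Size i = 0; i < n; ++i) {
    mi::base::Handle<const My_result> result(
        t->access<My_result>(job_tags[i]));
    // ... consume result ...
}
```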

The database has a mechanism to ensure that a job gets up-to-date information when being executed. The guarantee given is: all changes done or seen on a host before calling mi::neuraylib::IDice_transaction::advise() or mi::neuraylib::IDice_transaction::access() on a job are visible during the job's execution. This is also ensured if the job is executed remotely on a different host in the cluster.

This guarantee is also transitive. Consider the following situation: one host changes a camera and then advises a rendering job. The rendering job is executed on a second host. While executing this job, another job gets advised and executed on a third host. In this case, the database ensures that the third host has seen the changes to the camera before starting the job's execution.

Database jobs are a suitable means for generating results that need to be updated only once per transaction, or even only once across multiple transactions. However, they come with a certain overhead. In contrast, fragmented jobs are a lightweight means to implement distributed, parallel computing algorithms in a cluster environment.

Fragmented Jobs

Fragmented jobs enable distributed, parallelized computing of compute-intensive work using the CPUs and/or GPUs available in a cluster environment. In contrast to database jobs, fragmented jobs are not stored in the database but can be instantiated at any time to be distributed and executed immediately. The DiCE built-in scheduling system splits each instance of a fragmented job into a pre-defined number of fragments, each of which executes independently, i.e., without any side effects, and may execute on a different CPU or GPU. Furthermore, a fragmented job does not return a result explicitly; instead, the fragments collectively contribute to the solution of the entire compute problem.

In order to implement a fragmented job, a class derived from mi::neuraylib::IFragmented_job needs to be implemented and instantiated. If the fragmented job is supposed to execute on the local host only, then only the mi::neuraylib::IFragmented_job::execute_fragment() method needs to implement the intended parallelizable operation. If the fragmented job is also supposed to be executed within the cluster environment on remote hosts (remote execution of fragments), then the methods mi::neuraylib::IFragmented_job::execute_fragment_remote() and mi::neuraylib::IFragmented_job::receive_remote_result() need to implement the parallelizable operation accordingly.

Either the mi::neuraylib::IFragmented_job::execute_fragment() or the mi::neuraylib::IFragmented_job::execute_fragment_remote() function will be called exactly once for each fragment of the fragmented job. Both receive the index of the respective fragment and the number of fragments into which the job is split. For instance, if a fragmented job that is not delegated to other hosts is split into 1000 fragments, the mi::neuraylib::IFragmented_job::execute_fragment() function of this job is called once for each of the indices 0...999. If some of the fragments have been delegated to other hosts, then the DiCE built-in scheduler instead calls mi::neuraylib::IFragmented_job::execute_fragment_remote() on the remote host for the respective indices of the delegated fragments. On the host that initiated the execution of the fragmented job, the mi::neuraylib::IFragmented_job::receive_remote_result() method will be called once for each of those indices as well to receive the fragments' computed results.
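A local-only fragmented job might be sketched as follows, using the mi::neuraylib::Fragmented_job convenience mixin. The UUID components below are arbitrary example values, and the exact execute_fragment() signature should be verified against the IFragmented_job documentation of your SDK version; t is an illustrative IDice_transaction pointer:

```cpp
class My_fragmented_job : public mi::neuraylib::Fragmented_job<
    0x12345678, 0x1234, 0x1234,                       // arbitrary example
    0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0>   // class UUID
{
public:
    void execute_fragment(
        mi::neuraylib::IDice_transaction* transaction,
        mi::Size index, mi::Size count,
        const mi::neuraylib::IJob_execution_context* context) override
    {
        // Process slice `index` of `count`; fragments must be
        // independent of each other (no side effects between them).
    }
};

// Execute the job on the local host, split into 1000 fragments:
mi::base::Handle<My_fragmented_job> job(new My_fragmented_job());
t->execute_fragmented(job.get(), 1000);
```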


The DiCE API uses UUIDs (universally unique identifiers) to identify C++ classes. This is especially important when instances of those classes are sent to other hosts. In that case, the receiving host needs to construct an object of the class, which is then filled with the received data.

This means that each of the classes directly or indirectly derived from mi::neuraylib::ISerializable needs a distinct UUID. There are various tools available to create UUIDs, including online web services, the Linux command-line tool uuidgen, and Windows tools that ship with the Visual C++ compiler.
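A class UUID is typically attached when declaring an interface via the mi::base::Interface_declare mixin; the class name and the hex UUID components below are arbitrary example values:

```cpp
// Hypothetical serializable element interface with its own class UUID.
class IMy_element : public mi::base::Interface_declare<
    0x1a2b3c4d, 0x5e6f, 0x70a1,                       // arbitrary example
    0xb2, 0xc3, 0xd4, 0xe5, 0xf6, 0x07, 0x18, 0x29,   // class UUID
    mi::neuraylib::IElement>
{
    // ... element-specific methods ...
};
```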