Distributed job interface that enables analysis and processing of distributed subset data.
#include <idata_distribution.h>
Inherits mi::base::Interface_declare< 0x841224fe, ... >.
Public Member Functions | |
virtual IDistributed_data_locality_query_mode * | get_scheduling_mode () const =0 |
A data locality query mode specifies which data subsets to process. | |
virtual void | receive_data_locality (IDistributed_data_locality *determined_locality)=0 |
Receives a distributed data locality from the NVIDIA IndeX system. | |
virtual void | execute_subset (mi::neuraylib::IDice_transaction *dice_transaction, const nv::index::IData_distribution *data_distribution, nv::index::IData_subset_compute_task_processing *data_subset_processing, mi::Size data_subset_index, mi::Size data_subset_count)=0 |
Local execution of the distributed data job for processing a data subset. | |
virtual void | execute_subset_remote (mi::neuraylib::ISerializer *serializer, mi::neuraylib::IDice_transaction *dice_transaction, const nv::index::IData_distribution *data_distribution, nv::index::IData_subset_compute_task_processing *data_subset_processing, mi::Size data_subset_index, mi::Size data_subset_count)=0 |
Remote execution of the distributed data job for processing a data subset. | |
virtual void | receive_subset_result (mi::neuraylib::IDeserializer *deserializer, mi::neuraylib::IDice_transaction *dice_transaction, const nv::index::IData_distribution *data_distribution, mi::Size data_subset_index, mi::Size data_subset_count)=0 |
Receives the result of a remote distributed data job execution. | |
Distributed job interface that enables analysis and processing of distributed subset data.
The IDistributed_data_job interface for implementing custom jobs resembles the mi::neuraylib::IFragment_job interface on purpose. While DiCE's mi::neuraylib::IFragment_job interface is meant for general cluster-wide jobs, e.g., compute algorithms, the IDistributed_data_job interface enables applications to implement algorithms specifically for NVIDIA IndeX's distributed data. Here, scheduling is based on the data locality, i.e., which cluster node stores which portion of the dataset. The job then launches one execution thread per data subset so that all data subsets can be processed in parallel, either locally or remotely. For local execution, the method execute_subset is invoked to process a single data subset; for remote execution, the method execute_subset_remote is invoked. The method execute_subset_remote has a local counterpart, receive_subset_result, for cases in which the distributed processing analyzes the data and derives information that needs to be assembled locally. A common use case is histogram generation.
Technically, the DiCE mi::neuraylib::Fragmented_job infrastructure is used to facilitate the distributed data analysis and processing. While the processing of subset data could still be implemented using the mi::neuraylib::Fragmented_job interface and, for instance, the information provided by the data locality (see IDistributed_data_locality), the IDistributed_data_job interface drastically reduces code complexity and provides immediate access to the data subset through the IData_subset_processing_task and IData_subset_compute_task_processing interfaces.
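The split between local execution, remote execution, and local assembly can be illustrated with the histogram use case. The following is a minimal, self-contained sketch and not the NVIDIA IndeX API: `Subset`, `Byte_buffer`, `compute_histogram`, `serialize`, `merge_serialized`, and `run_job` are hypothetical stand-ins, the byte buffer only imitates a serializer stream, and the subsets are visited sequentially rather than in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins; none of these names belong to the NVIDIA IndeX or DiCE API.
using Histogram   = std::vector<std::uint64_t>;  // one counter per bin
using Byte_buffer = std::vector<unsigned char>;  // imitates a serializer stream

struct Subset {
    std::vector<std::uint8_t> voxels;  // 8-bit sample values of this subset
    bool                      is_local; // true if stored on the launching node
};

// Per-subset work shared by the local and the remote execution path.
Histogram compute_histogram(const Subset& subset, std::size_t bin_count)
{
    assert(bin_count > 0 && bin_count <= 256);
    Histogram bins(bin_count, 0);
    for (std::uint8_t v : subset.voxels)
        ++bins[(v * bin_count) / 256];
    return bins;
}

// Remote side: write the partial histogram into the byte buffer.
Byte_buffer serialize(const Histogram& bins)
{
    Byte_buffer out(bins.size() * sizeof(std::uint64_t));
    if (!out.empty())
        std::memcpy(out.data(), bins.data(), out.size());
    return out;
}

// Local side: read a partial histogram back and add it to the total.
void merge_serialized(const Byte_buffer& in, Histogram& total)
{
    assert(in.size() == total.size() * sizeof(std::uint64_t));
    for (std::size_t i = 0; i < total.size(); ++i) {
        std::uint64_t count = 0;
        std::memcpy(&count, in.data() + i * sizeof(count), sizeof(count));
        total[i] += count;
    }
}

// Sequential imitation of the job: one unit of work per data subset.
Histogram run_job(const std::vector<Subset>& subsets, std::size_t bin_count)
{
    Histogram total(bin_count, 0);
    for (const Subset& subset : subsets) {
        const Histogram partial = compute_histogram(subset, bin_count);
        if (subset.is_local) {
            // Local path: the result can be merged directly.
            for (std::size_t i = 0; i < bin_count; ++i)
                total[i] += partial[i];
        } else {
            // Remote path: the result travels through a stream first.
            merge_serialized(serialize(partial), total);
        }
    }
    return total;
}
```

In this sketch, calling `run_job` with one local and one remote subset yields the same totals as merging everything locally; only the transport of the partial result differs.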
execute_subset() | pure virtual |
Local execution of the distributed data job for processing a data subset.
The method is invoked by the NVIDIA IndeX system. It facilitates the processing of a single data subset. To process the data, a user-implemented IData_subset_processing_task can be executed using the method IData_subset_compute_task_processing::execute_compute_task.
[in] | dice_transaction | The DiCE transaction provides access to the distributed data store and allows launching additional jobs, e.g., to retrieve compute results from different nodes. |
[in] | data_distribution | The IData_distribution interface allows deriving a data locality and scheduling new distributed jobs. |
[in] | data_subset_processing | The IData_subset_compute_task_processing interface facilitates the execution of user-implemented data processing tasks (see IData_subset_processing_task ). |
[in] | data_subset_index | The data subset index that corresponds to and uniquely identifies the data subset that this execution call processes. |
[in] | data_subset_count | The total number of data subsets that are processed by the distributed data job. |
execute_subset_remote() | pure virtual |
Remote execution of the distributed data job for processing a data subset.
The method is invoked by the NVIDIA IndeX system on a remote node. It facilitates the processing of a single data subset. To process the data, a user-implemented IData_subset_processing_task can be executed using the method IData_subset_compute_task_processing::execute_compute_task.
[in] | serializer | The serializer allows the application to communicate computed results, e.g., a mean value, back to the calling job instance. The corresponding receive_subset_result call receives the returned result in its deserializer stream. |
[in] | dice_transaction | The DiCE transaction provides access to the distributed data store and allows launching additional jobs, e.g., to retrieve compute results from different nodes. |
[in] | data_distribution | The IData_distribution interface allows deriving a data locality and scheduling new distributed jobs. |
[in] | data_subset_processing | The IData_subset_compute_task_processing interface facilitates the execution of user-implemented data processing tasks (see IData_subset_processing_task ). |
[in] | data_subset_index | The data subset index that corresponds to and uniquely identifies the data subset that this execution call processes. |
[in] | data_subset_count | The total number of data subsets that are processed by the distributed data job. |
get_scheduling_mode() | pure virtual |
A data locality query mode specifies which data subsets to process.
Implementations of the class IDistributed_data_locality_query_mode specify which distributed data scene element to process and typically also provide additional query parameters, such as a 3D region of interest for a volume or a 2D region of interest for a height field. The query mode lets the NVIDIA IndeX system determine the cluster nodes and GPUs on which subset data is located and schedule the distributed data job accordingly.
Returns an implementation of the IDistributed_data_locality_query_mode interface. If the method returns an invalid query mode or a null pointer, the NVIDIA IndeX system does not execute the distributed job.
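The effect of a missing or invalid query mode can be sketched as follows. This is a simplified model under stated assumptions and not the NVIDIA IndeX API: `Query_mode`, `Job`, `Counting_job`, and `schedule` are hypothetical names, `std::unique_ptr` replaces the raw interface pointer of the real method, and the subset count is passed in directly instead of being derived from the query.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a query mode: the scene element to process
// and a flag that marks the query as usable.
struct Query_mode {
    unsigned scene_element_id;
    bool     valid;
};

// Hypothetical, reduced job interface (not the real IDistributed_data_job).
struct Job {
    virtual ~Job() = default;
    virtual std::unique_ptr<Query_mode> get_scheduling_mode() const = 0;
    virtual void execute_subset(std::size_t data_subset_index,
                                std::size_t data_subset_count) = 0;
};

// Imitates the scheduling decision: without a valid query mode, no subset
// is executed. Returns the number of subsets that were executed.
std::size_t schedule(Job& job, std::size_t data_subset_count)
{
    const std::unique_ptr<Query_mode> mode = job.get_scheduling_mode();
    if (!mode || !mode->valid)
        return 0;  // null or invalid query mode: the job does not run
    for (std::size_t i = 0; i < data_subset_count; ++i)
        job.execute_subset(i, data_subset_count);
    return data_subset_count;
}

// Example job that counts how many subsets it was asked to process.
struct Counting_job : Job {
    bool        provide_mode;
    std::size_t executed;

    explicit Counting_job(bool provide) : provide_mode(provide), executed(0) {}

    std::unique_ptr<Query_mode> get_scheduling_mode() const override
    {
        if (!provide_mode)
            return nullptr;
        return std::unique_ptr<Query_mode>(new Query_mode{7u, true});
    }

    void execute_subset(std::size_t, std::size_t) override { ++executed; }
};
```

A `Counting_job` constructed with `false` returns a null query mode, so `schedule` leaves its `executed` counter at zero.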
receive_data_locality() | pure virtual |
Receives a distributed data locality from the NVIDIA IndeX system.
Typically, the IDistributed_data_locality reports on which cluster node the distributed portions of a dataset are hosted. These portions are represented as subset data and have a well-defined bounding box.
The application may use the IDistributed_data_locality to gather details on the job execution, because each execution thread of the job is sent to exactly one data subset. That is, the cluster host and the bounding box can be derived and taken into account for further job execution.
[in] | determined_locality | The data locality determined by the NVIDIA IndeX system based on the information that this job implementation provides through get_scheduling_mode. |
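How a job might keep the received locality for later use can be sketched as follows. The types `Bbox`, `Subset_locality`, `Locality`, and `Locality_aware_job` are hypothetical and do not mirror the real IDistributed_data_locality interface. The sketch also assumes that entry `i` of the locality describes the subset with `data_subset_index == i`, which the text above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for what a data locality reports.
struct Bbox {
    float min[3];
    float max[3];
};

struct Subset_locality {
    unsigned host_id;  // cluster node that hosts the subset
    Bbox     bbox;     // bounding box of the subset
};

struct Locality {
    std::vector<Subset_locality> subsets;
};

// Hypothetical job that stores the locality and answers per-subset queries.
class Locality_aware_job {
public:
    // Keep a copy of the locality for use during subset execution.
    void receive_data_locality(const Locality& determined_locality)
    {
        m_locality = determined_locality;
    }

    // Cluster host of a subset (assumes index i maps to locality entry i).
    unsigned host_of(std::size_t data_subset_index) const
    {
        assert(data_subset_index < m_locality.subsets.size());
        return m_locality.subsets[data_subset_index].host_id;
    }

    // Volume of a subset's bounding box.
    float volume_of(std::size_t data_subset_index) const
    {
        assert(data_subset_index < m_locality.subsets.size());
        const Bbox& b = m_locality.subsets[data_subset_index].bbox;
        return (b.max[0] - b.min[0]) * (b.max[1] - b.min[1]) * (b.max[2] - b.min[2]);
    }

    // Number of subsets hosted on the given cluster node.
    std::size_t count_on_host(unsigned host_id) const
    {
        std::size_t n = 0;
        for (const Subset_locality& s : m_locality.subsets)
            if (s.host_id == host_id)
                ++n;
        return n;
    }

private:
    Locality m_locality;
};
```

With such a record, a later execution call could, for example, weight its result by `volume_of(data_subset_index)` or log which host processed which subset.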
receive_subset_result() | pure virtual |
Receives the result of a remote distributed data job execution.
The method is invoked by the NVIDIA IndeX system on the local node and assembles the results computed by the corresponding execute_subset_remote call that was launched for the data subset with the same data_subset_index.
[in] | deserializer | The deserializer receives the results computed on a remote node for the given data subset in the corresponding execute_subset_remote call. |
[in] | dice_transaction | The DiCE transaction provides access to the distributed data store and allows launching additional jobs, e.g., to retrieve compute results from different nodes. |
[in] | data_distribution | The IData_distribution interface allows deriving a data locality and scheduling new distributed jobs. |
[in] | data_subset_index | The data subset index that corresponds to and uniquely identifies the data subset whose result this call receives. |
[in] | data_subset_count | The total number of data subsets that are processed by the distributed data job. |
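The pairing of a remote result with its data_subset_index can be sketched for the mean-value example. This is an illustration under stated assumptions and not the NVIDIA IndeX API: `Partial`, `Byte_buffer`, `write_partial`, `read_partial`, and `Mean_assembler` are hypothetical, and the byte buffer only imitates the serializer and deserializer streams. The sketch is not thread-safe; if results can arrive concurrently, the slots would need a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins; not part of the NVIDIA IndeX or DiCE API.
using Byte_buffer = std::vector<unsigned char>;

struct Partial {
    double        sum;    // sum of the sample values in one subset
    std::uint64_t count;  // number of samples in that subset
};

// Remote side: what execute_subset_remote would write to its serializer.
Byte_buffer write_partial(const Partial& p)
{
    Byte_buffer out(sizeof(p.sum) + sizeof(p.count));
    std::memcpy(out.data(), &p.sum, sizeof(p.sum));
    std::memcpy(out.data() + sizeof(p.sum), &p.count, sizeof(p.count));
    return out;
}

// Local side: what receive_subset_result would read from its deserializer.
Partial read_partial(const Byte_buffer& in)
{
    Partial p = {0.0, 0};
    assert(in.size() == sizeof(p.sum) + sizeof(p.count));
    std::memcpy(&p.sum, in.data(), sizeof(p.sum));
    std::memcpy(&p.count, in.data() + sizeof(p.sum), sizeof(p.count));
    return p;
}

// Collects one partial result per subset, in any arrival order.
class Mean_assembler {
public:
    void receive_subset_result(const Byte_buffer& stream,
                               std::size_t data_subset_index,
                               std::size_t data_subset_count)
    {
        if (m_slots.empty()) {  // size the slots on the first result
            m_slots.resize(data_subset_count);
            m_filled.assign(data_subset_count, false);
        }
        assert(data_subset_index < m_slots.size());
        m_slots[data_subset_index]  = read_partial(stream);
        m_filled[data_subset_index] = true;
    }

    // True once a result has arrived for every subset.
    bool complete() const
    {
        if (m_filled.empty())
            return false;
        for (bool filled : m_filled)
            if (!filled)
                return false;
        return true;
    }

    // Global mean over all received partial results.
    double mean() const
    {
        double        sum   = 0.0;
        std::uint64_t count = 0;
        for (const Partial& p : m_slots) {
            sum   += p.sum;
            count += p.count;
        }
        return count == 0 ? 0.0 : sum / static_cast<double>(count);
    }

private:
    std::vector<Partial> m_slots;
    std::vector<bool>    m_filled;
};
```

Sending the sum and the sample count, rather than a per-subset mean, keeps the assembled value correct when subsets contain different numbers of samples.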