Distributed job interface that enables analysis and processing of distributed subset data.

#include <idata_distribution.h>

Inherits mi::base::Interface_declare< 0x841224fe, ... >.

Public Member Functions
virtual IDistributed_data_locality_query_mode *	get_scheduling_mode () const =0
	A data locality query mode specifies which data subsets to process.

virtual void	receive_data_locality (IDistributed_data_locality *determined_locality)=0
	Receives a distributed data locality from the NVIDIA IndeX system.

virtual void	execute_subset (mi::neuraylib::IDice_transaction *dice_transaction, const nv::index::IData_distribution *data_distribution, nv::index::IData_subset_compute_task_processing *data_subset_processing, mi::Size data_subset_index, mi::Size data_subset_count)=0
	Local execution of the distributed data job for processing a data subset.

virtual void	execute_subset_remote (mi::neuraylib::ISerializer *serializer, mi::neuraylib::IDice_transaction *dice_transaction, const nv::index::IData_distribution *data_distribution, nv::index::IData_subset_compute_task_processing *data_subset_processing, mi::Size data_subset_index, mi::Size data_subset_count)=0
	Remote execution of the distributed data job for processing a data subset.

virtual void	receive_subset_result (mi::neuraylib::IDeserializer *deserializer, mi::neuraylib::IDice_transaction *dice_transaction, const nv::index::IData_distribution *data_distribution, mi::Size data_subset_index, mi::Size data_subset_count)=0
	Receives the result of a remote distributed data job execution.

Detailed Description

Distributed job interface that enables analysis and processing of distributed subset data.

The IDistributed_data_job interface for implementing custom jobs deliberately resembles the mi::neuraylib::IFragmented_job interface. While DiCE's mi::neuraylib::IFragmented_job interface is meant for general cluster-wide jobs, e.g., compute algorithms, the IDistributed_data_job interface enables applications to implement algorithms specifically for NVIDIA IndeX's distributed data. Here, scheduling is based on the data locality, i.e., which cluster node stores which portion of the dataset. The job then sends out one execution thread per data subset so that all data subsets can be processed in parallel, either locally or remotely. For local execution, the method execute_subset is invoked to process a single data subset; for remote execution, the method execute_subset_remote is invoked. The method execute_subset_remote has a local counterpart, receive_subset_result, for cases in which the distributed processing analyzes the data and derives information that needs to be assembled locally. A common use case is histogram generation.

Technically, the DiCE mi::neuraylib::Fragmented_job infrastructure is used to facilitate the distributed data analysis and processing. While the processing of subset data could still be implemented using the mi::neuraylib::Fragmented_job interface and, for instance, the information provided by the data locality (see IDistributed_data_locality), using the IDistributed_data_job interface reduces the code complexity drastically and provides immediate access to the data subset through the IData_subset_processing_task and IData_subset_compute_task_processing interfaces.
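The pairing of execute_subset, execute_subset_remote, and receive_subset_result for the histogram use case can be sketched as follows. This is a minimal, self-contained mock and not code against the NVIDIA IndeX headers: Subset, Wire, Histogram_job, and run_histogram_job are hypothetical stand-ins, the callbacks take simplified arguments, and the "remote" path runs in the same process purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for one data subset: in NVIDIA IndeX this would be data hosted
// on a cluster node; here it is just a list of 8-bit samples (assumption).
using Subset = std::vector<std::uint8_t>;
using Histogram = std::array<std::uint64_t, 256>;

// Stand-in for the serializer/deserializer pair: a sequence of 64-bit
// words (hypothetical, not the mi::neuraylib interfaces).
using Wire = std::vector<std::uint64_t>;

// Shared per-subset work: count how often each sample value occurs.
inline Histogram compute_histogram(const Subset& subset)
{
    Histogram h{};
    for (std::uint8_t v : subset) ++h[v];
    return h;
}

// Mock job mirroring the three per-subset callbacks of the interface.
class Histogram_job
{
public:
    // Local path: the partial result is merged directly.
    void execute_subset(const Subset& subset)
    {
        merge(compute_histogram(subset));
    }

    // Remote path: the partial result is only written to the wire.
    // Static in this mock to stress that nothing else travels back.
    static void execute_subset_remote(Wire& wire, const Subset& subset)
    {
        const Histogram h = compute_histogram(subset);
        wire.assign(h.begin(), h.end());
    }

    // Back on the launching node: read the partial result and merge it.
    void receive_subset_result(const Wire& wire)
    {
        assert(wire.size() == 256);
        Histogram h{};
        for (std::size_t i = 0; i < h.size(); ++i) h[i] = wire[i];
        merge(h);
    }

    const Histogram& result() const { return m_total; }

private:
    void merge(const Histogram& h)
    {
        for (std::size_t i = 0; i < h.size(); ++i) m_total[i] += h[i];
    }

    Histogram m_total{};
};

// Drives the mock: even subsets run "locally", odd ones "remotely".
inline Histogram run_histogram_job(const std::vector<Subset>& subsets)
{
    Histogram_job job;
    for (std::size_t i = 0; i < subsets.size(); ++i) {
        if (i % 2 == 0) {
            job.execute_subset(subsets[i]);
        } else {
            Wire wire;
            Histogram_job::execute_subset_remote(wire, subsets[i]);
            job.receive_subset_result(wire);
        }
    }
    return job.result();
}
```

In this sketch both paths share compute_histogram, so the only difference between local and remote execution is whether the partial histogram is merged directly or travels through the wire first.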

Member Function Documentation

◆ execute_subset()

virtual void nv::index::IDistributed_data_job::execute_subset	(	mi::neuraylib::IDice_transaction *	dice_transaction,
		const nv::index::IData_distribution *	data_distribution,
		nv::index::IData_subset_compute_task_processing *	data_subset_processing,
		mi::Size	data_subset_index,
		mi::Size	data_subset_count
	)

pure virtual

Local execution of the distributed data job for processing a data subset.

The method is invoked by the NVIDIA IndeX system. It facilitates the processing of a single data subset. To process the data, a user-implemented IData_subset_processing_task can be executed using the method IData_subset_compute_task_processing::execute_compute_task.

Parameters

[in]	dice_transaction	The DiCE transaction enables access to the distributed data store and the launching of additional jobs, e.g., to retrieve compute results from different nodes.
[in]	data_distribution	The `IData_distribution` interface makes it possible to derive a data locality and schedule new distributed jobs.
[in]	data_subset_processing	The `IData_subset_compute_task_processing` interface facilitates the execution of user-implemented data processing tasks (see `IData_subset_processing_task`).
[in]	data_subset_index	The data subset ID that corresponds to and uniquely identifies the data subset that this execution call processes.
[in]	data_subset_count	The total number of data subsets that are processed by the distributed data job.
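One way to use data_subset_index and data_subset_count together is to reserve one result slot per subset up front and let each invocation fill only its own slot. The sketch below illustrates that idea with hypothetical names (Subset_results, run_per_subset) and a plain loop in place of the NVIDIA IndeX scheduler. It assumes that the index values lie in the range [0, data_subset_count), which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-subset result store; not part of the NVIDIA IndeX API.
class Subset_results
{
public:
    // Sized once from data_subset_count before any subset is processed.
    explicit Subset_results(std::size_t data_subset_count)
        : m_slots(data_subset_count, 0.0) {}

    // Called from an execute_subset-like callback. Each index identifies
    // exactly one subset, so each call touches a distinct element.
    // Assumption: data_subset_index < data_subset_count.
    void store(std::size_t data_subset_index, double value)
    {
        assert(data_subset_index < m_slots.size());
        m_slots[data_subset_index] = value;
    }

    const std::vector<double>& slots() const { return m_slots; }

private:
    std::vector<double> m_slots;
};

// Simulates the per-subset invocations. `work` receives
// (data_subset_index, data_subset_count) and returns the subset's result.
inline std::vector<double> run_per_subset(
    std::size_t data_subset_count,
    const std::function<double(std::size_t, std::size_t)>& work)
{
    Subset_results results(data_subset_count);
    // Reverse order on purpose: the result must not depend on call order.
    for (std::size_t i = data_subset_count; i-- > 0;) {
        results.store(i, work(i, data_subset_count));
    }
    return results.slots();
}
```

Because every invocation writes to its own element, the outcome does not depend on the order in which the subsets are processed.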

◆ execute_subset_remote()

virtual void nv::index::IDistributed_data_job::execute_subset_remote	(	mi::neuraylib::ISerializer *	serializer,
		mi::neuraylib::IDice_transaction *	dice_transaction,
		const nv::index::IData_distribution *	data_distribution,
		nv::index::IData_subset_compute_task_processing *	data_subset_processing,
		mi::Size	data_subset_index,
		mi::Size	data_subset_count
	)

pure virtual

Remote execution of the distributed data job for processing a data subset.

The method is invoked by the NVIDIA IndeX system on a remote node. It facilitates the processing of a single data subset. To process the data, a user-implemented IData_subset_processing_task can be executed using the method IData_subset_compute_task_processing::execute_compute_task.

Parameters

[in]	serializer	The serializer allows the application to communicate computed results, e.g., a mean value, back to the calling job instance. The corresponding receive_subset_result receives the returned result in its deserializer stream.
[in]	dice_transaction	The DiCE transaction enables access to the distributed data store and the launching of additional jobs, e.g., to retrieve compute results from different nodes.
[in]	data_distribution	The `IData_distribution` interface makes it possible to derive a data locality and schedule new distributed jobs.
[in]	data_subset_processing	The `IData_subset_compute_task_processing` interface facilitates the execution of user-implemented data processing tasks (see `IData_subset_processing_task`).
[in]	data_subset_index	The data subset ID that corresponds to and uniquely identifies the data subset that this execution call processes.
[in]	data_subset_count	The total number of data subsets that are processed by the distributed data job.

◆ get_scheduling_mode()

virtual IDistributed_data_locality_query_mode * nv::index::IDistributed_data_job::get_scheduling_mode ( ) const

pure virtual

A data locality query mode specifies which data subsets to process.

Implementations of the class IDistributed_data_locality_query_mode specify which distributed data scene element to process and typically also provide additional query parameters, such as the region of interest in 3D for a volume or in 2D for a height field. The query mode lets the NVIDIA IndeX system determine the cluster nodes and GPUs on which subset data is located and schedule the distributed data job accordingly.

Returns: The data locality query mode, represented by the IDistributed_data_locality_query_mode interface. If the method returns an invalid query mode or a null pointer, the NVIDIA IndeX system prevents the execution of the distributed job.
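The rule that a null or invalid query mode prevents execution can be illustrated with a small mock. Query_mode, Volume_job, and can_schedule are hypothetical stand-ins, and the validity test (a region of interest whose maximum is not below its minimum on any axis) is an assumption made for this sketch. The checks that NVIDIA IndeX actually applies are not described here, and the mock does not model the reference counting of the real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a query mode: which scene element to process
// and an axis-aligned 3D region of interest.
struct Query_mode
{
    unsigned element_id;
    float roi_min[3];
    float roi_max[3];

    // Assumption of this sketch: max < min on any axis means "invalid".
    bool is_valid() const
    {
        for (int a = 0; a < 3; ++a) {
            if (roi_max[a] < roi_min[a]) return false;
        }
        return true;
    }
};

// Mock job: owns its query mode and hands out a raw pointer, mirroring
// the pointer return type of get_scheduling_mode.
class Volume_job
{
public:
    explicit Volume_job(std::unique_ptr<Query_mode> mode)
        : m_mode(std::move(mode)) {}

    const Query_mode* get_scheduling_mode() const { return m_mode.get(); }

private:
    std::unique_ptr<Query_mode> m_mode;
};

// Mock scheduler check: true only if the job may be executed.
inline bool can_schedule(const Volume_job& job)
{
    const Query_mode* mode = job.get_scheduling_mode();
    return mode != nullptr && mode->is_valid();
}
```

A job that returns no query mode, or one with an inverted region of interest, is rejected by can_schedule before any subset would be processed.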

◆ receive_data_locality()

virtual void nv::index::IDistributed_data_job::receive_data_locality ( IDistributed_data_locality * determined_locality )

pure virtual

Receives a distributed data locality from the NVIDIA IndeX system.

Typically, the IDistributed_data_locality reports on which cluster node the distributed portions of a dataset are hosted. These portions are represented as subset data and have a well-defined bounding box.

The IDistributed_data_locality may be used by the application to gather details on the job execution as each execution thread of the job is sent to exactly one data subset. That is, the cluster host and the bounding box could be derived and taken into account for further job execution.

Parameters

[in] determined_locality The data locality determined by the NVIDIA IndeX system based on the query mode that this job implementation provides through get_scheduling_mode.
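A job might keep the received locality and consult it later, for example to look up the host or bounding box of the subset it is currently processing. The following sketch uses a flattened, hypothetical representation (Subset_placement, Locality, Locality_aware_job) rather than the real IDistributed_data_locality interface, and it assumes that entry i of the locality describes the subset with data_subset_index i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, flattened view of a data locality: one entry per data
// subset. The real IDistributed_data_locality interface is not modeled.
struct Subset_placement
{
    unsigned host_id;   // cluster node hosting the subset
    float bbox_min[3];  // bounding box of the subset, minimum corner
    float bbox_max[3];  // bounding box of the subset, maximum corner
};
using Locality = std::vector<Subset_placement>;

class Locality_aware_job
{
public:
    // Mirrors receive_data_locality: keep a copy for later callbacks.
    void receive_data_locality(const Locality& determined_locality)
    {
        m_locality = determined_locality;
    }

    // Bounding-box volume of the subset a callback is working on.
    // Assumption: entry i belongs to data_subset_index i.
    double subset_volume(std::size_t data_subset_index) const
    {
        const Subset_placement& p = m_locality.at(data_subset_index);
        double v = 1.0;
        for (int a = 0; a < 3; ++a) {
            v *= static_cast<double>(p.bbox_max[a]) - static_cast<double>(p.bbox_min[a]);
        }
        return v;
    }

    // Number of subsets placed on the given host.
    std::size_t subsets_on_host(unsigned host_id) const
    {
        std::size_t n = 0;
        for (const Subset_placement& p : m_locality) {
            if (p.host_id == host_id) ++n;
        }
        return n;
    }

private:
    Locality m_locality;
};
```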

◆ receive_subset_result()

virtual void nv::index::IDistributed_data_job::receive_subset_result	(	mi::neuraylib::IDeserializer *	deserializer,
		mi::neuraylib::IDice_transaction *	dice_transaction,
		const nv::index::IData_distribution *	data_distribution,
		mi::Size	data_subset_index,
		mi::Size	data_subset_count
	)

pure virtual

Receives the result of a remote distributed data job execution.

The method is invoked by the NVIDIA IndeX system on the local node and assembles the results computed by the corresponding execute_subset_remote call that was launched for the data subset with the same data_subset_index.

Parameters

[in]	deserializer	The deserializer receives the results computed on a remote node for the given data subset in the corresponding execute_subset_remote call.
[in]	dice_transaction	The DiCE transaction enables access to the distributed data store and the launching of additional jobs, e.g., to retrieve compute results from different nodes.
[in]	data_distribution	The `IData_distribution` interface makes it possible to derive a data locality and schedule new distributed jobs.
[in]	data_subset_index	The data subset ID that corresponds to and uniquely identifies the data subset that this execution call processes.
[in]	data_subset_count	The total number of data subsets that are processed by the distributed data job.
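For a result such as a mean value, the remote side can send a sum and a sample count instead of the per-subset mean, so that the receiving side weights subsets of different sizes correctly. The sketch below shows this with hypothetical names (Bytes, Partial_mean, write_partial, Mean_accumulator). A byte vector stands in for the serializer and deserializer, and the raw memcpy of a struct is only valid within one process; real code would write and read the fields through the mi::neuraylib serialization interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the serializer/deserializer stream: raw bytes (assumption).
using Bytes = std::vector<unsigned char>;

// Partial result of one subset. Sum and count are sent rather than the
// subset mean so that differently sized subsets are weighted correctly.
struct Partial_mean
{
    double sum;
    std::uint64_t count;
};

// Remote side: what an execute_subset_remote-like callback would write.
inline void write_partial(Bytes& out, const std::vector<float>& samples)
{
    Partial_mean p{0.0, static_cast<std::uint64_t>(samples.size())};
    for (float s : samples) p.sum += s;
    out.resize(sizeof p);
    // Same-process shortcut; not a portable wire format.
    std::memcpy(out.data(), &p, sizeof p);
}

// Local side: what a receive_subset_result-like callback would assemble.
class Mean_accumulator
{
public:
    void receive(const Bytes& in)
    {
        assert(in.size() == sizeof(Partial_mean));
        Partial_mean p;
        std::memcpy(&p, in.data(), sizeof p);
        m_sum += p.sum;
        m_count += p.count;
    }

    // Mean over all samples received so far; 0.0 if nothing was received.
    double mean() const
    {
        return m_count != 0 ? m_sum / static_cast<double>(m_count) : 0.0;
    }

private:
    double m_sum = 0.0;
    std::uint64_t m_count = 0;
};
```

With two subsets holding the samples {1, 2, 3} and {10}, the accumulated mean is 16 / 4 = 4, whereas averaging the two subset means (2 and 10) would give 6.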

The documentation for this class was generated from the following file:

idata_distribution.h
