Data locality information for distributed datasets. More...
#include <idistributed_data_locality.h>
Inherits mi::base::Interface_declare< 0x64624ed0, ... >.
Public Member Functions | |
virtual mi::Uint32 | get_nb_cluster_nodes () const =0 |
Data subsets are distributed to a number of hosts in a cluster. More... | |
virtual mi::Uint32 | get_cluster_node (mi::Uint32 index) const =0 |
A set of data subsets on a specific cluster node with explicit host identifier. More... | |
virtual mi::Size | get_nb_bounding_box (mi::Uint32 cluster_node_id) const =0 |
A cluster node hosts a set of data subsets bound inside their local-space bounding box. More... | |
virtual const mi::math::Bbox_struct< mi::Sint32, 3 > | get_bounding_box (mi::Uint32 cluster_node_id, mi::Uint32 bounding_box_index) const =0 |
Each data subset stored on a cluster node is defined inside its local-space bounding box. More... | |
virtual Subregion_identifier | get_subregion (mi::Uint32 cluster_node_id, mi::Uint32 bounding_box_index) const =0 |
Each data subset stored on a cluster node is defined inside its local-space bounding box. More... | |
Data locality information for distributed datasets.
In general, the data representation of a large-scale dataset is distributed across the cluster environment for efficient, scalable parallel data processing, analysis, and rendering. NVIDIA IndeX's distribution scheme relies on a spatial subdivision of the entire scene space. Using the subdivision scheme, the scene space is partitioned into subregions, i.e., smaller-sized spatial 3D areas. These subregions directly partition the dataset representation into data subsets. Data subsets are stored on the nodes and GPUs distributed in the cluster environment. Each data subset is contained in its local-space bounding box inside a subregion, which is defined in the global subdivision space.
The data locality provides an application with the means to query where, i.e., on which node in the cluster, a data subset of the entire distributed dataset is stored. A node stores either no data subsets or a set of them. Furthermore, the data locality information provides the bounding boxes that correspond to each of the data subsets stored on a given cluster node or GPU. Each of the bounding boxes can be accessed by index; that index also identifies the corresponding distributed data subset.
A common use case that requires the data locality is the invocation of parallel and distributed data jobs that apply processing and analysis techniques to the distributed data subsets in parallel (see IDistributed_data_job).
IDistributed_data_locality
interface class soon.
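The query pattern described above (enumerate the nodes, then the bounding boxes on each node) can be sketched as follows. This is a minimal illustration, not the library's implementation: `Mock_locality`, `Box3i`, `list_subsets`, and `make_sample_locality` are hypothetical names, and the mock only mirrors the method names and argument order listed on this page so that the example compiles without the NVIDIA IndeX headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical integer box; stands in for mi::math::Bbox_struct<mi::Sint32, 3>.
struct Box3i {
    std::int32_t min[3];
    std::int32_t max[3];
};

// Hypothetical in-memory stand-in that mirrors the query methods documented here.
// A real application would receive an IDistributed_data_locality from the library.
struct Mock_locality {
    std::vector<std::uint32_t> node_ids;                     // nodes storing >= 1 subset
    std::map<std::uint32_t, std::vector<Box3i>> node_boxes;  // subsets keyed by node id

    std::uint32_t get_nb_cluster_nodes() const {
        return static_cast<std::uint32_t>(node_ids.size());
    }
    std::uint32_t get_cluster_node(std::uint32_t index) const {
        return node_ids.at(index);
    }
    std::size_t get_nb_bounding_box(std::uint32_t cluster_node_id) const {
        const auto it = node_boxes.find(cluster_node_id);
        return it == node_boxes.end() ? 0 : it->second.size();
    }
    Box3i get_bounding_box(std::uint32_t cluster_node_id,
                           std::uint32_t bounding_box_index) const {
        return node_boxes.at(cluster_node_id).at(bounding_box_index);
    }
};

// Lists every (node id, bounding box index) pair, i.e., every data subset.
template <typename Locality>
std::vector<std::pair<std::uint32_t, std::uint32_t>> list_subsets(const Locality& locality)
{
    std::vector<std::pair<std::uint32_t, std::uint32_t>> subsets;
    const std::uint32_t nb_nodes = locality.get_nb_cluster_nodes();
    for (std::uint32_t i = 0; i < nb_nodes; ++i) {
        // The index i is only a position in the node list; the id is what
        // the per-node queries expect.
        const std::uint32_t node_id = locality.get_cluster_node(i);
        const std::size_t nb_boxes = locality.get_nb_bounding_box(node_id);
        for (std::size_t b = 0; b < nb_boxes; ++b)
            subsets.emplace_back(node_id, static_cast<std::uint32_t>(b));
    }
    return subsets;
}

// Sample layout for illustration: node 3 holds two subsets, node 7 holds one.
inline Mock_locality make_sample_locality()
{
    Mock_locality loc;
    loc.node_ids = {3, 7};
    loc.node_boxes[3] = {Box3i{{0, 0, 0}, {64, 64, 64}}, Box3i{{64, 0, 0}, {128, 64, 64}}};
    loc.node_boxes[7] = {Box3i{{0, 64, 0}, {64, 128, 64}}};
    return loc;
}
```

Calling `list_subsets(make_sample_locality())` yields one entry per data subset; because the traversal is a template, the same function would work with any object that offers these four methods.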
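virtual const mi::math::Bbox_struct< mi::Sint32, 3 > get_bounding_box (mi::Uint32 cluster_node_id, mi::Uint32 bounding_box_index) const | pure virtual |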
Each data subset stored on a cluster node is defined inside its local-space bounding box.
Use this method to iterate over all the bounding boxes or data subsets that are stored on the cluster node, e.g., to implement tailor-made job scheduling (see also mi::neuraylib::Fragmented_job).
[in] | cluster_node_id | The id that references a cluster node. The index must be given in the range from 0 to get_nb_cluster_nodes()-1. |
[in] | bounding_box_index | The index of the bounding box that references the actual subset of the large-scale data representation. The index must be given in the range from 0 to get_nb_bounding_box(cluster_node_id)-1. |
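As an illustration of iterating over the bounding boxes of one node, the sketch below sums the box volumes stored on a node. `Int_box`, `Single_node_boxes`, `box_volume`, and `volume_on_node` are hypothetical names. The sketch assumes that the extent along each axis is `max - min`; this page does not state whether the library treats the upper bound as inclusive, so treat that as an assumption to verify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical integer box; stands in for mi::math::Bbox_struct<mi::Sint32, 3>.
struct Int_box {
    std::int32_t min[3];
    std::int32_t max[3];
};

// Hypothetical stand-in exposing only the two per-node queries used below.
struct Single_node_boxes {
    std::uint32_t node_id;
    std::vector<Int_box> boxes;

    std::size_t get_nb_bounding_box(std::uint32_t cluster_node_id) const {
        return cluster_node_id == node_id ? boxes.size() : 0;
    }
    Int_box get_bounding_box(std::uint32_t cluster_node_id,
                             std::uint32_t bounding_box_index) const {
        if (cluster_node_id != node_id)
            throw std::out_of_range("unknown cluster node id");
        return boxes.at(bounding_box_index);  // throws if the index is out of range
    }
};

// Volume of one box, assuming the extent per axis is max - min (an assumption).
inline std::int64_t box_volume(const Int_box& box)
{
    std::int64_t volume = 1;
    for (int axis = 0; axis < 3; ++axis)
        volume *= static_cast<std::int64_t>(box.max[axis]) - box.min[axis];
    return volume;
}

// Sums the volumes of all bounding boxes stored on the given node.
template <typename Locality>
std::int64_t volume_on_node(const Locality& locality, std::uint32_t node_id)
{
    std::int64_t total = 0;
    const std::size_t nb_boxes = locality.get_nb_bounding_box(node_id);
    for (std::size_t i = 0; i < nb_boxes; ++i)
        total += box_volume(
            locality.get_bounding_box(node_id, static_cast<std::uint32_t>(i)));
    return total;
}
```

Bounding the loop by `get_nb_bounding_box(node_id)` keeps every index inside the valid range stated for `bounding_box_index`.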
virtual mi::Uint32 get_cluster_node (mi::Uint32 index) const | pure virtual |
A set of data subsets on a specific cluster node with explicit host identifier.
Each cluster node has a unique identifier. The method allows for iterating over the identifiers of all nodes that store at least one data subset. The node identifier can be used to send a job execution to the node. For instance, the mi::neuraylib::Fragmented_job interface allows for explicit scheduling of job executions to nodes, i.e., node ids.
[in] | index | The index used to access one of the cluster nodes. The index must be given in the range from 0 to get_nb_cluster_nodes()-1. |
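The distinction between the index passed to this method and the node identifier it returns can be sketched as follows. `Node_table` and `collect_node_ids` are hypothetical names, and the sample ids are chosen arbitrarily to show that an identifier need not equal its position in the list (an assumption made for illustration).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that mirrors only the two node-enumeration methods.
struct Node_table {
    std::vector<std::uint32_t> ids;  // identifiers of nodes that store data subsets

    std::uint32_t get_nb_cluster_nodes() const {
        return static_cast<std::uint32_t>(ids.size());
    }
    std::uint32_t get_cluster_node(std::uint32_t index) const {
        return ids.at(index);  // throws if index >= get_nb_cluster_nodes()
    }
};

// Collects the identifiers of all nodes that store at least one data subset.
template <typename Locality>
std::vector<std::uint32_t> collect_node_ids(const Locality& locality)
{
    std::vector<std::uint32_t> node_ids;
    const std::uint32_t nb_nodes = locality.get_nb_cluster_nodes();
    node_ids.reserve(nb_nodes);
    for (std::uint32_t index = 0; index < nb_nodes; ++index)
        node_ids.push_back(locality.get_cluster_node(index));
    return node_ids;
}
```

The collected identifiers, not the loop indices, are the values to pass to the per-node queries or to a scheduler.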
virtual mi::Size get_nb_bounding_box (mi::Uint32 cluster_node_id) const | pure virtual |
A cluster node hosts a set of data subsets bound inside their local-space bounding box.
Each subset has its own bounding box in the scene element's local space. This method returns the number of bounding boxes, each of which represents a single data subset stored on the cluster node. The number of bounding boxes thus indicates the number of data subsets and allows an application, for instance, to schedule an appropriate number of executions to the node. That is, knowing the number of data subsets per cluster node enables an application to direct an appropriate number of executions to the nodes in order to implement compute algorithms that operate at data subset granularity.
The method is typically used to iterate over all data subset bounding boxes (i.e., data subsets) assigned to a cluster node.
[in] | cluster_node_id | The unique identifier that references a node in the cluster environment. |
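One way to turn the per-node counts into a schedule is to create one execution slot per data subset and label each slot with the node that stores the subset. The sketch below does only that bookkeeping; `Subset_counts` and `one_execution_per_subset` are hypothetical names, and how such a list would be handed to a job interface is outside the scope of this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in: each entry is (node id, number of subsets on that node).
struct Subset_counts {
    std::vector<std::pair<std::uint32_t, std::size_t>> per_node;

    std::uint32_t get_nb_cluster_nodes() const {
        return static_cast<std::uint32_t>(per_node.size());
    }
    std::uint32_t get_cluster_node(std::uint32_t index) const {
        return per_node.at(index).first;
    }
    std::size_t get_nb_bounding_box(std::uint32_t cluster_node_id) const {
        for (const auto& entry : per_node)
            if (entry.first == cluster_node_id)
                return entry.second;
        return 0;  // unknown node: no subsets
    }
};

// Builds one slot per data subset; each slot holds the id of the node
// that stores the subset, so a node with n subsets receives n executions.
template <typename Locality>
std::vector<std::uint32_t> one_execution_per_subset(const Locality& locality)
{
    std::vector<std::uint32_t> slots;
    const std::uint32_t nb_nodes = locality.get_nb_cluster_nodes();
    for (std::uint32_t i = 0; i < nb_nodes; ++i) {
        const std::uint32_t node_id = locality.get_cluster_node(i);
        const std::size_t nb_subsets = locality.get_nb_bounding_box(node_id);
        slots.insert(slots.end(), nb_subsets, node_id);
    }
    return slots;
}
```

With this layout, the position of a slot within a node's run can double as the bounding box index for that execution.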
virtual mi::Uint32 get_nb_cluster_nodes () const | pure virtual |
Data subsets are distributed to a number of hosts in a cluster.
A cluster node hosts either a subset or none of the distributed data subsets. The number of nodes that manage the data subsets of the entire distributed dataset allows, for instance, setting up compute jobs and passing the compute executions explicitly to the nodes that store the data subsets locally.
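virtual Subregion_identifier get_subregion (mi::Uint32 cluster_node_id, mi::Uint32 bounding_box_index) const | pure virtual |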
Each data subset stored on a cluster node is defined inside its local-space bounding box.
The data subset stored on a cluster node is defined inside its bounding box. The method allows iterating over all the bounding boxes on a cluster machine, e.g., to implement compute techniques.
[in] | cluster_node_id | The id that references a cluster node. The index must be given in the range from 0 to get_nb_cluster_nodes()-1. |
[in] | bounding_box_index | The index of the bounding box that references the actual subset of the large-scale data representation. The index must be given in the range from 0 to get_nb_bounding_box(cluster_node_id)-1. |