Material Definition Language API nvidia_logo_transpbg.gif Up
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Groups Pages
Example for Execution of Compiled MDL Materials (PTX)
[Previous] [Up] [Next]

This example describes the API of the code generated by the "PTX" backend for compiled materials and shows how a renderer can call this generated code to evaluate sub-expressions of multiple materials using CUDA.

New Topics

  • MDL material state (PTX)
  • Execution of generated code (PTX)
  • Loading textures (PTX)
  • Texture access functions (PTX)

Detailed Description

MDL material state (PTX)


The MDL material state structure mi::neuraylib::Shading_state_material is a representation of the renderer state as defined in section 19 "Renderer state" in the MDL specification. It is used to make the state of the renderer (like the position of an intersection point on the surface, the shading normal and the texture coordinates) available to the generated code.

Here's a version of the material state structure making use of the types from CUDA's vector_types.h:

struct Shading_state_material {
float3 normal; // state::normal() result
float3 geom_normal; // state::geom_normal() result
float3 position; // state::position() result
float animation_time; // state::animation_time() result
const float3 *text_coords; // state::texture_coordinate() table
const float3 *tangent_u; // state::texture_tangent_u() table
const float3 *tangent_v; // state::texture_tangent_v() table
float4 *text_results; // texture results lookup table
const char *ro_data_segment; // read-only data segment
const float4 *world_to_object; // world-to-object transform matrix
const float4 *object_to_world; // object-to-world transform matrix
int object_id; // state::object_id() result
};

Please refer to the structure documentation for more information.

In this example, we fill the material state structure with some example values and only use one texture space. For the world-to-object and object-to-world transformation matrices we use identity matrices. We will iterate the position and text_coords fields over a 2x2 quad around the center of the world with position x and y coordinates ranging from -1 to 1 and the texture uv-coordinates ranging from 0 to 1, respectively.

Execution of generated code (PTX)


For the non-native backends, the generated code has to be called directly from the corresponding framework, so we need to know the prototypes of the functions generated via mi::neuraylib::IMdl_backend::translate_material_expression(). With "NAME" being the function name you provided as fname parameter and "T" being the result type, they look like this:

void NAME(
T *result,
Shading_state_material const *state,
Resource_data const *res_data,
void const *exception_state,
char const *captured_args);

or written as a PTX prototype:

.visible .func NAME(
.param .b64 result,
.param .b64 state,
.param .b64 res_data,
.param .b64 exception_state,
.param .b64 captured_args
);

The res_data parameter is used to provide access to resources like textures depending on the way how those resources are accessed (see "Texture access functions" below). If it is not used, the pointers inside the structure may be NULL:

struct Resource_data {
const void *shared_data;
const Texture_handler_base *texture_handler;
};

The exception_state parameter allows to provide handlers for out-of-bounds array access exceptions and division-by-zero exceptions. But for the PTX backend, this is not supported and the parameter should be set to NULL.

The captured_args parameter is used to provide the data of the mi::neuraylib::ITarget_argument_block object for class-compiled materials. The data can either be manually created using the information from mi::neuraylib::ITarget_value_layout or by using mi::neuraylib::ITarget_code::get_argument_block() or mi::neuraylib::ITarget_code::create_argument_block(). For instance-compiled materials, this parameter should be set to NULL. See Instance-compilation and class-compilation for more details about instance and class compilation.

To make the generated functions available to our CUDA kernel, we have to link them with the kernel. We could just declare the generated functions with the corresponding names as extern "C" in the CUDA source code of the kernel, but we may want to decide at runtime how many materials will be available. So we will add an indirection through an array of pointers to the generated functions which we provide as an additional PTX source code buffer to the CUDA linker.

Note
We currently have to add a dummy function to the PTX code containing this function pointer array, because the CUDA linker will otherwise just resolve the function addresses to zero.
Also, we need to compile the CUDA kernel with the option -rdc=true (relocatable device code), otherwise the extern declared function array will be treated as a definition resulting in two arrays.

In this example, we bake multiple materials into a texture with a user-configurable checkerboard pattern by executing a material for every texel updating the material state accordingly. At the end, we write the texture to disk.

Loading textures (PTX)


When the nv_freeimage plugin has been loaded via mi::neuraylib::IPlugin_configuration::load_plugin_library() before starting the MDL SDK, the SDK will automatically load textures on the host side for many common image formats and make them available via mi::neuraylib::ITarget_code::get_texture(). Note, that the first texture is always the invalid texture, so only if there is more than just one texture according to mi::neuraylib::ITarget_code::get_texture_count(), there will be real referenced textures available.

Here's a small code snippet showing how to access the mi::neuraylib::ICanvas of the texture at index i.

The textures still have to be copied to the GPU and possibly they have to be gamma corrected and converted to a format understood by the texture access functions you provide. In this example, we use the mi::neuraylib::IImage_api to apply the gamma correction and to convert the image format to a float32 RGBA format.

Depending on the texture shape returned by mi::neuraylib::ITarget_code::get_texture_shape() the texture image data has to be copied to a CUDA array (2D textures), a 3D array (3D textures) or a 3D array with the cudaArrayCubemap flag set (cube textures). The textures could then be made available via CUDA texture objects.

Note
For cube textures, you should use the cudaAddressModeClamp address mode for the texture objects to avoid visual artifacts in the corners.
Texture access functions (PTX)


For non-native backends, the generated code requires a set of methods implementing texture access functionality:

  • tex_lookup_float4_2d
  • tex_lookup_float3_2d
  • tex_texel_float4_2d
  • tex_lookup_float4_3d
  • tex_lookup_float3_3d
  • tex_texel_float4_3d
  • tex_lookup_float4_cube
  • tex_lookup_float3_cube
  • tex_resolution_2d

Except for the last one, these correspond directly to the functions described in section 20.3 "Standard library functions - Texture" in the MDL specification.

The tex_lookup_* functions receive floating-point texture coordinates and should provide a sampled value, whereas tex_texel_* functions receive integer texture coordinates and should provide a raw texture value. There are variants for texture lookups with and without alpha channel (float4 / float3) and for the different texture shapes (2d / 3d / cube) as described in section 6.12 "Variables and data types - Textures" in the MDL specification. Note, that PTEX textures are currently not supported by the backends.

tex_resolution_2d retrieves the width and height at the given uv-tile coordinates for a texture_2d. For non-uv-tile textures, the uv-tile coordinates are always (0, 0).

For the PTX backend, there are different ways how the methods can be provided, which can be chosen by setting the "tex_lookup_call_mode" option via the mi::neuraylib::IMdl_backend::set_option() method to the corresponding mode. Please refer to Texture lookup call modes of the PTX backend for more details.

In this example, you can switch from the "direct_call" mode to the slower "vtable" mode by commenting out this line in example_cuda_shared.h:

check_success(be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);

Example Source

To compile the source code, you need to install the "CUDA Toolkit 9" available at https://developer.nvidia.com/cuda-downloads.

For Windows, you should install the toolkit with enabled Visual Studio integration to be able to use the provided project files.

For Linux and Mac OS X, you have to provide the path to the CUDA Toolkit installation via a "CUDA_PATH" environment variable or by setting the path in the Makefile.

Source Code Location: examples/mdl_sdk/execution_cuda/example_execution_cuda.cpp

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/execution_cuda/example_execution_cuda.cpp
//
// Introduces execution of the generated code for compiled material sub-expressions
// for the PTX backend with CUDA.
#include <iostream>
#include <vector>
// Enable this to dump the generated PTX code to stdout.
// #define DUMP_PTX
#include "example_cuda_shared.h"
// Command line options structure.
struct Options {
// The CUDA device ID.
int cuda_device;
// An result output file name.
std::string outputfile;
// The pattern number representing the combination of materials to display.
unsigned material_pattern;
// The resolution of the display / image.
unsigned res_x, res_y;
// Whether class compilation should be used for the materials.
bool use_class_compilation;
// Disables pixel oversampling.
bool no_aa;
// Whether derivative support should be enabled.
bool enable_derivatives;
// Whether terninary operators on *df types are executed at runtime or folded at compile time.
bool fold_ternary_on_df;
// List of materials to use.
std::vector<std::string> material_names;
// The constructor.
Options()
: cuda_device(0)
, outputfile()
, material_pattern(0)
, res_x(700)
, res_y(520)
, use_class_compilation(false)
, no_aa(false)
, enable_derivatives(false)
, fold_ternary_on_df(false)
{
}
};
// Bake the material sub-expressions created with the PTX backend into a canvas with the given
// resolution and the given number of samples for super-sampling.
mi::neuraylib::ICanvas *bake_expression_cuda_ptx(
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > const &target_codes,
std::vector<size_t> const &arg_block_indices,
Options &options,
mi::Uint32 num_samples)
{
// Build the full CUDA kernel with all the generated code
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_execution_cuda_derivatives.ptx" : "example_execution_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(mi::examples::io::get_executable_folder() + "/" + ptx_name).c_str(),
"evaluate_mat_expr",
&cuda_function);
// Prepare the needed data of all target codes for the GPU
Material_gpu_context material_gpu_context(options.enable_derivatives);
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
if (!material_gpu_context.prepare_target_code_data(
transaction, image_api, target_codes[i].get(), arg_block_indices))
return nullptr;
}
CUdeviceptr device_tc_data_list = material_gpu_context.get_device_target_code_data_list();
CUdeviceptr device_arg_block_list =
material_gpu_context.get_device_target_argument_block_list();
// Allocate GPU output buffer
CUdeviceptr device_outbuf;
check_cuda_success(cuMemAlloc(&device_outbuf, options.res_x * options.res_y * sizeof(float3)));
// Launch kernel for the whole image
dim3 threads_per_block(16, 16);
dim3 num_blocks((options.res_x + 15) / 16, (options.res_y + 15) / 16);
void *kernel_params[] = {
&device_outbuf,
&device_tc_data_list,
&device_arg_block_list,
&options.res_x,
&options.res_y,
&num_samples
};
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, kernel_params, nullptr));
// Create a canvas (with only one tile) and copy the result image to it
image_api->create_canvas("Rgb_fp", options.res_x, options.res_y));
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(
data, device_outbuf, options.res_x * options.res_y * sizeof(float3)));
// Cleanup resources not handled by Material_gpu_context
check_cuda_success(cuMemFree(device_outbuf));
check_cuda_success(cuModuleUnload(cuda_module));
canvas->retain();
return canvas.get();
}
void usage(char const *prog_name)
{
std::cout
<< "Usage: " << prog_name << " [options] [(<material_pattern | (<material_name1> ...)]\n"
<< "Options:\n"
<< " --device <id> run on CUDA device <id> (default: 0)\n"
<< " --res <x> <y> resolution (default: 700x520)\n"
<< " --cc use class compilation\n"
<< " --noaa disable pixel oversampling\n"
<< " -d enable use of derivatives\n"
<< " -o <outputfile> image file to write result to\n"
<< " (default: example_cuda_<material_pattern>.png)\n"
<< " --mdl_path <path> mdl search path, can occur multiple times.\n"
<< " --fold_ternary_on_df fold all ternary operators on *df types\n"
<< " <material_pattern> a number from 1 to 2 ^ num_materials - 1 choosing which\n"
<< " material combination to use (default: 2 ^ num_materials - 1)\n"
<< " <material_name*> qualified name of materials to use. The example will try to\n"
<< " access the path \"surface.scattering.tint\"."
<< std::endl;
exit_failure();
}
//------------------------------------------------------------------------------
//
// Main function
//
//------------------------------------------------------------------------------
int MAIN_UTF8(int argc, char* argv[])
{
// Parse command line options
Options options;
mi::examples::mdl::Configure_options configure_options;
for (int i = 1; i < argc; ++i) {
char const *opt = argv[i];
if (opt[0] == '-') {
if (strcmp(opt, "-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
} else if (strcmp(opt, "--device") == 0 && i < argc - 2) {
options.cuda_device = atoi(argv[++i]);
} else if (strcmp(opt, "--res") == 0 && i < argc - 2) {
options.res_x = std::max(atoi(argv[++i]), 1);
options.res_y = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--cc") == 0) {
options.use_class_compilation = true;
} else if (strcmp(opt, "--noaa") == 0) {
options.no_aa = true;
} else if (strcmp(opt, "-d") == 0) {
options.enable_derivatives = true;
} else if (strcmp(opt, "--mdl_path") == 0 && i < argc - 1) {
configure_options.additional_mdl_paths.push_back(argv[++i]);
} else if (strcmp(opt, "--fold_ternary_on_df") == 0) {
options.fold_ternary_on_df = true;
} else {
std::cout << "Unknown option: \"" << opt << "\"" << std::endl;
usage(argv[0]);
}
} else if (opt[0] >= '0' && opt[0] <= '9') {
options.material_pattern = unsigned(atoi(opt));
} else
options.material_names.push_back(std::string(opt));
}
// Use default materials, if none was provided via command line
if (options.material_names.empty()) {
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution1");
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution2");
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution3");
}
if (options.material_pattern == 0)
options.material_pattern = (1 << options.material_names.size()) - 1;
else if (options.material_pattern < 1 ||
options.material_pattern > unsigned(1 << options.material_names.size()) - 1) {
std::cerr << "Invalid material_pattern parameter." << std::endl;
usage(argv[0]);
}
if (options.outputfile.empty())
options.outputfile = "example_cuda_" + to_string(options.material_pattern) + ".png";
// Access the MDL SDK
mi::base::Handle<mi::neuraylib::INeuray> neuray(mi::examples::mdl::load_and_get_ineuray());
if (!neuray.is_valid_interface())
exit_failure("Failed to load the SDK.");
// Configure the MDL SDK
if (!mi::examples::mdl::configure(neuray.get(), configure_options))
exit_failure("Failed to initialize the SDK.");
// Start the MDL SDK
mi::Sint32 ret = neuray->start();
if (ret != 0)
exit_failure("Failed to initialize the SDK. Result code: %d", ret);
{
// Create a transaction
mi::base::Handle<mi::neuraylib::IScope> scope(database->get_global_scope());
mi::base::Handle<mi::neuraylib::ITransaction> transaction(scope->create_transaction());
// Access needed API components
{
// Generate code for material sub-expressions of different materials
// according to the requested material pattern
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > target_codes;
Material_compiler mc(
mdl_impexp_api.get(),
mdl_backend_api.get(),
mdl_factory.get(),
transaction.get(),
/*num_texture_results=*/ 0,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
/*use_df_interpreter=*/ false,
#endif
options.enable_derivatives,
options.fold_ternary_on_df,
/*enable_axuiliary_output*/ false,
/*df_handle_mode*/ "none");
for (std::size_t i = 0, n = options.material_names.size(); i < n; ++i) {
if ((options.material_pattern & (1 << i)) != 0) {
// split module and material name
std::string module_name, material_simple_name;
if (!mi::examples::mdl::parse_cmd_argument_material_name(
options.material_names[i], module_name, material_simple_name, true))
continue;
// add the sub expression
mc.add_material_subexpr(
module_name, material_simple_name,
"surface.scattering.tint", ("tint_" + to_string(i)).c_str(),
options.use_class_compilation);
}
}
// Generate target code for link unit
target_codes.push_back(mc.generate_cuda_ptx());
// Acquire image API needed to prepare the textures and to create a canvas for baking
// Bake the material sub-expressions into a canvas
CUcontext cuda_context = init_cuda(options.cuda_device);
bake_expression_cuda_ptx(
transaction.get(),
image_api.get(),
target_codes,
mc.get_argument_block_indices(),
options,
options.no_aa ? 1 : 8));
uninit_cuda(cuda_context);
// Export the canvas to an image on disk
if (canvas)
mdl_impexp_api->export_canvas(options.outputfile.c_str(), canvas.get());
}
transaction->commit();
}
// Shut down the MDL SDK
if (neuray->shutdown() != 0)
exit_failure("Failed to shutdown the SDK.");
// Unload the MDL SDK
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
// Convert command line arguments to UTF8 on Windows
COMMANDLINE_TO_UTF8

Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/shared/texture_support_cuda.h
//
// This file contains the implementations and the vtables of the texture access functions.
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#include <math.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
typedef mi::neuraylib::Texture_handler_base Texture_handler_base;
// Custom structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
explicit Texture()
: filtered_object(0)
, unfiltered_object(0)
, size(make_uint3(0, 0, 0))
, inv_size(make_float3(0.0f, 0.0f, 0.0f))
{}
explicit Texture(
cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object; // uses filter mode cudaFilterModeLinear
cudaTextureObject_t unfiltered_object; // uses filter mode cudaFilterModePoint
uint3 size; // size of the texture, needed for texel access
float3 inv_size; // the inverse values of the size of the texture
};
// Custom structure representing an MDL BSDF measurement.
struct Mbsdf
{
unsigned has_data[2]; // true if there is a measurement for this part
cudaTextureObject_t eval_data[2]; // uses filter mode cudaFilterModeLinear
float max_albedo[2]; // max albedo used to limit the multiplier
float* sample_data[2]; // CDFs for sampling a BSDF measurement
float* albedo_data[2]; // max albedo for each theta (isotropic)
uint2 angular_resolution[2]; // size of the dataset, needed for texel access
float2 inv_angular_resolution[2]; // the inverse values of the size of the dataset
unsigned num_channels[2]; // number of color channels (1 or 3)
};
// Structure representing a Light Profile
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution; // angular resolution of the grid
float2 theta_phi_start; // start of the grid
float2 theta_phi_delta; // angular step size
float2 theta_phi_inv_delta; // inverse step size
float candela_multiplier; // factor to rescale the normalized data
float total_power;
cudaTextureObject_t eval_data; // normalized data sampled on grid
float* cdf_data; // CDFs for sampling a light profile
};
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler : Texture_handler_base {
// additional data for the texture access functions can be provided here
size_t num_textures; // the number of textures used by the material
// (without the invalid texture)
Texture const *textures; // the textures used by the material
// (without the invalid texture)
size_t num_mbsdfs; // the number of mbsdfs used by the material
// (without the invalid mbsdf)
Mbsdf const *mbsdfs; // the mbsdfs used by the material
// (without the invalid mbsdf)
size_t num_lightprofiles; // number of elements in the lightprofiles field
// (without the invalid light profile)
Lightprofile const *lightprofiles; // a device pointer to a list of mbsdfs objects, if used
// (without the invalid light profile)
};
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler_deriv : mi::neuraylib::Texture_handler_deriv_base {
// additional data for the texture access functions can be provided here
size_t num_textures; // the number of textures used by the material
// (without the invalid texture)
Texture const *textures; // the textures used by the material
// (without the invalid texture)
size_t num_mbsdfs; // the number of mbsdfs used by the material
// (without the invalid texture)
Mbsdf const *mbsdfs; // the mbsdfs used by the material
// (without the invalid texture)
size_t num_lightprofiles; // number of elements in the lightprofiles field
// (without the invalid light profile)
Lightprofile const *lightprofiles; // a device pointer to a list of mbsdfs objects, if used
// (without the invalid light profile)
};
#if defined(__CUDACC__)
// Stores a float4 in a float[4] array.
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
// Stores a float in all elements of a float[4] array.
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
// Stores the given float values in a float[4] array.
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
// Stores a float3 in a float[3] array.
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
// Stores a float4 in a float[3] array, ignoring v.w.
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
// Stores a float in all elements of a float[3] array.
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
// Stores the given float values in a float[3] array.
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
// Stores the luminance if a given float[3] in a float.
__device__ inline void store_result1(float* res, float3 const& v)
{
// store luminance
*res = 0.212671 * v.x + 0.715160 * v.y + 0.072169 * v.z;
}
// Stores the luminance if a given float[3] in a float.
__device__ inline void store_result1(float* res, float v0, float v1, float v2)
{
// store luminance
*res = 0.212671 * v0 + 0.715160 * v1 + 0.072169 * v2;
}
// Stores a given float in a float
__device__ inline void store_result1(float* res, float s)
{
*res = s;
}
// ------------------------------------------------------------------------------------------------
// Textures
// ------------------------------------------------------------------------------------------------
// Applies wrapping and cropping to the given coordinate.
// Note: This macro returns if wrap mode is clip and the coordinate is out of range.
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
/* Do nothing, use texture sampler default behavior */ \
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
// Modify texture coordinates to get better texture filtering,
// see http://www.iquilezles.org/www/articles/texture/texture.htm
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
// Implementation of tex::texel_float4() for a texture_2d texture.
// Note: uvtile textures are not supported
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const /*uv_tile*/[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
// Implementation of tex::lookup_float4() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
// Implementation of tex::lookup_float3() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
// Implementation of tex::texel_float4() for a texture_3d texture.
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
// Implementation of tex::lookup_float4() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
// Implementation of tex::lookup_float3() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
// Implementation of resolution_2d function needed by generated code.
// Note: uvtile textures are not supported
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const /*uv_tile*/[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
// Implementation of resolution_3d function needed by generated code.
// Note: 3d textures are not supported
extern "C" __device__ void tex_resolution_3d(
int result[3],
Texture_handler_base const *self_base,
unsigned texture_idx)
{
// invalid texture returns zero
result[0] = 0;
result[1] = 0;
result[2] = 0;
}
// Implementation of texture_isvalid().
extern "C" __device__ bool tex_texture_isvalid(
Texture_handler_base const *self_base,
unsigned texture_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return texture_idx != 0 && texture_idx - 1 < self->num_textures;
}
// ------------------------------------------------------------------------------------------------
// Light Profiles
// ------------------------------------------------------------------------------------------------
// Implementation of light_profile_power() for a light profile.
extern "C" __device__ float df_light_profile_power(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.total_power;
}
// Implementation of light_profile_maximum() for a light profile.
extern "C" __device__ float df_light_profile_maximum(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.candela_multiplier;
}
// Implementation of light_profile_isvalid() for a light profile.
extern "C" __device__ bool df_light_profile_isvalid(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return light_profile_idx != 0 && light_profile_idx - 1 < self->num_lightprofiles;
}
// binary search through CDF
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
// Implementation of df::light_profile_evaluate() for a light profile.
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
// map theta to 0..1 range
const float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x / float(lp.angular_resolution.x - 1);
// converting input phi from -pi..pi to 0..2pi
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
// floorf wraps phi range into 0..2pi
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
// (phi < 0.0f) is no problem, this is handle by the (black) border
// since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
const float v = phi * lp.theta_phi_inv_delta.y / float(lp.angular_resolution.y - 1);
// wrap_mode: border black would be an alternative (but it produces artifacts at low res)
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
// Implementation of df::light_profile_sample() for a light profile.
extern "C" __device__ void df_light_profile_sample(
float result[3], // output: theta, phi, pdf
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const xi[3]) // uniform random values
{
result[0] = -1.0f; // negative theta means no emission
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
uint2 res = lp.angular_resolution;
// sample theta_out
//-------------------------------------------
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data; // CDF theta
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0); // binary search
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta; // rescale for re-usage
// sample phi_out
//-------------------------------------------
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1) // CDF theta block
+ (idx_theta * (res.y - 1)); // selected CDF for phi
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1); // binary search
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi; // rescale for re-usage
// compute theta and phi
//-------------------------------------------
// sample uniformly within the patch (grid cell)
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
// n = \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
// = 1 / (\cos{\theta_0} - \cos{\theta_1})
//
// \xi = n * \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
// => \cos{\theta} = (1 - \xi) \cos{\theta_0} + \xi \cos{\theta_1}
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
// align phi
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI); // wrap
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1]; // to [-pi, pi]
// compute pdf
//-------------------------------------------
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
// Implementation of df::light_profile_pdf() for a light profile.
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
// CDF data
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
// map theta to 0..1 range
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
// converting input phi from -pi..pi to 0..2pi
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
// floorf wraps phi range into 0..2pi
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
// (phi < 0.0f) is no problem, this is handle by the (black) border
// since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
// wrap_mode: border black would be an alternative (but it produces artifacts at low res)
if (idx_theta < 0 || idx_theta > (res.x - 2) || idx_phi < 0 || idx_phi >(res.x - 2))
return 0.0f;
// get probability for theta
//-------------------------------------------
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
// get probability for phi
//-------------------------------------------
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1) // CDF theta block
+ (idx_theta * (res.y - 1)); // selected CDF for phi
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
// compute probability to select a position in the sphere patch
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cos(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cos(start.x + float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
// ------------------------------------------------------------------------------------------------
// BSDF Measurements
// ------------------------------------------------------------------------------------------------
// Implementation of bsdf_measurement_isvalid() for an MBSDF.
extern "C" __device__ bool df_bsdf_measurement_isvalid(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return bsdf_measurement_index != 0 && bsdf_measurement_index - 1 < self->num_mbsdfs;
}
// Implementation of df::bsdf_measurement_resolution() function needed by generated code,
// which retrieves the angular and chromatic resolution of the given MBSDF.
// The returned triple consists of: number of equi-spaced steps of theta_i and theta_o,
// number of equi-spaced steps of phi, and number of color channels (1 or 3).
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
{
// invalid MBSDF returns zero
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[bsdf_measurement_index - 1];
const unsigned part_index = static_cast<unsigned>(part);
// check for the part
if (bm.has_data[part_index] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
// pass out the information
result[0] = bm.angular_resolution[part_index].x;
result[1] = bm.angular_resolution[part_index].y;
result[2] = bm.num_channels[part_index];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
// assuming each phi is between -pi and pi
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= M_ONE_OVER_PI;
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
// 3D volume on the GPU (phi_delta x theta_out x theta_in)
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
// Implementation of df::bsdf_measurement_evaluate() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
{
// invalid MBSDF returns zero
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
const unsigned part_index = static_cast<unsigned>(part);
// check for the parta
if (bm.has_data[part_index] == 0)
{
store_result3(result, 0.0f);
return;
}
// handle channels
if (bm.num_channels[part_index] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
// Implementation of df::bsdf_measurement_sample() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3], // output: theta, phi, pdf
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi_out[2],
float const xi[3], // uniform random values
Mbsdf_part part)
{
result[0] = -1.0f; // negative theta means absorption
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
return; // invalid MBSDFs returns zero
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return; // check for the part
// CDF data
uint2 res = bm.angular_resolution[part_index];
const float* sample_data = bm.sample_data[part_index];
// compute the theta_in index (flipping input and output, BSDFs are symmetric)
unsigned idx_theta_in = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
idx_theta_in = min(idx_theta_in, res.x - 1);
// sample theta_out
//-------------------------------------------
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_in * res.x;
unsigned idx_theta_out = sample_cdf(cdf_theta, res.x, xi0); // binary search
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta; // rescale for re-usage
// sample phi_out
//-------------------------------------------
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) + // CDF theta block
(idx_theta_in * res.x + idx_theta_out) * res.y; // selected CDF phi
// select which half-circle to choose with probability 0.5
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi_out = sample_cdf(cdf_phi, res.y, xi1); // binary search
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi; // rescale for re-usage
// compute theta and phi out
//-------------------------------------------
const float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi_out) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1]; // phi \in [0, 2pi]
// align phi
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1]; // to [-pi, pi]
// compute pdf
//-------------------------------------------
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
// Implementation of df::bsdf_measurement_pdf() for an MBSDF.
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
return 0.0f; // invalid MBSDF returns zero
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
unsigned part_index = static_cast<unsigned>(part);
// check for the part
if (bm.has_data[part_index] == 0)
return 0.0f;
// CDF data and resolution
const float* sample_data = bm.sample_data[part_index];
uint2 res = bm.angular_resolution[part_index];
// compute indices in the CDF data
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out); // phi_delta, theta_out, theta_in
unsigned idx_theta_in = unsigned(theta_phi_in[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_theta_out = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_phi_out = unsigned(uvw.x * float(res.y));
idx_theta_in = min(idx_theta_in, res.x - 1);
idx_theta_out = min(idx_theta_out, res.x - 1);
idx_phi_out = min(idx_phi_out, res.y - 1);
// get probability to select theta_out
const float* cdf_theta = sample_data + idx_theta_in * res.x;
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
}
// get probability to select phi_out
const float* cdf_phi = sample_data +
(res.x * res.x) + // CDF theta block
(idx_theta_in * res.x + idx_theta_out) * res.y; // selected CDF phi
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
}
// compute probability to select a position in the sphere patch
float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2], // output: max (in case of color) albedo
// for the selected direction ([0]) and
// global ([1])
Texture_handler const *self,
unsigned bsdf_measurement_index,
float const theta_phi[2],
Mbsdf_part part)
{
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
const unsigned part_index = static_cast<unsigned>(part);
// check for the part
if (bm.has_data[part_index] == 0)
return;
const uint2 res = bm.angular_resolution[part_index];
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta = min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_index][idx_theta];
result[1] = bm.max_albedo[part_index];
}
// Implementation of df::bsdf_measurement_albedos() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4], // output: [0] albedo refl. for theta_phi
// [1] max albedo refl. global
// [2] albedo trans. for theta_phi
// [3] max albedo trans. global
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
return; // invalid MBSDF returns zero
df_bsdf_measurement_albedo(
&result[0],
self,
bsdf_measurement_index,
theta_phi,
df_bsdf_measurement_albedo(
&result[2],
self,
bsdf_measurement_index,
theta_phi,
}
// ------------------------------------------------------------------------------------------------
// Scene data (dummy functions)
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_DUMMY_SCENEDATA
// Implementation of scene_data_isvalid().
extern "C" __device__ bool scene_data_isvalid(
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id)
{
return false;
}
// Implementation of scene_data_lookup_float4().
extern "C" __device__ void scene_data_lookup_float4(
float result[4],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[4],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
// Implementation of scene_data_lookup_float3().
extern "C" __device__ void scene_data_lookup_float3(
float result[3],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_color().
extern "C" __device__ void scene_data_lookup_color(
float result[3],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_float2().
extern "C" __device__ void scene_data_lookup_float2(
float result[2],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[2],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
}
// Implementation of scene_data_lookup_float().
extern "C" __device__ float scene_data_lookup_float(
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value,
bool uniform_lookup)
{
// just return default value
return default_value;
}
// Implementation of scene_data_lookup_int4().
extern "C" __device__ void scene_data_lookup_int4(
int result[4],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int const default_value[4],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
// Implementation of scene_data_lookup_int3().
extern "C" __device__ void scene_data_lookup_int3(
int result[3],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_int2().
extern "C" __device__ void scene_data_lookup_int2(
int result[2],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int const default_value[2],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
}
// Implementation of scene_data_lookup_int().
extern "C" __device__ int scene_data_lookup_int(
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int default_value,
bool uniform_lookup)
{
// just return default value
return default_value;
}
// Implementation of scene_data_lookup_float4() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float4(
tct_deriv_arr_float_4 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_4 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float3() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float3(
tct_deriv_arr_float_3 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_3 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_color() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_color(
tct_deriv_arr_float_3 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_3 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float2() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float2(
tct_deriv_arr_float_2 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_2 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float(
tct_deriv_float *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_float const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
#endif // TEX_SUPPORT_NO_DUMMY_SCENEDATA
// ------------------------------------------------------------------------------------------------
// Vtables
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_VTABLES
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode.
__device__ mi::neuraylib::Texture_handler_vtable tex_vtable = {
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
};
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode with derivatives.
__device__ mi::neuraylib::Texture_handler_deriv_vtable tex_deriv_vtable = {
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_deriv_float,
scene_data_lookup_deriv_float2,
scene_data_lookup_deriv_float3,
scene_data_lookup_deriv_float4,
scene_data_lookup_deriv_color,
};
#endif // TEX_SUPPORT_NO_VTABLES
#endif // __CUDACC__
#endif // TEXTURE_SUPPORT_CUDA_H

Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// Code shared by CUDA MDL SDK examples
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include <math.h>
#include "example_shared.h"
#include "compiled_material_traverser_base.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
// Structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object; // uses filter mode cudaFilterModeLinear
cudaTextureObject_t unfiltered_object; // uses filter mode cudaFilterModePoint
uint3 size; // size of the texture, needed for texel access
float3 inv_size; // the inverse values of the size of the texture
};
// Structure representing an MDL bsdf measurement.
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2]; // true if there is a measurement for this part
cudaTextureObject_t eval_data[2]; // uses filter mode cudaFilterModeLinear
float max_albedo[2]; // max albedo used to limit the multiplier
float* sample_data[2]; // CDFs for sampling a BSDF measurement
float* albedo_data[2]; // max albedo for each theta (isotropic)
uint2 angular_resolution[2]; // size of the dataset, needed for texel access
float2 inv_angular_resolution[2]; // the inverse values of the size of the dataset
unsigned num_channels[2]; // number of color channels (1 or 3)
};
// Structure representing a Light Profile
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution; // angular resolution of the grid
float2 theta_phi_start; // start of the grid
float2 theta_phi_delta; // angular step size
float2 theta_phi_inv_delta; // inverse step size
float candela_multiplier; // factor to rescale the normalized data
float total_power;
cudaTextureObject_t eval_data; // normalized data sampled on grid
float* cdf_data; // CDFs for sampling a light profile
};
// Structure representing the resources used by the generated code of a target code.
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures; // number of elements in the textures field
CUdeviceptr textures; // a device pointer to a list of Texture objects, if used
size_t num_mbsdfs; // number of elements in the mbsdfs field
CUdeviceptr mbsdfs; // a device pointer to a list of mbsdfs objects, if used
size_t num_lightprofiles; // number of elements in the lightprofiles field
CUdeviceptr lightprofiles; // a device pointer to a list of mbsdfs objects, if used
CUdeviceptr ro_data_segment; // a device pointer to the read-only data segment, if used
};
//------------------------------------------------------------------------------
//
// Helper functions
//
//------------------------------------------------------------------------------
// Return a textual representation of the given value.
template <typename T>
std::string to_string(T val)
{
std::ostringstream stream;
stream << val;
return stream.str();
}
// Collects the handles in a compiled material
class Handle_collector : public Compiled_material_traverser_base
{
public:
// add all handle appearing in the provided material to the collectors handle list.
explicit Handle_collector(
: Compiled_material_traverser_base()
{
traverse(material, transaction);
}
// get the collected handles.
const std::vector<std::string>& get_handles() const { return m_handles; }
private:
// Called when the traversal reaches a new element.
void visit_begin(const mi::neuraylib::ICompiled_material* material,
const Compiled_material_traverser_base::Traversal_element& element,
void* context) override
{
// look for direct calls
if (!element.expression ||
element.expression->get_kind() != mi::neuraylib::IExpression::EK_DIRECT_CALL)
return;
// check if it is a distribution function
auto transaction = static_cast<mi::neuraylib::ITransaction*>(context);
element.expression->get_interface<const mi::neuraylib::IExpression_direct_call
>());
expr_dcall->get_arguments());
expr_dcall->get_definition()));
get_semantic();
if (semantic < mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_FIRST
|| semantic > mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_LAST)
return;
// check if the last argument is a handle
expr_dcall->get_arguments());
mi::Size arg_count = arguments->get_size();
const char* name = arguments->get_name(arg_count - 1);
if (strcmp(name, "handle") != 0)
return;
// get the handle value
arguments->get_expression(arg_count - 1));
if (expr->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
return; // is an error if 'handle' is a reserved parameter name
expr->get_interface<const mi::neuraylib::IExpression_constant>());
const mi::base::Handle<const mi::neuraylib::IValue> value(expr_const->get_value());
if (value->get_kind() != mi::neuraylib::IValue::VK_STRING)
return;
value->get_interface<const mi::neuraylib::IValue_string>());
std::string handle_value = handle->get_value() ? std::string(handle->get_value()) : "";
if (std::find(m_handles.begin(), m_handles.end(), handle_value) == m_handles.end())
m_handles.push_back(handle_value);
}
std::vector<std::string> m_handles;
};
//------------------------------------------------------------------------------
//
// CUDA helper functions
//
//------------------------------------------------------------------------------
// Helper macro. Checks whether the expression is cudaSuccess and if not prints a message and
// resets the device and exits.
#ifdef ENABLE_DEPRECATED_UTILIY_FUNCTIONS
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
#else
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
cudaDeviceReset(); \
exit_failure( "Error in file %s, line %u: \"%s\".\n", __FILE__, __LINE__, #expr); \
} \
} while (false)
#endif
// Initialize CUDA.
CUcontext init_cuda(
int ordinal
#ifdef OPENGL_INTEROP
, const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
// Use first device used by OpenGL context
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
{
// Use given device
check_cuda_success(cuDeviceGet(&cu_device, ordinal));
}
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
// For this example, increase printf CUDA buffer size to support a larger number
// of MDL debug::print() calls per CUDA kernel launch
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
// Uninitialize CUDA.
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
/*compile error*/
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
// No copy possible.
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
T &r = *it;
deleter(r);
}
}
C &operator*() { return m_cont; }
C const &operator*() const { return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
// No copy possible.
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
// Allocate memory on GPU and copy the given data to the allocated memory.
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T>
CUdeviceptr gpu_mem_dup(std::vector<T> const &data)
{
return gpu_mem_dup(&data[0], data.size() * sizeof(T));
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
//------------------------------------------------------------------------------
//
// Material_gpu_context class
//
//------------------------------------------------------------------------------
// Helper class responsible for making textures and read-only data available to the GPU
// by generating and managing a list of Target_code_data objects.
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
// Use first entry as "not-used" block
m_target_argument_block_list->push_back(0);
}
// Prepare the needed data of the given target code.
bool prepare_target_code_data(
mi::neuraylib::ITarget_code const *target_code,
std::vector<size_t> const &arg_block_indices);
// Get a device pointer to the target code data list.
CUdeviceptr get_device_target_code_data_list();
// Get a device pointer to the target argument block list.
CUdeviceptr get_device_target_argument_block_list();
// Get a device pointer to the i'th target argument block.
CUdeviceptr get_device_target_argument_block(size_t i)
{
// First entry is the "not-used" block, so start at index 1.
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
// Get the number of target argument blocks.
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
// Get the argument block of the i'th BSDF.
// If the BSDF has no target argument block, size_t(~0) is returned.
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
// Get a writable copy of the i'th target argument block.
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
// Get the layout of the i'th target argument block.
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void update_device_argument_block(size_t i);
private:
// Copy the image data of a canvas to a CUDA array.
void copy_canvas_to_cuda_array(cudaArray_t device_array, mi::neuraylib::ICanvas const *canvas);
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool prepare_texture(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size texture_index,
std::vector<Texture> &textures);
// Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
// functions on the GPU.
bool prepare_mbsdf(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size mbsdf_index,
std::vector<Mbsdf> &mbsdfs);
// Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
// functions on the GPU.
bool prepare_lightprofile(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size lightprofile_index,
std::vector<Lightprofile> &lightprofiles);
// If true, mipmaps will be generated for all 2D textures.
bool m_enable_derivatives;
// The device pointer of the target code data list.
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
// List of all target code data objects owned by this context.
Resource_container<Target_code_data> m_target_code_data_list;
// The device pointer of the target argument block list.
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
// List of all target argument blocks owned by this context.
Resource_container<CUdeviceptr> m_target_argument_block_list;
// List of all local, writable copies of the target argument blocks.
std::vector<mi::base::Handle<mi::neuraylib::ITarget_argument_block> > m_own_arg_blocks;
// List of argument block indices per material BSDF.
std::vector<size_t> m_bsdf_arg_block_indices;
// List of all target argument block layouts.
std::vector<mi::base::Handle<mi::neuraylib::ITarget_value_layout const> > m_arg_block_layouts;
// List of all Texture objects owned by this context.
Resource_container<Texture> m_all_textures;
// List of all MBSDFs objects owned by this context.
Resource_container<Mbsdf> m_all_mbsdfs;
// List of all Light profiles objects owned by this context.
Resource_container<Lightprofile> m_all_lightprofiles;
// List of all CUDA arrays owned by this context.
Resource_container<cudaArray_t> m_all_texture_arrays;
// List of all CUDA mipmapped arrays owned by this context.
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
// Get a device pointer to the target code data list.
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
// Get a device pointer to the target argument block list.
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
// Copy the image data of a canvas to a CUDA array.
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
mi::neuraylib::ICanvas const *canvas)
{
mi::Float32 const *data = static_cast<mi::Float32 const *>(tile->get_data());
check_cuda_success(cudaMemcpy2DToArray(
device_array, 0, 0, data,
canvas->get_resolution_x() * sizeof(float) * 4,
canvas->get_resolution_x() * sizeof(float) * 4,
canvas->get_resolution_y(),
cudaMemcpyHostToDevice));
}
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool Material_gpu_context::prepare_texture(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size texture_index,
std::vector<Texture> &textures)
{
// Get access to the texture data by the texture database name from the target code.
transaction->access<mi::neuraylib::ITexture>(code_ptx->get_texture(texture_index)));
transaction->access<mi::neuraylib::IImage>(texture->get_image()));
mi::Uint32 tex_width = canvas->get_resolution_x();
mi::Uint32 tex_height = canvas->get_resolution_y();
mi::Uint32 tex_layers = canvas->get_layers_size();
char const *image_type = image->get_type();
if (image->is_uvtile()) {
std::cerr << "The example does not support uvtile textures!" << std::endl;
return false;
}
if (canvas->get_tiles_size_x() != 1 || canvas->get_tiles_size_y() != 1) {
std::cerr << "The example does not support tiled images!" << std::endl;
return false;
}
// For simplicity, the texture access functions are only implemented for float4 and gamma
// is pre-applied here (all images are converted to linear space).
// Convert to linear color space if necessary
if (texture->get_effective_gamma() != 1.0f) {
// Copy/convert to float4 canvas and adjust gamma from "effective gamma" to 1.
image_api->convert(canvas.get(), "Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma());
image_api->adjust_gamma(gamma_canvas.get(), 1.0f);
canvas = gamma_canvas;
} else if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0) {
// Convert to expected format
canvas = image_api->convert(canvas.get(), "Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
// Copy image data to GPU array depending on texture shape
code_ptx->get_texture_shape(texture_index);
// Cubemap and 3D texture objects require 3D CUDA arrays
tex_layers != 6) {
std::cerr << "Invalid number of layers (" << tex_layers
<< "), cubemaps must have 6 layers!" << std::endl;
return false;
}
// Allocate a 3D array on the GPU
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
cudaArrayCubemap : 0));
// Prepare the memcpy parameter structure
cudaMemcpy3DParms copy_params;
memset(&copy_params, 0, sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
// Copy the image data of all layers (the layers are not consecutive in memory)
for (mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
canvas->get_tile(0, 0, layer));
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(&copy_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
// mipmapped textures use CUDA mipmapped arrays
mi::Uint32 num_levels = image->get_levels();
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
// create all mipmap levels and copy them to the CUDA arrays in the mipmapped array
mi::base::Handle<mi::IArray> mipmaps(image_api->create_mipmaps(canvas.get(), 1.0f));
for (mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
mi::base::Handle<mi::IPointer> mipmap_ptr(mipmaps->get_element<mi::IPointer>(level - 1));
level_canvas = mipmap_ptr->get_pointer<mi::neuraylib::ICanvas>();
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
// 2D texture objects use CUDA arrays
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
// For cube maps we need clamped address mode to avoid artifacts in the corners
cudaTextureAddressMode addr_mode =
? cudaAddressModeClamp
: cudaAddressModeWrap;
// Create filtered texture object
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f; // default value in OpenGL
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
// Create unfiltered texture object if necessary (cube textures have no texel functions)
cudaTextureObject_t tex_obj_unfilt = 0;
// Use a black border for access outside of the texture
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
// Store texture infos in result vector
textures.push_back(Texture(
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.back());
return true;
}
namespace
{
bool prepare_mbsdfs_part(mi::neuraylib::Mbsdf_part part, Mbsdf& mbsdf_cuda_representation,
const mi::neuraylib::IBsdf_measurement* bsdf_measurement)
{
switch (part)
{
dataset = bsdf_measurement->get_reflection<mi::neuraylib::Bsdf_isotropic_data>();
break;
dataset = bsdf_measurement->get_transmission<mi::neuraylib::Bsdf_isotropic_data>();
break;
}
// no data, fine
if (!dataset)
return true;
// get dimensions
uint2 res;
res.x = dataset->get_resolution_theta();
res.y = dataset->get_resolution_phi();
unsigned num_channels = dataset->get_type() == mi::neuraylib::BSDF_SCALAR ? 1 : 3;
mbsdf_cuda_representation.Add(part, res, num_channels);
// get data
// {1,3} * (index_theta_in * (res_phi * res_theta) + index_theta_out * res_phi + index_phi)
const mi::Float32* src_data = buffer->get_data();
// ----------------------------------------------------------------------------------------
// prepare importance sampling data:
// - for theta_in we will be able to perform a two stage CDF, first to select theta_out,
// and second to select phi_out
// - maximum component is used to "probability" in case of colored measurements
// CDF of the probability to select a certain theta_out for a given theta_in
const unsigned int cdf_theta_size = res.x * res.x;
// for each of theta_in x theta_out combination, a CDF of the probabilities to select a
// a certain theta_out is stored
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x]; // albedo for sampling reflection and transmission
float* sample_data_theta = sample_data; // begin of the first (theta) CDF
float* sample_data_phi = sample_data + cdf_theta_size; // begin of the second (phi) CDFs
const float s_theta = (float) (M_PI * 0.5) / float(res.x); // step size
const float s_phi = (float) (M_PI) / float(res.y); // step size
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
// BSDFs are symmetric: f(w_in, w_out) = f(w_out, w_in)
// take the average of both measurements
// area of two the surface elements (the ones we are averaging)
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
// offset for both the thetas into the measurement data (select row in the volume)
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
// build CDF for phi
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value = fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+ fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else /* num_channels == 1 */
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
// normalize CDF for phi
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
// build CDF for theta
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
// normalize CDF for theta
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
// copy entire CDF data buffer to GPU
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
// ----------------------------------------------------------------------------------------
// prepare evaluation data:
// - simply store the measured data in a volume texture
// - in case of color data, we store each sample in a vector4 to get texture support
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
// make lookup data symmetric
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
// Copy data to GPU array
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>() // float3 is not supported
: cudaCreateChannelDesc<float>());
// Allocate a 3D array on the GPU (phi_delta x theta_out x theta_in)
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
// prepare and copy
cudaMemcpy3DParms copy_params;
memset(&copy_params, 0, sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data), // base pointer
res.y * lookup_channels * sizeof(float), // row pitch
res.y, // width of slice
res.x); // height of slice
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(&copy_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size mbsdf_index,
std::vector<Mbsdf> &mbsdfs)
{
// Get access to the texture data by the texture database name from the target code.
code_ptx->get_bsdf_measurement(mbsdf_index)));
Mbsdf mbsdf_cuda;
// handle reflection and transmission
if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_REFLECTION, mbsdf_cuda, mbsdf.get()))
return false;
if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_TRANSMISSION, mbsdf_cuda, mbsdf.get()))
return false;
mbsdfs.push_back(mbsdf_cuda);
m_all_mbsdfs->push_back(mbsdfs.back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size lightprofile_index,
std::vector<Lightprofile> &lightprofiles)
{
// Get access to the texture data by the texture database name from the target code.
code_ptx->get_light_profile(lightprofile_index)));
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
// phi-mayor: [res.x x res.y]
const float* data = lprof_nr->get_data();
// --------------------------------------------------------------------------------------------
// compute total power
// compute inverse CDF data for sampling
// sampling will work on cells rather than grid nodes (used for evaluation)
// first (res.x-1) for the cdf for sampling theta
// rest (rex.x-1) * (res.y-1) for the individual cdfs for sampling phi (after theta)
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
// area of the patch (grid cell)
// \mu = int_{theta0}^{theta1} sin{theta} \delta theta
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
// build CDF for phi
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
// the probability to select a patch corresponds to the value times area
// the value of a cell is the average of the corners
// omit the *1/4 as we normalize in the end
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
// normalize CDF for phi
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
// build CDF for theta
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
// normalize CDF for theta
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
// copy entire CDF data buffer to GPU
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
// --------------------------------------------------------------------------------------------
// prepare evaluation data
// - use a 2d texture that allows bilinear interpolation
// Copy data to GPU array
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
// 2D texture objects use CUDA arrays
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpy2DToArray(
device_lightprofile_data, 0, 0, data,
res.x * sizeof(float), res.x * sizeof(float), res.y, cudaMemcpyHostToDevice));
// Create filtered texture object
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
lightprofiles.push_back(lprof);
m_all_lightprofiles->push_back(lightprofiles.back());
return true;
}
// Prepare the needed target code data of the given target code.
bool Material_gpu_context::prepare_target_code_data(
mi::neuraylib::ITarget_code const *target_code,
std::vector<size_t> const &arg_block_indices)
{
// Target code data list may not have been retrieved already
check_success(m_device_target_code_data_list.get() == 0);
// Handle the read-only data segments if necessary.
// They are only created, if the "enable_ro_segment" backend option was set to "on".
CUdeviceptr device_ro_data = 0;
if (target_code->get_ro_data_segment_count() > 0) {
device_ro_data = gpu_mem_dup(
target_code->get_ro_data_segment_data(0),
target_code->get_ro_data_segment_size(0));
}
// Copy textures to GPU if the code has more than just the invalid texture
CUdeviceptr device_textures = 0;
mi::Size num_textures = target_code->get_texture_count();
if (num_textures > 1) {
std::vector<Texture> textures;
// Loop over all textures skipping the first texture,
// which is always the invalid texture
for (mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
// Copy texture list to GPU
device_textures = gpu_mem_dup(textures);
}
// Copy MBSDFs to GPU if the code has more than just the invalid mbsdf
CUdeviceptr device_mbsdfs = 0;
mi::Size num_mbsdfs = target_code->get_bsdf_measurement_count();
if (num_mbsdfs > 1) {
std::vector<Mbsdf> mbsdfs;
// Loop over all mbsdfs skipping the first mbsdf,
// which is always the invalid mbsdf
for (mi::Size i = 1; i < num_mbsdfs; ++i) {
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
// Copy mbsdf list to GPU
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
// Copy light profiles to GPU if the code has more than just the invalid light profile
CUdeviceptr device_lightprofiles = 0;
mi::Size num_lightprofiles = target_code->get_light_profile_count();
if (num_lightprofiles > 1) {
std::vector<Lightprofile> lightprofiles;
// Loop over all profiles skipping the first profile,
// which is always the invalid profile
for (mi::Size i = 1; i < num_lightprofiles; ++i) {
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
// Copy light profile list to GPU
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
for (mi::Size i = 0, num = target_code->get_argument_block_count(); i < num; ++i) {
target_code->get_argument_block(i));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_own_arg_blocks.push_back(mi::base::make_handle(arg_block->clone()));
m_arg_block_layouts.push_back(
}
for (size_t arg_block_index : arg_block_indices) {
m_bsdf_arg_block_indices.push_back(arg_block_index);
}
return true;
}
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
//------------------------------------------------------------------------------
//
// MDL material compilation code
//
//------------------------------------------------------------------------------
class Material_compiler {
public:
// Constructor.
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
const std::string& df_handle_mode);
// Loads an MDL module and returns the module DB.
std::string load_module(const std::string& mdl_module_name);
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
// If class_compilation is true, the material will use class compilation.
bool add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* fname,
bool class_compilation=false);
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
// If class_compilation is true, the material will use class compilation.
bool add_material_df(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* base_fname,
bool class_compilation=false);
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_descriptions. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool add_material(
const std::string& qualified_module_name,
const std::string& material_simple_name,
mi::neuraylib::Target_function_description* function_descriptions,
mi::Size description_count,
bool class_compilation);
// Generates CUDA PTX target code for the current link unit.
typedef std::vector<mi::base::Handle<mi::neuraylib::IMaterial_definition const> >
Material_definition_list;
// Get the list of used material definitions.
// There will be one entry per add_* call.
Material_definition_list const &get_material_defs()
{
return m_material_defs;
}
typedef std::vector<mi::base::Handle<mi::neuraylib::ICompiled_material const> >
Compiled_material_list;
// Get the list of compiled materials.
// There will be one entry per add_* call.
Compiled_material_list const &get_compiled_materials()
{
return m_compiled_materials;
}
std::vector<size_t> const &get_argument_block_indices() const {
return m_arg_block_indexes;
}
const std::vector<std::string>& get_handles() const {
return m_handles;
}
private:
// Creates an instance of the given material.
mi::neuraylib::IMaterial_instance* create_material_instance(
const std::string& qualified_module_name,
const std::string& material_simple_name);
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material* compile_material_instance(
bool class_compilation);
private:
Material_definition_list m_material_defs;
Compiled_material_list m_compiled_materials;
std::vector<size_t> m_arg_block_indexes;
std::vector<std::string> m_handles;
};
// Constructor.
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
const std::string& df_handle_mode)
: m_mdl_impexp_api(mdl_impexp_api, mi::base::DUP_INTERFACE)
, m_be_cuda_ptx(mdl_backend_api->get_backend(mi::neuraylib::IMdl_backend_api::MB_CUDA_PTX))
, m_mdl_factory(mdl_factory, mi::base::DUP_INTERFACE)
, m_transaction(transaction, mi::base::DUP_INTERFACE)
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
// Option "enable_ro_segment": Default is disabled.
// If you have a lot of big arrays, enabling this might speed up compilation.
// check_success(m_be_cuda_ptx->set_option("enable_ro_segment", "on") == 0);
if (enable_derivatives) {
// Option "texture_runtime_with_derivs": Default is disabled.
// We enable it to get coordinates with derivatives for texture lookup functions.
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
// Option "tex_lookup_call_mode": Default mode is vtable mode.
// You can switch to the slower vtable mode by commenting out the next line.
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
// Option "num_texture_results": Default is 0.
// Set the size of a renderer provided array for texture results in the MDL SDK state in number
// of float4 elements processed by the init() function.
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
if (enable_auxiliary) {
// Option "enable_auxiliary": Default is disabled.
// We enable it to create an additional 'auxiliary' function that can be called on each
// distribution function to fill an albedo and normal buffer e.g. for denoising.
check_success(m_be_cuda_ptx->set_option("enable_auxiliary", "on") == 0);
}
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
// Option "enable_df_interpreter": Default is disabled.
// Using the interpreter allows to reuse the same code for multiple materials
// reducing code divergence, if your scene shows many materials at the same time.
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
// Option "df_handle_slot_mode": Default is "none".
// When using light path expressions, individual parts of the distribution functions can be
// selected using "handles". The contribution of each of those parts has to be evaluated during
// rendering. This option controls how many parts are evaluated with each call into the
// generated "evaluate" and "auxiliary" functions and how the data is passed.
// The CUDA backend supports pointers, which means an externally managed buffer of arbitrary
// size is used to transport the contributions of each part.
check_success(m_be_cuda_ptx->set_option("df_handle_slot_mode", df_handle_mode.c_str()) == 0);
// Option "scene_data_names": Default is "".
// Uncomment the line below to enable calling the scene data runtime functions
// for any scene data names or specify a comma-separated list of names for which
// you may provide scene data. The example runtime functions always return the
// default values, which is the same as not supporting any scene data.
// m_be_cuda_ptx->set_option("scene_data_names", "*");
// force experimental to true for now
m_context->set_option("experimental", true);
m_context->set_option("fold_ternary_on_df", fold_ternary_on_df);
// After we set the options, we can create the link unit
m_link_unit = mi::base::make_handle(m_be_cuda_ptx->create_link_unit(transaction, m_context.get()));
}
std::string Material_compiler::load_module(const std::string& mdl_module_name)
{
// load module
m_mdl_impexp_api->load_module(m_transaction.get(), mdl_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get()))
exit_failure("Failed to load module: %s", mdl_module_name.c_str());
// get and return the DB name
m_mdl_factory->get_db_module_name(mdl_module_name.c_str()));
return db_module_name->get_c_str();
}
// Creates an instance of the given material.
mi::neuraylib::IMaterial_instance* Material_compiler::create_material_instance(
const std::string& qualified_module_name,
const std::string& material_simple_name)
{
// Load mdl module.
m_mdl_impexp_api->load_module(
m_transaction.get(), qualified_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get())) {
// module has errors
return nullptr;
}
// get db name
m_mdl_factory->get_db_module_name(qualified_module_name.c_str()));
std::string material_db_name =
std::string(module_db_name->get_c_str()) + "::" + material_simple_name;
// Create a material instance from the material definition
// with the default arguments.
m_transaction->access<mi::neuraylib::IMaterial_definition>(
material_db_name.c_str()));
if (!material_definition) {
// material with given name does not exists
print_message(
(
"Material '" +
material_simple_name +
"' does not exists in '" +
qualified_module_name + "'").c_str());
return nullptr;
}
m_material_defs.push_back(material_definition);
mi::Sint32 result;
material_definition->create_material_instance(0, &result));
check_success(result == 0);
material_instance->retain();
return material_instance.get();
}
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material *Material_compiler::compile_material_instance(
bool class_compilation)
{
mi::Uint32 flags = class_compilation
material_instance->create_compiled_material(flags, m_context.get()));
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
compiled_material->retain();
return compiled_material.get();
}
// Generates CUDA PTX target code for the current link unit.
mi::base::Handle<const mi::neuraylib::ITarget_code> Material_compiler::generate_cuda_ptx()
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
std::cout << "Dumping CUDA PTX code:\n\n"
<< code_cuda_ptx->get_code() << std::endl;
#endif
return code_cuda_ptx;
}
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* fname,
bool class_compilation)
{
mi::neuraylib::Target_function_description desc;
desc.path = path;
desc.base_fname = fname;
add_material(qualified_module_name, material_simple_name, &desc, 1, class_compilation);
return desc.return_code == 0;
}
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_df(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* base_fname,
bool class_compilation)
{
mi::neuraylib::Target_function_description desc;
desc.path = path;
desc.base_fname = base_fname;
add_material(qualified_module_name, material_simple_name, &desc, 1, class_compilation);
return desc.return_code == 0;
}
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_description. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool Material_compiler::add_material(
const std::string& qualified_module_name,
const std::string& material_simple_name,
mi::neuraylib::Target_function_description* function_descriptions,
mi::Size description_count,
bool class_compilation)
{
if (description_count == 0)
return false;
// Load the given module and create a material instance
create_material_instance(qualified_module_name, material_simple_name));
if (!material_instance)
return false;
// Compile the material instance in instance compilation mode
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
// Note: the same argument_block_index is filled into all function descriptions of a
// material, if any function uses it
m_arg_block_indexes.push_back(function_descriptions[0].argument_block_index);
return print_messages(m_context.get());
}
//------------------------------------------------------------------------------
//
// Material execution code
//
//------------------------------------------------------------------------------
// Helper function to create PTX source code for a non-empty 32-bit value array.
void print_array_u32(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
// PTX does not allow empty arrays, so use a dummy entry
str += "1] = { 0 };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
// Helper function to create PTX source code for a non-empty function pointer array.
void print_array_func(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
// PTX does not allow empty arrays, so use a dummy entry
str += "1] = { dummy_func };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
// Generate PTX array containing the references to all generated functions.
std::string generate_func_array_ptx(
const std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > &target_codes)
{
// Create PTX header and mdl_expr_functions_count constant
std::string src =
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
// Workaround needed to let CUDA linker resolve the function pointers in the arrays.
// Also used for "empty" function arrays.
src += ".func dummy_func() { ret; }\n";
std::string tc_offsets;
std::string function_names;
std::string tc_indices;
std::string ab_indices;
unsigned f_count = 0;
// Iterate over all target codes
for (size_t tc_index = 0, num = target_codes.size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
// in case of multiple target codes, we need to address the functions by a pair of
// target_code_index and function_index.
// the elements in the resulting function array can then be index by offset + func_index.
if(!tc_offsets.empty())
tc_offsets += ", ";
tc_offsets += to_string(f_count);
// Collect all names and prototypes of callable functions within the current target code
for (size_t func_index = 0, func_count = target_code->get_callable_function_count();
func_index < func_count; ++func_index)
{
// add to function list
if (!tc_indices.empty())
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
// target code index in case of multiple link units
tc_indices += to_string(tc_index);
// name of the function
function_names += target_code->get_callable_function(func_index);
// Get argument block index and translate to 1 based list index (-> 0 = not-used)
mi::Size ab_index = target_code->get_callable_function_argument_block_index(func_index);
ab_indices += to_string(ab_index == mi::Size(~0) ? 0 : (ab_index + 1));
f_count++;
// Add prototype declaration
src += target_code->get_callable_function_prototype(
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
// infos per target code (link unit)
src += std::string(".visible .const .align 4 .u32 mdl_target_code_count = ")
+ to_string(target_codes.size()) + ";\n";
print_array_u32(
src, std::string("mdl_target_code_offsets"), unsigned(target_codes.size()), tc_offsets);
// infos per function
src += std::string(".visible .const .align 4 .u32 mdl_functions_count = ")
+ to_string(f_count) + ";\n";
print_array_func(src, std::string("mdl_functions"), f_count, function_names);
print_array_u32(src, std::string("mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src, std::string("mdl_target_code_indices"), f_count, tc_indices);
return src;
}
// Build a linked CUDA kernel containing our kernel and all the generated code, making it
// available to the kernel via an added "mdl_expr_functions" array.
CUmodule build_linked_kernel(
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > const &target_codes,
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
// Generate PTX array containing the references to all generated functions.
// The linker will resolve them to addresses.
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
std::cout << "Dumping CUDA PTX code for the \"mdl_expr_functions\" array:\n\n"
<< ptx_func_array_src << std::endl;
#endif
// Link all generated code, our generated PTX array and our kernel together
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
// Setup the linker
// Pass a buffer for info messages
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
// Pass the size of the info buffer
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
// Pass a buffer for error messages
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
// Pass the size of the error buffer
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
// Add all code generated by the MDL PTX backend
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
// Add the "mdl_expr_functions" array PTX module
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.c_str()),
ptx_func_array_src.size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
// Add our kernel
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
// Link everything to a cubin
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
std::cerr << "PTX linker error:\n" << error_log << std::endl;
check_cuda_success(link_result);
}
std::cout << "CUDA link completed." << std::endl;
if (info_log[0])
std::cout << "Linker output:\n" << info_log << std::endl;
// Load the result and get the entrypoint of our kernel
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(&regs, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
std::cout << "Kernel uses " << regs << " registers and " << lmem << " lmem and has a size of "
<< linked_cubin_size << " bytes." << std::endl;
// Cleanup
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif // EXAMPLE_CUDA_SHARED_H

Source Code Location: examples/mdl_sdk/execution_cuda/example_execution_cuda.cu

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/execution_cuda/example_execution_cuda.cu
//
// This file contains the CUDA kernel used to evaluate the material sub-expressions.
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <math.h>
#include "texture_support_cuda.h"
// To reuse this sample code for the MDL SDK and MDL Core the corresponding namespaces are used.
// when this CUDA code is used in the context of an SDK sample.
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
// when this CUDA code is used in the context of an Core sample.
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
#ifdef ENABLE_DERIVATIVES
typedef Material_expr_function_with_derivs Mat_expr_func;
typedef Shading_state_material_with_derivs Mdl_state;
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Material_expr_function Mat_expr_func;
typedef Shading_state_material Mdl_state;
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
// Custom structure representing the resources used by the generated code of a target code object.
struct Target_code_data
{
size_t num_textures; // number of elements in the textures field
Texture *textures; // a list of Texture objects, if used
char const *ro_data_segment; // the read-only data segment, if used
};
// The number of generated MDL sub-expression functions available.
extern __constant__ unsigned int mdl_functions_count;
// The target argument block indices for the generated MDL sub-expression functions.
// Note: the original indices are incremented by one to allow us to use 0 as "not-used".
extern __constant__ unsigned int mdl_arg_block_indices[];
// The function pointers of the generated MDL sub-expression functions.
// In this example it is assumed that only expressions are added to the link unit.
// For a more complex use case, see also example df_cuda.
extern __constant__ Mat_expr_func *mdl_functions[];
// The target code indices for the generated MDL sub-expression functions.
// In contrast to the df_cuda sample, this example simply iterates over all generated expressions.
// Therefore, no target_code_indices and function_indices are passed from the host side.
// Instead, this additional array allows the mapping to target_code_index.
extern __constant__ unsigned int mdl_target_code_indices[];
// Identity matrix.
// The last row is always implied to be (0, 0, 0, 1).
__constant__ const tct_float4 identity[3] = {
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
// Calculate radical inverse with base 2.
__device__ float radinv2(unsigned int bits)
{
bits = (bits << 16) | (bits >> 16);
bits = ((bits & 0x00ff00ff) << 8) | ((bits & 0xff00ff00) >> 8);
bits = ((bits & 0x0f0f0f0f) << 4) | ((bits & 0xf0f0f0f0) >> 4);
bits = ((bits & 0x33333333) << 2) | ((bits & 0xcccccccc) >> 2);
bits = ((bits & 0x55555555) << 1) | ((bits & 0xaaaaaaaa) >> 1);
return float(bits) / float(0x100000000ULL);
}
// CUDA kernel evaluating the MDL sub-expression for one texel.
extern "C" __global__ void evaluate_mat_expr(
float3 *out_buf,
Target_code_data *tc_data_list,
char const **arg_block_list,
unsigned int width,
unsigned int height,
unsigned int num_samples)
{
// Determine x and y coordinates of texel to be evaluated and check
// whether it is out of bounds (due to block padding)
unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
return;
// Calculate position and texture coordinates for a 2x2 quad around the center of the world
float step_x = 1.f / width;
float step_y = 1.f / height;
float pos_x = 2.0f * x * step_x - 1.0f; // [-1, 1)
float pos_y = 2.0f * y * step_y - 1.0f; // [-1, 1)
float tex_x = float(x) * step_x; // [0, 1)
float tex_y = float(y) * step_y; // [0, 1)
// Assign materials in a checkerboard pattern
unsigned int material_index =
((unsigned int)(tex_x * 4) ^ (unsigned int)(tex_y * 4)) % mdl_functions_count;
unsigned int tc_idx = mdl_target_code_indices[material_index];
char const *arg_block = arg_block_list[mdl_arg_block_indices[material_index]];
// Setup MDL material state (with only one texture space)
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 texture_coords[1] = {
{ { tex_x, tex_y, 0.0f }, { step_x, 0.0f, 0.0f }, { 0.0f, step_y, 0.0f } } };
#else
tct_float3 texture_coords[1] = { { tex_x, tex_y, 0.0f } };
#endif
tct_float3 texture_tangent_u[1] = { { 1.0f, 0.0f, 0.0f } };
tct_float3 texture_tangent_v[1] = { { 0.0f, 1.0f, 0.0f } };
Mdl_state mdl_state = {
/*normal=*/ { 0.0f, 0.0f, 1.0f },
/*geom_normal=*/ { 0.0f, 0.0f, 1.0f },
#ifdef ENABLE_DERIVATIVES
/*position=*/
{
{ pos_x, pos_y, 0.0f },
{ 2 * step_x, 0.0f, 0.0f },
{ 0.0f, 2 * step_y, 0.0f }
},
#else
/*position=*/ { pos_x, pos_y, 0.0f },
#endif
/*animation_time=*/ 0.0f,
/*texture_coords=*/ texture_coords,
/*tangent_u=*/ texture_tangent_u,
/*tangent_v=*/ texture_tangent_v,
/*text_results=*/ NULL,
/*ro_data_segment=*/ tc_data_list[tc_idx].ro_data_segment,
/*world_to_object=*/ identity,
/*object_to_world=*/ identity,
/*object_id=*/ 0,
/*meters_per_scene_unit=*/ 1.0f
};
Tex_handler tex_handler;
tex_handler.vtable = &TEX_VTABLE; // only required in 'vtable' mode, otherwise NULL
tex_handler.num_textures = tc_data_list[tc_idx].num_textures;
tex_handler.textures = tc_data_list[tc_idx].textures;
Resource_data res_data_pair = {
NULL, reinterpret_cast<Texture_handler_base *>(&tex_handler) };
// Super-sample the current texel with the given number of samples
float3 res = make_float3(0, 0, 0);
for (unsigned int i = 0; i < num_samples; ++i) {
// Calculate the offset for the current sample
float offs_x = float(i) / num_samples * step_x;
float offs_y = radinv2(i) * step_y;
// Update the position and the texture coordinate
#ifdef ENABLE_DERIVATIVES
mdl_state.position.val.x = pos_x + 2 * offs_x;
mdl_state.position.val.y = pos_y + 2 * offs_y;
texture_coords[0].val.x = tex_x + offs_x;
texture_coords[0].val.y = tex_y + offs_y;
#else
mdl_state.position.x = pos_x + 2 * offs_x;
mdl_state.position.y = pos_y + 2 * offs_y;
texture_coords[0].x = tex_x + offs_x;
texture_coords[0].y = tex_y + offs_y;
#endif
// Add result for current sample
float3 cur_res;
mdl_functions[material_index](&cur_res, &mdl_state, &res_data_pair, NULL, arg_block);
res.x += cur_res.x;
res.y += cur_res.y;
res.z += cur_res.z;
}
// Calculate average over all samples and apply gamma correction
res.x = powf(res.x / num_samples, 1.f / 2.2f);
res.y = powf(res.y / num_samples, 1.f / 2.2f);
res.z = powf(res.z / num_samples, 1.f / 2.2f);
// Write result to output buffer
out_buf[y * width + x] = res;
}
[Previous] [Up] [Next]