This example describes the API of the code generated by the "PTX" backend for compiled materials and shows how a renderer can call this generated code to evaluate sub-expressions of multiple materials using CUDA.
New Topics
- MDL material state (PTX)
- Execution of generated code (PTX)
- Loading textures (PTX)
- Texture access functions (PTX)
Detailed Description
- MDL material state (PTX)
The MDL material state structure mi::neuraylib::Shading_state_material is a representation of the renderer state as defined in section 19 "Renderer state" in the MDL specification. It is used to make the state of the renderer (like the position of an intersection point on the surface, the shading normal and the texture coordinates) available to the generated code.
Here's a version of the material state structure making use of the types from CUDA's vector_types.h:
float3 normal;
float3 geom_normal;
float3 position;
float animation_time;
const float3 *text_coords;
const float3 *tangent_u;
const float3 *tangent_v;
float4 *text_results;
const char *ro_data_segment;
const float4 *world_to_object;
const float4 *object_to_world;
int object_id;
};
struct Shading_state_material_impl<false> Shading_state_material
The MDL material state structure.
Definition: target_code_types.h:300
Please refer to the structure documentation for more information.
In this example, we fill the material state structure with some example values and only use one texture space. For the world-to-object and object-to-world transformation matrices we use identity matrices. We will iterate the position
and text_coords
fields over a 2x2 quad around the center of the world with position x
and y
coordinates ranging from -1 to 1 and the texture uv-coordinates ranging from 0 to 1, respectively.
- Execution of generated code (PTX)
For the non-native backends, the generated code has to be called directly from the corresponding framework, so we need to know the prototypes of the functions generated via mi::neuraylib::IMdl_backend::translate_material_expression(). With "NAME"
being the function name you provided as fname
parameter and "T"
being the result type, they look like this:
void NAME(
T *result,
Resource_data const *res_data,
void const *exception_state,
char const *captured_args);
or written as a PTX prototype:
.visible .func NAME(
.param .b64 result,
.param .b64 state,
.param .b64 res_data,
.param .b64 exception_state,
.param .b64 captured_args
);
The res_data
parameter is used to provide access to resources like textures depending on the way how those resources are accessed (see "Texture access functions" below). If it is not used, the pointers inside the structure may be NULL:
struct Resource_data {
const void *shared_data;
const Texture_handler_base *texture_handler;
};
The exception_state
parameter allows to provide handlers for out-of-bounds array access exceptions and division-by-zero exceptions. But for the PTX backend, this is not supported and the parameter should be set to NULL.
The captured_args
parameter is used to provide the data of the mi::neuraylib::ITarget_argument_block object for class-compiled materials. The data can either be manually created using the information from mi::neuraylib::ITarget_value_layout or by using mi::neuraylib::ITarget_code::get_argument_block() or mi::neuraylib::ITarget_code::create_argument_block(). For instance-compiled materials, this parameter should be set to NULL. See Instance-compilation and class-compilation for more details about instance and class compilation.
To make the generated functions available to our CUDA kernel, we have to link them with the kernel. We could just declare the generated functions with the corresponding names as extern "C"
in the CUDA source code of the kernel, but we may want to decide at runtime how many materials will be available. So we will add an indirection through an array of pointers to the generated functions which we provide as an additional PTX source code buffer to the CUDA linker.
- Note
- We currently have to add a dummy function to the PTX code containing this function pointer array, because the CUDA linker will otherwise just resolve the function addresses to zero.
-
Also, we need to compile the CUDA kernel with the option
-rdc=true
(relocatable device code), otherwise the extern
declared function array will be treated as a definition resulting in two arrays.
In this example, we bake multiple materials into a texture with a user-configurable checkerboard pattern by executing a material for every texel updating the material state accordingly. At the end, we write the texture to disk.
- Loading textures (PTX)
When the nv_openimageio
plugin has been loaded via mi::neuraylib::IPlugin_configuration::load_plugin_library() before starting the MDL SDK, the SDK will automatically load textures on the host side for many common image formats and make them available via mi::neuraylib::ITarget_code::get_texture(). Note, that the first texture is always the invalid texture, so only if there is more than just one texture according to mi::neuraylib::ITarget_code::get_texture_count(), there will be real referenced textures available.
Here's a small code snippet showing how to access the mi::neuraylib::ICanvas of the texture at index i
.
Handle class template for interfaces, automatizing the lifetime control via reference counting.
Definition: handle.h:113
This interface represents a pixel image file.
Definition: iimage.h:66
Textures add image processing options to images.
Definition: itexture.h:68
The textures still have to be copied to the GPU and possibly they have to be gamma corrected and converted to a format understood by the texture access functions you provide. In this example, we use the mi::neuraylib::IImage_api to apply the gamma correction and to convert the image format to a float32 RGBA format.
Depending on the texture shape returned by mi::neuraylib::ITarget_code::get_texture_shape() the texture image data has to be copied to a CUDA array (2D textures), a 3D array (3D textures) or a 3D array with the cudaArrayCubemap
flag set (cube textures). The textures could then be made available via CUDA texture objects.
- Note
- For cube textures, you should use the
cudaAddressModeClamp
address mode for the texture objects to avoid visual artifacts in the corners.
- Texture access functions (PTX)
For non-native backends, the generated code requires a set of methods implementing texture access functionality:
- tex_lookup_float4_2d
- tex_lookup_float3_2d
- tex_texel_float4_2d
- tex_lookup_float4_3d
- tex_lookup_float3_3d
- tex_texel_float4_3d
- tex_lookup_float4_cube
- tex_lookup_float3_cube
- tex_resolution_2d
Except for the last one, these correspond directly to the functions described in section 20.3 "Standard library functions - Texture" in the MDL specification.
The tex_lookup_*
functions receive floating-point texture coordinates and should provide a sampled value, whereas tex_texel_*
functions receive integer texture coordinates and should provide a raw texture value. There are variants for texture lookups with and without alpha channel (float4 / float3) and for the different texture shapes (2d / 3d / cube) as described in section 6.12 "Variables and data types - Textures" in the MDL specification. Note, that PTEX textures are currently not supported by the backends.
tex_resolution_2d
retrieves the width and height at the given uv-tile coordinates for a texture_2d. For non-uv-tile textures, the uv-tile coordinates are always (0, 0).
For the PTX backend, there are different ways how the methods can be provided, which can be chosen by setting the "tex_lookup_call_mode"
option via the mi::neuraylib::IMdl_backend::set_option() method to the corresponding mode. Please refer to Texture lookup call modes of the PTX backend for more details.
In this example, you can switch from the "direct_call"
mode to the slower "vtable"
mode by commenting out this line in example_cuda_shared.h:
check_success(be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
Example Source
To compile the source code, you need to install the "CUDA Toolkit 9" available at https://developer.nvidia.com/cuda-downloads.
For Windows, you should install the toolkit with enabled Visual Studio integration to be able to use the provided project files.
For Linux and Mac OS X, you have to provide the path to the CUDA Toolkit installation via a "CUDA_PATH"
environment variable or by setting the path in the Makefile
.
Source Code Location: examples/mdl_sdk/execution_cuda/example_execution_cuda.cpp
#include <iostream>
#include <vector>
#include "example_cuda_shared.h"
struct Options {
int cuda_device;
std::string outputfile;
unsigned material_pattern;
unsigned res_x, res_y;
bool use_class_compilation;
bool no_aa;
bool enable_derivatives;
bool fold_ternary_on_df;
std::vector<std::string> material_names;
Options()
: cuda_device(0)
, outputfile()
, material_pattern(0)
, res_x(700)
, res_y(520)
, use_class_compilation(false)
, no_aa(false)
, enable_derivatives(false)
, fold_ternary_on_df(false)
{
}
};
std::vector<size_t> const &arg_block_indices,
Options &options,
{
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_execution_cuda_derivatives.ptx" : "example_execution_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(mi::examples::io::get_executable_folder() + "/" + ptx_name).c_str(),
"evaluate_mat_expr",
&cuda_function);
Material_gpu_context material_gpu_context(options.enable_derivatives);
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
if (!material_gpu_context.prepare_target_code_data(
transaction, image_api, target_codes[i].get(), arg_block_indices))
return nullptr;
}
CUdeviceptr device_tc_data_list = material_gpu_context.get_device_target_code_data_list();
CUdeviceptr device_arg_block_list =
material_gpu_context.get_device_target_argument_block_list();
CUdeviceptr device_outbuf;
check_cuda_success(cuMemAlloc(&device_outbuf, options.res_x * options.res_y * sizeof(float3)));
dim3 threads_per_block(16, 16);
dim3 num_blocks((options.res_x + 15) / 16, (options.res_y + 15) / 16);
void *kernel_params[] = {
&device_outbuf,
&device_tc_data_list,
&device_arg_block_list,
&options.res_x,
&options.res_y,
&num_samples
};
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, kernel_params, nullptr));
image_api->
create_canvas(
"Rgb_fp", options.res_x, options.res_y));
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(
data, device_outbuf, options.res_x * options.res_y * sizeof(float3)));
check_cuda_success(cuMemFree(device_outbuf));
check_cuda_success(cuModuleUnload(cuda_module));
canvas->retain();
return canvas.get();
}
void usage(char const *prog_name)
{
std::cout
<< "Usage: " << prog_name << " [options] [(<material_pattern | (<material_name1> ...)]\n"
<< "Options:\n"
<< " --device <id> run on CUDA device <id> (default: 0)\n"
<< " --res <x> <y> resolution (default: 700x520)\n"
<< " --cc use class compilation\n"
<< " --noaa disable pixel oversampling\n"
<< " -d enable use of derivatives\n"
<< " -o <outputfile> image file to write result to\n"
<< " (default: example_cuda_<material_pattern>.png)\n"
<< " --mdl_path <path> mdl search path, can occur multiple times.\n"
<< " --fold_ternary_on_df fold all ternary operators on *df types\n"
<< " <material_pattern> a number from 1 to 2 ^ num_materials - 1 choosing which\n"
<< " material combination to use (default: 2 ^ num_materials - 1)\n"
<< " <material_name*> qualified name of materials to use. The example will try to\n"
<< " access the path \"surface.scattering.tint\"."
<< std::endl;
exit_failure();
}
int MAIN_UTF8(int argc, char* argv[])
{
Options options;
mi::examples::mdl::Configure_options configure_options;
for (int i = 1; i < argc; ++i) {
char const *opt = argv[i];
if (opt[0] == '-') {
if (strcmp(opt, "-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
} else if (strcmp(opt, "--device") == 0 && i < argc - 2) {
options.cuda_device = atoi(argv[++i]);
} else if (strcmp(opt, "--res") == 0 && i < argc - 2) {
options.res_x = std::max(atoi(argv[++i]), 1);
options.res_y = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--cc") == 0) {
options.use_class_compilation = true;
} else if (strcmp(opt, "--noaa") == 0) {
options.no_aa = true;
} else if (strcmp(opt, "-d") == 0) {
options.enable_derivatives = true;
} else if (strcmp(opt, "--mdl_path") == 0 && i < argc - 1) {
configure_options.additional_mdl_paths.push_back(argv[++i]);
} else if (strcmp(opt, "--fold_ternary_on_df") == 0) {
options.fold_ternary_on_df = true;
} else {
std::cout << "Unknown option: \"" << opt << "\"" << std::endl;
usage(argv[0]);
}
} else if (opt[0] >= '0' && opt[0] <= '9') {
options.material_pattern = unsigned(atoi(opt));
} else
options.material_names.push_back(std::string(opt));
}
if (options.material_names.empty()) {
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution1");
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution2");
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution3");
}
if (options.material_pattern == 0)
options.material_pattern = (1 << options.material_names.size()) - 1;
else if (options.material_pattern < 1 ||
options.material_pattern > unsigned(1 << options.material_names.size()) - 1) {
std::cerr << "Invalid material_pattern parameter." << std::endl;
usage(argv[0]);
}
if (options.outputfile.empty())
options.outputfile = "example_cuda_" + to_string(options.material_pattern) + ".png";
if (!neuray.is_valid_interface())
exit_failure("Failed to load the SDK.");
if (!mi::examples::mdl::configure(neuray.get(), configure_options))
exit_failure("Failed to initialize the SDK.");
if (ret != 0)
exit_failure("Failed to initialize the SDK. Result code: %d", ret);
{
mdl_factory->create_execution_context());
{
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > target_codes;
Material_compiler mc(
mdl_impexp_api.get(),
mdl_backend_api.get(),
mdl_factory.get(),
transaction.get(),
0,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
false,
#endif
options.enable_derivatives,
options.fold_ternary_on_df,
false,
true,
false,
false,
"none",
"sret");
for (std::size_t i = 0, n = options.material_names.size(); i < n; ++i) {
if ((options.material_pattern & (1 << i)) != 0) {
std::string module_name, material_simple_name;
if (!mi::examples::mdl::parse_cmd_argument_material_name(
options.material_names[i], module_name, material_simple_name, true))
continue;
mdl_impexp_api->load_module(transaction.get(), module_name.c_str(), context.get());
if (!print_messages(context.get()))
exit_failure("Loading module '%s' failed.", module_name.c_str());
mdl_factory->get_db_module_name(module_name.c_str()));
if (!module)
exit_failure("Failed to access the loaded module.");
std::string material_db_name
= std::string(module_db_name->get_c_str()) + "::" + material_simple_name;
material_db_name = mi::examples::mdl::add_missing_material_signature(
module.get(), material_db_name);
if (material_db_name.empty())
exit_failure("Failed to find the material %s in the module %s.",
material_simple_name.c_str(), module_name.c_str());
mc.add_material_subexpr(
module_name, material_db_name,
"surface.scattering.tint", ("tint_" + to_string(i)).c_str(),
options.use_class_compilation);
}
}
target_codes.push_back(mc.generate_cuda_ptx());
CUcontext cuda_context = init_cuda(options.cuda_device);
bake_expression_cuda_ptx(
transaction.get(),
image_api.get(),
target_codes,
mc.get_argument_block_indices(),
options,
options.no_aa ? 1 : 8));
uninit_cuda(cuda_context);
if (canvas)
mdl_impexp_api->export_canvas(options.outputfile.c_str(), canvas.get());
}
}
if (neuray->shutdown() != 0)
exit_failure("Failed to shutdown the SDK.");
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
COMMANDLINE_TO_UTF8
Abstract interface for a canvas represented by a rectangular array of tiles.
Definition: icanvas.h:89
This interface is used to interact with the distributed database.
Definition: idatabase.h:289
This interface provides various utilities related to canvases and buffers.
Definition: iimage_api.h:72
virtual ICanvas * create_canvas(const char *pixel_type, Uint32 width, Uint32 height, Uint32 layers=1, bool is_cubemap=false, Float32 gamma=0.0f) const =0
Creates a canvas with given pixel type, resolution, and layers.
This interface can be used to obtain the MDL backends.
Definition: imdl_backend_api.h:56
Factory for various MDL interfaces and functions.
Definition: imdl_factory.h:53
API component for MDL related import and export operations.
Definition: imdl_impexp_api.h:43
This interface represents an MDL module.
Definition: imodule.h:634
A transaction provides a consistent view on the database.
Definition: itransaction.h:82
virtual const base::IInterface * access(const char *name)=0
Retrieves an element from the database.
virtual Sint32 commit()=0
Commits the transaction.
unsigned int Uint32
32-bit unsigned integer.
Definition: types.h:49
signed int Sint32
32-bit signed integer.
Definition: types.h:46
Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
struct Texture
{
explicit Texture()
: filtered_object(0)
, unfiltered_object(0)
, size(make_uint3(0, 0, 0))
, inv_size(make_float3(0.0f, 0.0f, 0.0f))
{}
explicit Texture(
cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, inv_angular_resolution(make_float2(0.0f, 0.0f))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution;
float2 inv_angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Texture_handler : Texture_handler_base {
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
#if defined(__CUDACC__)
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
\
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const [2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
float const crop_u[2],
float const crop_v[2],
float const crop_w[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
float const crop_u[2],
float const crop_v[2],
float const crop_w[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const [2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
extern "C" __device__ void tex_resolution_3d(
int result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float )
{
Texture_handler const* self = static_cast<Texture_handler const*>(self_base);
if (texture_idx == 0 || texture_idx - 1 >= self->num_textures) {
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Texture const& tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
result[2] = tex.size.z;
}
extern "C" __device__ bool tex_texture_isvalid(
Texture_handler_base const *self_base,
unsigned texture_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return texture_idx != 0 && texture_idx - 1 < self->num_textures;
}
extern "C" __device__ void tex_frame(
int result[2],
Texture_handler_base const *,
unsigned )
{
result[0] = 0;
result[1] = 0;
}
extern "C" __device__ float df_light_profile_power(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.total_power;
}
extern "C" __device__ float df_light_profile_maximum(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.candela_multiplier;
}
extern "C" __device__ bool df_light_profile_isvalid(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return light_profile_idx != 0 && light_profile_idx - 1 < self->num_lightprofiles;
}
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x * lp.inv_angular_resolution.x;
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
float v = phi * lp.theta_phi_inv_delta.y * lp.inv_angular_resolution.y;
u += 0.5f * lp.inv_angular_resolution.x;
v += 0.5f * lp.inv_angular_resolution.y;
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
extern "C" __device__ void df_light_profile_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const xi[3])
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
uint2 res = lp.angular_resolution;
if (res.x <= 2 || res.y <= 2)
return;
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data;
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0);
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1)
+ (idx_theta * (res.y - 1));
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1);
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
if (idx_theta < 0 || idx_theta > res.x - 2 || idx_phi < 0 || idx_phi > res.y - 2)
return 0.0f;
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1)
+ (idx_theta * (res.y - 1));
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ bool df_bsdf_measurement_isvalid(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return bsdf_measurement_idx != 0 && bsdf_measurement_idx - 1 < self->num_mbsdfs;
}
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
result[0] = bm.angular_resolution[part_idx].x;
result[1] = bm.angular_resolution[part_idx].y;
result[2] = bm.num_channels[part_idx];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= float(M_ONE_OVER_PI);
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
{
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
{
store_result3(result, 0.0f);
return;
}
if (bm.num_channels[part_idx] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_out[2],
float const xi[3],
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return;
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return;
uint2 res = bm.angular_resolution[part_idx];
const float* sample_data = bm.sample_data[part_idx];
if (res.x < 1 || res.y < 1)
return;
unsigned idx_theta_out = unsigned(theta_phi_out[0] * float(M_ONE_OVER_PI * 2.0f) * float(res.x));
idx_theta_out = min(idx_theta_out, res.x - 1);
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_out * res.x;
unsigned idx_theta_in = sample_cdf(cdf_theta, res.x, xi0);
float prob_theta = cdf_theta[idx_theta_in];
if (idx_theta_in > 0)
{
const float tmp = cdf_theta[idx_theta_in - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_out * res.x + idx_theta_in) * res.y;
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi = sample_cdf(cdf_phi, res.y, xi1);
float prob_phi = cdf_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 inv_res = bm.inv_angular_resolution[part_idx];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_in) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1];
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return 0.0f;
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return 0.0f;
const float* sample_data = bm.sample_data[part_idx];
uint2 res = bm.angular_resolution[part_idx];
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
unsigned idx_theta_in = unsigned(uvw.z * float(res.x));
unsigned idx_theta_out = unsigned(uvw.y * float(res.x));
unsigned idx_phi = unsigned(uvw.x * float(res.y));
idx_theta_in = min(idx_theta_in, res.x - 1);
idx_theta_out = min(idx_theta_out, res.x - 1);
idx_phi = min(idx_phi, res.y - 1);
const float* cdf_theta = sample_data + idx_theta_out * res.x;
float prob_theta = cdf_theta[idx_theta_in];
if (idx_theta_in > 0)
{
const float tmp = cdf_theta[idx_theta_in - 1];
prob_theta -= tmp;
}
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_out * res.x + idx_theta_in) * res.y;
float prob_phi = cdf_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_phi[idx_phi - 1];
prob_phi -= tmp;
}
float2 inv_res = bm.inv_angular_resolution[part_idx];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_in) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2],
Texture_handler const *self,
unsigned bsdf_measurement_idx,
float const theta_phi[2],
{
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return;
const uint2 res = bm.angular_resolution[part_idx];
if (res.x < 1)
return;
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta = min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_idx][idx_theta];
result[1] = bm.max_albedo[part_idx];
}
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return;
df_bsdf_measurement_albedo(
&result[0],
self,
bsdf_measurement_idx,
theta_phi,
df_bsdf_measurement_albedo(
&result[2],
self,
bsdf_measurement_idx,
theta_phi,
}
#ifndef TEX_SUPPORT_NO_DUMMY_ADAPTNORMAL
extern "C" __device__ void adapt_normal(
float result[3],
Texture_handler_base const *self_base,
float const normal[3])
{
result[0] = normal[0];
result[1] = normal[1];
result[2] = normal[2];
}
#endif
#ifndef TEX_SUPPORT_NO_DUMMY_SCENEDATA
extern "C" __device__ bool scene_data_isvalid(
Texture_handler_base const *self_base,
unsigned scene_data_id)
{
return false;
}
extern "C" __device__ void scene_data_lookup_float4(
float result[4],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[4],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
extern "C" __device__ void scene_data_lookup_float3(
float result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
extern "C" __device__ void scene_data_lookup_color(
float result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
extern "C" __device__ void scene_data_lookup_float2(
float result[2],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[2],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
}
extern "C" __device__ float scene_data_lookup_float(
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value,
bool uniform_lookup)
{
return default_value;
}
extern "C" __device__ void scene_data_lookup_int4(
int result[4],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[4],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
extern "C" __device__ void scene_data_lookup_int3(
int result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[3],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
extern "C" __device__ void scene_data_lookup_int2(
int result[2],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[2],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
}
extern "C" __device__ int scene_data_lookup_int(
Texture_handler_base const *self_base,
unsigned scene_data_id,
int default_value,
bool uniform_lookup)
{
return default_value;
}
extern "C" __device__ void scene_data_lookup_float4x4(
float result[16],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[16],
bool uniform_lookup)
{
for (int i = 0; i < 16; ++i)
result[i] = default_value[i];
}
extern "C" __device__ void scene_data_lookup_deriv_float4(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_float3(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_color(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_float2(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_float(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
#endif
#ifndef TEX_SUPPORT_NO_VTABLES
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
tex_frame,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
adapt_normal,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_float4x4,
};
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
tex_frame,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
adapt_normal,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_float4x4,
scene_data_lookup_deriv_float,
scene_data_lookup_deriv_float2,
scene_data_lookup_deriv_float3,
scene_data_lookup_deriv_float4,
scene_data_lookup_deriv_color,
};
#endif
#endif
#endif
tct_traits<true>::tct_derivable_float2 tct_deriv_float2
A float2 with derivatives.
Definition: target_code_types.h:129
Mbsdf_part
MBSDFs can consist of two parts, which can be selected using this enumeration.
Definition: target_code_types.h:328
tct_deriv< float[4]> tct_deriv_arr_float_4
A float[4] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:144
tct_traits<true>::tct_derivable_float tct_deriv_float
A float with derivatives.
Definition: target_code_types.h:126
Tex_wrap_mode
The texture wrap modes as defined by tex::wrap_mode in the MDL specification.
Definition: target_code_types.h:309
struct Shading_state_material_impl<true> Shading_state_material_with_derivs
The MDL material state structure with derivatives for the texture coordinates.
Definition: target_code_types.h:303
tct_deriv< float[2]> tct_deriv_arr_float_2
A float[2] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:138
tct_deriv< float[3]> tct_deriv_arr_float_3
A float[3] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:141
@ MBSDF_DATA_TRANSMISSION
the bidirectional transmission distribution function (BTDF)
Definition: target_code_types.h:333
@ MBSDF_DATA_REFLECTION
the bidirectional reflection distribution function (BRDF)
Definition: target_code_types.h:330
The MDL material state structure inside the MDL SDK is a representation of the renderer state as defi...
Definition: target_code_types.h:210
The texture handler structure that is passed to the texturing functions.
Definition: target_code_types.h:712
The texture handler structure that is passed to the texturing functions with derivative support.
Definition: target_code_types.h:721
The runtime for bitmap texture access for the generated target code can optionally be implemented in ...
Definition: target_code_types.h:344
A template struct with derivatives.
Definition: target_code_types.h:97
Types required for execution of generated native and CUDA code.
Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include "example_shared.h"
#include "compiled_material_traverser_base.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
#include "utils/profiling.h"
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, inv_angular_resolution(make_float2(
1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y)))
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution;
float2 inv_angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures;
CUdeviceptr textures;
size_t num_mbsdfs;
CUdeviceptr mbsdfs;
size_t num_lightprofiles;
CUdeviceptr lightprofiles;
CUdeviceptr ro_data_segment;
};
template <typename T>
std::string to_string(T val)
{
std::ostringstream stream;
stream << val;
return stream.str();
}
class Handle_collector : public Compiled_material_traverser_base
{
public:
explicit Handle_collector(
: Compiled_material_traverser_base()
{
traverse(material, transaction);
}
const std::vector<std::string>& get_handles() const { return m_handles; }
private:
const Compiled_material_traverser_base::Traversal_element& element,
void* context) override
{
if (!element.expression ||
return;
>());
expr_dcall->get_arguments());
expr_dcall->get_definition()));
get_semantic();
if (semantic < mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_FIRST
|| semantic > mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_LAST)
return;
expr_dcall->get_arguments());
mi::Size arg_count = arguments->get_size();
const char* name = arguments->get_name(arg_count - 1);
if (strcmp(name, "handle") != 0)
return;
arguments->get_expression(arg_count - 1));
return;
return;
std::string handle_value = handle->get_value() ? std::string(handle->get_value()) : "";
if (std::find(m_handles.begin(), m_handles.end(), handle_value) == m_handles.end())
m_handles.push_back(handle_value);
}
std::vector<std::string> m_handles;
};
#ifdef ENABLE_DEPRECATED_UTILIY_FUNCTIONS
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
#else
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
cudaDeviceReset(); \
exit_failure( "Error in file %s, line %u: \"%s\".\n", __FILE__, __LINE__, #expr); \
} \
} while (false)
#endif
CUcontext init_cuda(
int ordinal
#ifdef OPENGL_INTEROP
, const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
{
check_cuda_success(cuDeviceGet(&cu_device, ordinal));
}
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
T &r = *it;
deleter(r);
}
}
C
const &
operator*()
const {
return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
template<typename T>
CUdeviceptr gpu_mem_dup(std::vector<T> const &data)
{
return gpu_mem_dup(&data[0], data.size() * sizeof(T));
}
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
m_target_argument_block_list->push_back(0);
}
bool prepare_target_code_data(
std::vector<size_t> const &arg_block_indices);
CUdeviceptr get_device_target_code_data_list();
CUdeviceptr get_device_target_argument_block_list();
CUdeviceptr get_device_target_argument_block(size_t i)
{
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
void update_device_argument_block(size_t i);
private:
bool prepare_texture(
std::vector<Texture> &textures);
bool prepare_mbsdf(
std::vector<Mbsdf> &mbsdfs);
bool prepare_lightprofile(
std::vector<Lightprofile> &lightprofiles);
bool m_enable_derivatives;
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
Resource_container<Target_code_data> m_target_code_data_list;
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
Resource_container<CUdeviceptr> m_target_argument_block_list;
std::vector<mi::base::Handle<mi::neuraylib::ITarget_argument_block> > m_own_arg_blocks;
std::vector<size_t> m_bsdf_arg_block_indices;
std::vector<mi::base::Handle<mi::neuraylib::ITarget_value_layout const> > m_arg_block_layouts;
Resource_container<Texture> m_all_textures;
Resource_container<Mbsdf> m_all_mbsdfs;
Resource_container<Lightprofile> m_all_lightprofiles;
Resource_container<cudaArray_t> m_all_texture_arrays;
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
{
check_cuda_success(cudaMemcpy2DToArray(
device_array, 0, 0, data,
cudaMemcpyHostToDevice));
}
bool Material_gpu_context::prepare_texture(
std::vector<Texture> &textures)
{
char const *image_type = image->get_type(0, 0);
if (image->is_uvtile() || image->is_animated()) {
std::cerr << "The example does not support uvtile and/or animated textures!" << std::endl;
return false;
}
if (texture->get_effective_gamma(0, 0) != 1.0f) {
image_api->
convert(canvas.get(),
"Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma(0, 0));
canvas = gamma_canvas;
} else if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0) {
canvas = image_api->
convert(canvas.get(),
"Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
tex_layers != 6) {
std::cerr << "Invalid number of layers (" << tex_layers
<< "), cubemaps must have 6 layers!" << std::endl;
return false;
}
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
cudaArrayCubemap : 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0, sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
for (
mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(©_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
for (
mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.
get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
cudaTextureAddressMode addr_mode =
? cudaAddressModeClamp
: cudaAddressModeWrap;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f;
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
cudaTextureObject_t tex_obj_unfilt = 0;
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
textures.push_back(Texture(
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.back());
return true;
}
namespace
{
{
switch (part)
{
break;
break;
}
if (!dataset)
return true;
uint2 res;
res.x = dataset->get_resolution_theta();
res.y = dataset->get_resolution_phi();
mbsdf_cuda_representation.Add(part, res, num_channels);
const unsigned int cdf_theta_size = res.x * res.x;
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x];
float* sample_data_theta = sample_data;
float* sample_data_phi = sample_data + cdf_theta_size;
const float s_theta = (float) (M_PI * 0.5) / float(res.x);
const float s_phi = (float) (M_PI) / float(res.y);
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value = fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+ fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>()
: cudaCreateChannelDesc<float>());
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0, sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data),
res.y * lookup_channels * sizeof(float),
res.y,
res.x);
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(©_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
std::vector<Mbsdf> &mbsdfs)
{
Mbsdf mbsdf_cuda;
return false;
return false;
mbsdfs.push_back(mbsdf_cuda);
m_all_mbsdfs->push_back(mbsdfs.back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
std::vector<Lightprofile> &lightprofiles)
{
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
const float* data = lprof_nr->get_data();
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpy2DToArray(
device_lightprofile_data, 0, 0, data,
res.x * sizeof(float), res.x * sizeof(float), res.y, cudaMemcpyHostToDevice));
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
lightprofiles.push_back(lprof);
m_all_lightprofiles->push_back(lightprofiles.back());
return true;
}
bool Material_gpu_context::prepare_target_code_data(
std::vector<size_t> const &arg_block_indices)
{
check_success(m_device_target_code_data_list.get() == 0);
CUdeviceptr device_ro_data = 0;
device_ro_data = gpu_mem_dup(
}
CUdeviceptr device_textures = 0;
if (num_textures > 1) {
std::vector<Texture> textures;
for (
mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
device_textures = gpu_mem_dup(textures);
}
CUdeviceptr device_mbsdfs = 0;
if (num_mbsdfs > 1) {
std::vector<Mbsdf> mbsdfs;
for (
mi::Size i = 1; i < num_mbsdfs; ++i) {
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
CUdeviceptr device_lightprofiles = 0;
if (num_lightprofiles > 1) {
std::vector<Lightprofile> lightprofiles;
for (
mi::Size i = 1; i < num_lightprofiles; ++i) {
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_arg_block_layouts.push_back(
}
for (size_t arg_block_index : arg_block_indices) {
m_bsdf_arg_block_indices.push_back(arg_block_index);
}
return true;
}
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
class Material_compiler {
public:
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
bool enable_pdf,
bool use_adapt_normal,
bool enable_bsdf_flags,
const std::string& df_handle_mode,
const std::string& lambda_return_mode);
std::string load_module(const std::string& mdl_module_name);
bool add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* fname,
bool class_compilation=false);
bool add_material_df(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* base_fname,
bool class_compilation=false);
bool add_material(
const std::string& qualified_module_name,
const std::string& material_db_name,
bool class_compilation);
typedef std::vector<mi::base::Handle<mi::neuraylib::IFunction_definition const> >
Material_definition_list;
Material_definition_list const &get_material_defs()
{
return m_material_defs;
}
typedef std::vector<mi::base::Handle<mi::neuraylib::ICompiled_material const> >
Compiled_material_list;
Compiled_material_list const &get_compiled_materials()
{
return m_compiled_materials;
}
std::vector<size_t> const &get_argument_block_indices() const {
return m_arg_block_indexes;
}
const std::vector<std::string>& get_handles() const {
return m_handles;
}
private:
const std::string& qualified_module_name,
const std::string& material_db_name);
bool class_compilation);
private:
Material_definition_list m_material_defs;
Compiled_material_list m_compiled_materials;
std::vector<size_t> m_arg_block_indexes;
std::vector<std::string> m_handles;
};
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
bool enable_pdf,
bool use_adapt_normal,
bool enable_bsdf_flags,
const std::string& df_handle_mode,
const std::string& lambda_return_mode)
, m_be_cuda_ptx(mdl_backend_api->get_backend(
mi::neuraylib::IMdl_backend_api:
:MB_CUDA_PTX))
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
if (enable_derivatives) {
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
if (enable_auxiliary) {
check_success(m_be_cuda_ptx->set_option("enable_auxiliary", "on") == 0);
}
if (!enable_pdf) {
check_success(m_be_cuda_ptx->set_option("enable_pdf", "off") == 0);
}
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
check_success(m_be_cuda_ptx->set_option("df_handle_slot_mode", df_handle_mode.c_str()) == 0);
check_success(m_be_cuda_ptx->set_option("lambda_return_mode", lambda_return_mode.c_str()) == 0);
if (use_adapt_normal) {
check_success(m_be_cuda_ptx->set_option("use_renderer_adapt_normal", "on") == 0);
}
if (enable_bsdf_flags) {
check_success(m_be_cuda_ptx->set_option("libbsdf_flags_in_bsdf_data", "on") == 0);
}
m_context->set_option("experimental", true);
m_context->set_option("fold_ternary_on_df", fold_ternary_on_df);
}
std::string Material_compiler::load_module(const std::string& mdl_module_name)
{
m_mdl_impexp_api->load_module(m_transaction.get(), mdl_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get()))
exit_failure("Failed to load module: %s", mdl_module_name.c_str());
m_mdl_factory->get_db_module_name(mdl_module_name.c_str()));
return db_module_name->get_c_str();
}
const std::string& qualified_module_name,
const std::string& material_db_name)
{
material_db_name.c_str()));
if (!material_definition) {
print_message(
(
"Material '" +
material_db_name +
"' does not exist in '" +
qualified_module_name + "'").c_str());
return nullptr;
}
m_material_defs.push_back(material_definition);
material_definition->create_function_call(0, &result));
check_success(result == 0);
material_instance->retain();
return material_instance.get();
}
bool class_compilation)
{
m_mdl_factory->create_type_factory(m_transaction.get()));
m_context->set_option("target_type", standard_material_type.get());
material_instance2->create_compiled_material(flags, m_context.get()));
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
compiled_material->retain();
return compiled_material.get();
}
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
FILE *file = fopen("target_code.ptx", "wt");
if (file)
{
fwrite(code_cuda_ptx->get_code(), code_cuda_ptx->get_code_size(), 1, file);
fclose(file);
}
#endif
return code_cuda_ptx;
}
bool Material_compiler::add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* fname,
bool class_compilation)
{
add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
}
bool Material_compiler::add_material_df(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* base_fname,
bool class_compilation)
{
add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
}
bool Material_compiler::add_material(
const std::string& qualified_module_name,
const std::string& material_db_name,
bool class_compilation)
{
if (description_count == 0)
return false;
create_material_instance(qualified_module_name, material_db_name));
if (!material_instance)
return false;
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
m_arg_block_indexes.push_back(function_descriptions[0].argument_block_index);
return print_messages(m_context.get());
}
void print_array_u32(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
str += "1] = { 0 };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
void print_array_func(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
str += "1] = { dummy_func };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
std::string generate_func_array_ptx(
{
std::string src =
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
src += ".func dummy_func() { ret; }\n";
std::string tc_offsets;
std::string function_names;
std::string tc_indices;
std::string ab_indices;
unsigned f_count = 0;
for (size_t tc_index = 0, num = target_codes.size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
if(!tc_offsets.empty())
tc_offsets += ", ";
tc_offsets += to_string(f_count);
for (size_t func_index = 0, func_count = target_code->get_callable_function_count();
func_index < func_count; ++func_index)
{
if (!tc_indices.empty())
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
tc_indices += to_string(tc_index);
function_names += target_code->get_callable_function(func_index);
mi::Size ab_index = target_code->get_callable_function_argument_block_index(func_index);
ab_indices += to_string(ab_index ==
mi::Size(~0) ? 0 : (ab_index + 1));
f_count++;
src += target_code->get_callable_function_prototype(
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
src += std::string(".visible .const .align 4 .u32 mdl_target_code_count = ")
+ to_string(target_codes.size()) + ";\n";
print_array_u32(
src, std::string("mdl_target_code_offsets"), unsigned(target_codes.size()), tc_offsets);
src += std::string(".visible .const .align 4 .u32 mdl_functions_count = ")
+ to_string(f_count) + ";\n";
print_array_func(src, std::string("mdl_functions"), f_count, function_names);
print_array_u32(src, std::string("mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src, std::string("mdl_target_code_indices"), f_count, tc_indices);
return src;
}
CUmodule build_linked_kernel(
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
FILE *file = fopen("func_array.ptx", "wt");
if (file)
{
fwrite(ptx_func_array_src.c_str(), ptx_func_array_src.size(), 1, file);
fclose(file);
}
#endif
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
mi::examples::profiling::Timing timing("PTX to SASS");
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.c_str()),
ptx_func_array_src.size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
std::cerr << "PTX linker error:\n" << error_log << std::endl;
check_cuda_success(link_result);
}
timing.stop();
std::cout << "CUDA link completed." << std::endl;
if (info_log[0])
std::cout << "Linker output:\n" << info_log << std::endl;
#ifdef DUMP_PTX
file = fopen("target_code.cubin", "wb");
if (file)
{
fwrite(linked_cubin, linked_cubin_size, 1, file);
fclose(file);
}
#endif
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(®s, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
std::cout << "Kernel uses " << regs << " registers and " << lmem << " lmem and has a size of "
<< linked_cubin_size << " bytes." << std::endl;
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif
This interface represents mutable pointers.
Definition: ipointer.h:43
Example implementation of the abstract interface mi::neuraylib::IBsdf_isotropic_data.
Definition: bsdf_isotropic_data.h:60
A scene element that stores measured BSDF data.
Definition: ibsdf_measurement.h:39
virtual const base::IInterface * get_reflection() const =0
Returns the BSDF data for the reflection.
virtual const base::IInterface * get_transmission() const =0
Returns the BSDF data for transmission.
virtual Uint32 get_layers_size() const =0
Returns the number of layers this canvas has.
virtual Uint32 get_resolution_y() const =0
Returns the resolution of the canvas in y direction.
virtual Uint32 get_resolution_x() const =0
Returns the resolution of the canvas in x direction.
virtual const ITile * get_tile(Uint32 layer=0) const =0
Returns the tile for the given layer.
This interface represents a compiled material.
Definition: icompiled_material.h:97
A constant expression.
Definition: iexpression.h:96
A direct call expression.
Definition: iexpression.h:241
@ EK_DIRECT_CALL
A direct call expression. See mi::neuraylib::IExpression_direct_call.
Definition: iexpression.h:61
@ EK_CONSTANT
A constant expression. See mi::neuraylib::IExpression_constant.
Definition: iexpression.h:55
This interface represents a function call.
Definition: ifunction_call.h:52
This interface represents a function definition.
Definition: ifunction_definition.h:44
Semantics
All known semantics of functions definitions.
Definition: ifunction_definition.h:54
virtual ITile * convert(const ITile *tile, const char *pixel_type) const =0
Converts a tile to a different pixel type.
virtual void adjust_gamma(ITile *tile, Float32 old_gamma, Float32 new_gamma) const =0
Sets the gamma value of a tile and adjusts the pixel data accordingly.
virtual IArray * create_mipmap(const ICanvas *canvas, Float32 gamma_override=0.0f) const =0
Creates a mipmap from the given canvas.
This interface represents light profiles.
Definition: ilightprofile.h:73
This interface represents a material instance.
Definition: imaterial_instance.h:34
@ CLASS_COMPILATION
Selects class compilation instead of instance compilation.
Definition: imaterial_instance.h:41
@ DEFAULT_OPTIONS
Default compilation options (e.g., instance compilation).
Definition: imaterial_instance.h:40
@ MSG_COMPILER_DAG
MDL Core DAG generator message.
Definition: imdl_execution_context.h:40
Represents target code of an MDL backend.
Definition: imdl_backend.h:783
@ SID_MATERIAL
The "::material" struct type.
Definition: itype.h:484
A value of type string.
Definition: ivalue.h:221
@ VK_STRING
A string value. See mi::neuraylib::IValue_string.
Definition: ivalue.h:48
virtual const IInterface * get_interface(const Uuid &interface_id) const =0
Acquires a const interface from another.
static const Dup_interface DUP_INTERFACE
Symbolic constant to trigger a special constructor in the Handle class.
Definition: handle.h:37
Handle<Interface> make_handle(Interface *iptr)
Returns a handle that holds the interface pointer passed in as argument.
Definition: handle.h:428
Interface * get() const
Access to the interface. Returns 0 for an invalid interface.
Definition: handle.h:294
@ MESSAGE_SEVERITY_ERROR
An error has occurred.
Definition: enums.h:35
Uint64 Size
Unsigned integral type that is large enough to hold the size of all types.
Definition: types.h:112
float Float32
32-bit float.
Definition: types.h:51
Bbox<T, DIM> operator*(const Bbox<T, DIM> &bbox, T factor)
Returns a bounding box that is a version of bbox scaled by factor, i.e., bbox.max and bbox....
Definition: bbox.h:502
virtual const char * get_texture(Size index) const =0
Returns the name of a texture resource used by the target code.
const char * path
The path from the material root to the expression that should be translated, e.g.,...
Definition: imdl_backend.h:1779
virtual Size get_bsdf_measurement_count() const =0
Returns the number of bsdf measurement resources used by the target code.
virtual Texture_shape get_texture_shape(Size index) const =0
Returns the texture shape of a given texture resource used by the target code.
virtual Size get_ro_data_segment_count() const =0
Returns the number of constant data initializers.
virtual Size get_ro_data_segment_size(Size index) const =0
Returns the size of the constant data segment at the given index.
virtual const char * get_ro_data_segment_data(Size index) const =0
Returns the data of the constant data segment at the given index.
virtual Size get_texture_count() const =0
Returns the number of texture resources used by the target code.
virtual Size get_argument_block_count() const =0
Returns the number of target argument blocks.
Texture_shape
Definition: imdl_backend.h:811
virtual const ITarget_argument_block * get_argument_block(Size index) const =0
Get a target argument block if available.
const char * base_fname
The base name of the generated functions.
Definition: imdl_backend.h:1785
virtual const ITarget_value_layout * get_argument_block_layout(Size index) const =0
Get a captured arguments block layout if available.
Sint32 return_code
A return code.
Definition: imdl_backend.h:1838
virtual const char * get_light_profile(Size index) const =0
Returns the name of a light profile resource used by the target code.
virtual const char * get_bsdf_measurement(Size index) const =0
Returns the name of a bsdf measurement resource used by the target code.
virtual Size get_light_profile_count() const =0
Returns the number of light profile resources used by the target code.
@ Texture_shape_bsdf_data
Three-dimensional texture representing a BSDF data table.
Definition: imdl_backend.h:817
@ Texture_shape_cube
Cube map texture.
Definition: imdl_backend.h:815
@ Texture_shape_3d
Three-dimensional texture.
Definition: imdl_backend.h:814
@ BSDF_SCALAR
One scalar per grid value.
Definition: ibsdf_isotropic_data.h:24
Common namespace for APIs of NVIDIA Advanced Rendering Center GmbH.
Definition: example_derivatives.dox:5
Description of target function.
Definition: imdl_backend.h:1764
Source Code Location: examples/mdl_sdk/execution_cuda/example_execution_cuda.cu
/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/execution_cuda/example_execution_cuda.cu
//
// This file contains the CUDA kernel used to evaluate the material sub-expressions.
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <math.h>
#include "texture_support_cuda.h"
// To reuse this sample code for the MDL SDK and MDL Core the corresponding namespaces are used.
// when this CUDA code is used in the context of an SDK sample.
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
// when this CUDA code is used in the context of an Core sample.
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
#ifdef ENABLE_DERIVATIVES
typedef Material_expr_function_with_derivs Mat_expr_func;
typedef Shading_state_material_with_derivs Mdl_state;
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Material_expr_function Mat_expr_func;
typedef Shading_state_material Mdl_state;
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
// Custom structure representing the resources used by the generated code of a target code object.
struct Target_code_data
{
size_t num_textures; // number of elements in the textures field
Texture *textures; // a list of Texture objects, if used
char const *ro_data_segment; // the read-only data segment, if used
};
// The number of generated MDL sub-expression functions available.
extern __constant__ unsigned int mdl_functions_count;
// The target argument block indices for the generated MDL sub-expression functions.
// Note: the original indices are incremented by one to allow us to use 0 as "not-used".
extern __constant__ unsigned int mdl_arg_block_indices[];
// The function pointers of the generated MDL sub-expression functions.
// In this example it is assumed that only expressions are added to the link unit.
// For a more complex use case, see also example df_cuda.
extern __constant__ Mat_expr_func *mdl_functions[];
// The target code indices for the generated MDL sub-expression functions.
// In contrast to the df_cuda sample, this example simply iterates over all generated expressions.
// Therefore, no target_code_indices and function_indices are passed from the host side.
// Instead, this additional array allows the mapping to target_code_index.
extern __constant__ unsigned int mdl_target_code_indices[];
// Identity matrix.
// The last row is always implied to be (0, 0, 0, 1).
__constant__ const tct_float4 identity[3] = {
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
// Calculate radical inverse with base 2.
__device__ float radinv2(unsigned int bits)
{
bits = (bits << 16) | (bits >> 16);
bits = ((bits & 0x00ff00ff) << 8) | ((bits & 0xff00ff00) >> 8);
bits = ((bits & 0x0f0f0f0f) << 4) | ((bits & 0xf0f0f0f0) >> 4);
bits = ((bits & 0x33333333) << 2) | ((bits & 0xcccccccc) >> 2);
bits = ((bits & 0x55555555) << 1) | ((bits & 0xaaaaaaaa) >> 1);
return float(bits) / float(0x100000000ULL);
}
// CUDA kernel evaluating the MDL sub-expression for one texel.
extern "C" __global__ void evaluate_mat_expr(
float3 *out_buf,
Target_code_data *tc_data_list,
char const **arg_block_list,
unsigned int width,
unsigned int height,
unsigned int num_samples)
{
// Determine x and y coordinates of texel to be evaluated and check
// whether it is out of bounds (due to block padding)
unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
return;
// Calculate position and texture coordinates for a 2x2 quad around the center of the world
float step_x = 1.f / width;
float step_y = 1.f / height;
float pos_x = 2.0f * x * step_x - 1.0f; // [-1, 1)
float pos_y = 2.0f * y * step_y - 1.0f; // [-1, 1)
float tex_x = float(x) * step_x; // [0, 1)
float tex_y = float(y) * step_y; // [0, 1)
// Assign materials in a checkerboard pattern
unsigned int material_index =
((unsigned int)(tex_x * 4) ^ (unsigned int)(tex_y * 4)) % mdl_functions_count;
unsigned int tc_idx = mdl_target_code_indices[material_index];
char const *arg_block = arg_block_list[mdl_arg_block_indices[material_index]];
// Setup MDL material state (with only one texture space)
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 texture_coords[1] = {
{ { tex_x, tex_y, 0.0f }, { step_x, 0.0f, 0.0f }, { 0.0f, step_y, 0.0f } } };
#else
tct_float3 texture_coords[1] = { { tex_x, tex_y, 0.0f } };
#endif
tct_float3 texture_tangent_u[1] = { { 1.0f, 0.0f, 0.0f } };
tct_float3 texture_tangent_v[1] = { { 0.0f, 1.0f, 0.0f } };
Mdl_state mdl_state = {
/*normal=*/ { 0.0f, 0.0f, 1.0f },
/*geom_normal=*/ { 0.0f, 0.0f, 1.0f },
#ifdef ENABLE_DERIVATIVES
/*position=*/
{
{ pos_x, pos_y, 0.0f },
{ 2 * step_x, 0.0f, 0.0f },
{ 0.0f, 2 * step_y, 0.0f }
},
#else
/*position=*/ { pos_x, pos_y, 0.0f },
#endif
/*animation_time=*/ 0.0f,
/*texture_coords=*/ texture_coords,
/*tangent_u=*/ texture_tangent_u,
/*tangent_v=*/ texture_tangent_v,
/*text_results=*/ NULL,
/*ro_data_segment=*/ tc_data_list[tc_idx].ro_data_segment,
/*world_to_object=*/ identity,
/*object_to_world=*/ identity,
/*object_id=*/ 0,
/*meters_per_scene_unit=*/ 1.0f
};
Tex_handler tex_handler;
tex_handler.vtable = &TEX_VTABLE; // only required in 'vtable' mode, otherwise NULL
tex_handler.num_textures = tc_data_list[tc_idx].num_textures;
tex_handler.textures = tc_data_list[tc_idx].textures;
Resource_data res_data_pair = {
NULL, reinterpret_cast<Texture_handler_base *>(&tex_handler) };
// Super-sample the current texel with the given number of samples
float3 res = make_float3(0, 0, 0);
for (unsigned int i = 0; i < num_samples; ++i) {
// Calculate the offset for the current sample
float offs_x = float(i) / num_samples * step_x;
float offs_y = radinv2(i) * step_y;
// Update the position and the texture coordinate
#ifdef ENABLE_DERIVATIVES
mdl_state.position.val.x = pos_x + 2 * offs_x;
mdl_state.position.val.y = pos_y + 2 * offs_y;
texture_coords[0].val.x = tex_x + offs_x;
texture_coords[0].val.y = tex_y + offs_y;
#else
mdl_state.position.x = pos_x + 2 * offs_x;
mdl_state.position.y = pos_y + 2 * offs_y;
texture_coords[0].x = tex_x + offs_x;
texture_coords[0].y = tex_y + offs_y;
#endif
// Add result for current sample
float3 cur_res;
mdl_functions[material_index](&cur_res, &mdl_state, &res_data_pair, arg_block);
res.x += cur_res.x;
res.y += cur_res.y;
res.z += cur_res.z;
}
// Calculate average over all samples and apply gamma correction
res.x = powf(res.x / num_samples, 1.f / 2.2f);
res.y = powf(res.y / num_samples, 1.f / 2.2f);
res.z = powf(res.z / num_samples, 1.f / 2.2f);
// Write result to output buffer
out_buf[y * width + x] = res;
}