MDL SDK API nvidia_logo_transpbg.gif Up
Example for Compiled Distribution Functions (PTX)
[Previous] [Up] [Next]

This example shows how functions implementing BSDFs can be generated by the "PTX" backend. To illustrate the use of these functions in CUDA, it implements a small physically based renderer that computes direct lighting from an HDR environment map using BSDF evaluation and importance sampling.

New Topics

  • Generated BSDF functions
  • Changing compiled material arguments at runtime

Detailed Description

BSDF functions


The generated functions for BSDFs implement

  • An initialization function to prepare shared data per hit.
  • Evaluation of the BSDF for a given pair of incoming and outgoing directions.
  • Importance sampling of an incoming direction given an outgoing direction.
  • Probability density function (PDF) computation of generating an incoming for a given outgoing direction.

Each function takes arguments for state, resources, and material arguments that are analogous to the material expression counterpart of the earlier examples and the latter three functions further take a specific input and output parameter data. The signatures of the functions look like this:

const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
typedef void (Bsdf_sample_function) (Bsdf_sample_data *data,
const Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
typedef void (Bsdf_evaluate_function)(Bsdf_evaluate_data *data,
const Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
typedef void (Bsdf_pdf_function) (Bsdf_pdf_data *data,
const Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
void() Bsdf_evaluate_function(Bsdf_evaluate_data_base *data, Shading_state_material const *state, Resource_data const *res_data, char const *arg_block_data)
Signature of the evaluation function for material distribution functions created via mi::neuraylib::I...
Definition: target_code_types.h:1089
void() Bsdf_init_function(Shading_state_material *state, Resource_data const *res_data, char const *arg_block_data)
Signature of the initialization function for material distribution functions created via mi::neurayli...
Definition: target_code_types.h:1028
struct Shading_state_material_impl<false> Shading_state_material
The MDL material state structure.
Definition: target_code_types.h:300
void() Bsdf_sample_function(Bsdf_sample_data *data, Shading_state_material const *state, Resource_data const *res_data, char const *arg_block_data)
Signature of the importance sampling function for material distribution functions created via mi::neu...
Definition: target_code_types.h:1059
void() Bsdf_pdf_function(Bsdf_pdf_data *data, Shading_state_material const *state, Resource_data const *res_data, char const *arg_block_data)
Signature of the probability density function for material distribution functions created via mi::neu...
Definition: target_code_types.h:1119

The functions can be generated by mi::neuraylib::IMdl_backend::translate_material_df(), mi::neuraylib::IMdl_backend::translate_material(), mi::neuraylib::ILink_unit::add_material_df() or mi::neuraylib::ILink_unit::add_material(). All functions support the context option include_geometry_normal that can be specified to make the initialization function replace state->normal by the result of the expression connected to geometry.normal of the material. The default for this option is true. The final function names are specified by a base name that will suffixed by _init, _sample, _evaluate, and _pdf.

If you use mi::neuraylib::IMdl_backend::translate_material() or mi::neuraylib::ILink_unit::add_material(), you can also enable the single-init mode, by specifying the special "init" expression path as first element in the function description list. In this mode, only one init function will be generated precalculating values for all requested expressions.

BSDF evaluation and PDF computation take a pair of directions and IORs (index of refraction) as input and produce a PDF and, in the case of Bsdf_evaluate_function, the value of the BSDF. The corresponding structs are

struct Bsdf_evaluate_data {
// Input fields
float3 ior1; // IOR current medium
float3 ior2; // IOR other side
float3 k1; // outgoing direction
float3 k2; // incoming direction
// Output fields
float3 bsdf; // bsdf * dot(normal, k2)
float pdf; // pdf (non-projected hemisphere)
};
struct Bsdf_pdf_data {
// Input fields
float3 ior1; // IOR current medium
float3 ior2; // IOR other side
float3 k1; // outgoing direction
float3 k2; // incoming direction
// Output fields
float pdf; // pdf (non-projected hemisphere)
};

The handling of the (color) IOR is generally up to the renderer, in particular this includes the decision if the evaluation should happen from within an object with refracting material or from the outer side. It is possible though, to just pass in one IOR (e.g. ior1) and make the functions fill the other with the IOR of the material by marking the x component of it with MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR (e.g. ior2.x=MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR).

For importance sampling, one direction and a set of pseudo-random numbers needs to be provided. The output values include the importance sampled direction, a PDF, a weight (the value of the BSDF divided by the PDF), and the type of the BSDF that was eventually sampled. Note that this may include an absorption event, in which case no direction has been generated.

BSDF_EVENT_ABSORB = 0,
BSDF_EVENT_DIFFUSE = 1,
BSDF_EVENT_GLOSSY = 1 << 1,
BSDF_EVENT_SPECULAR = 1 << 2,
BSDF_EVENT_REFLECTION = 1 << 3,
BSDF_EVENT_TRANSMISSION = 1 << 4,
BSDF_EVENT_DIFFUSE_REFLECTION = BSDF_EVENT_DIFFUSE | BSDF_EVENT_REFLECTION,
BSDF_EVENT_DIFFUSE_TRANSMISSION = BSDF_EVENT_DIFFUSE | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_GLOSSY_REFLECTION = BSDF_EVENT_GLOSSY | BSDF_EVENT_REFLECTION,
BSDF_EVENT_GLOSSY_TRANSMISSION = BSDF_EVENT_GLOSSY | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_SPECULAR_REFLECTION = BSDF_EVENT_SPECULAR | BSDF_EVENT_REFLECTION,
BSDF_EVENT_SPECULAR_TRANSMISSION = BSDF_EVENT_SPECULAR | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_FORCE_32_BIT = 0xffffffffU
};
struct Bsdf_sample_data {
// Input fields
float3 ior1; // IOR current medium
float3 ior2; // IOR other side
float3 k1; // outgoing direction
float3 xi; // pseudo-random sample number
// Output fields
float3 k2; // incoming direction
float pdf; // pdf (non-projected hemisphere)
float3 bsdf_over_pdf; // bsdf * dot(normal, k2) / pdf
Bsdf_event_type event_type; // the type of event for the generated sample
};
Bsdf_event_type
The type of events created by BSDF importance sampling.
Definition: target_code_types.h:736

It often is the case that for a given shading point both evaluation and importance sampling need to be performed (possibly multiple times). To avoid re-computation of material expressions in each BSDF function call it is essential that the results are cached between multiple calls. This is accomplished by the initialization function Bsdf_init_function which stores computed results in an array passed by the mi::neuraylib::Shading_state_material::text_results field. The size of that array needs to be communicated to the backend using the "num_texture_results" option via mi::neuraylib::IMdl_backend::set_option(). If the storage is insufficient (e.g. for a material with a high number of material expressions), non-cached expressions are automatically recomputed when they are needed. The initialization is generally not optional, i.e. even if the text_results array size is set to zero it may still perform some initialization, in particular it will update state->normal if requested.

EDF functions


Analogous to the generation of BSDF functions, emission distribution functions (EDFs) can be generated using the corresponding signatures and data structures. For more information see the example code and run the example application using materials that contain EDFs:

df_cuda ::nvidia::sdk_examples::tutorials::example_edf
df_cuda ::nvidia::sdk_examples::tutorials::example_measured_edf

Global distributions are not supported yet and result in no emission. Hence, the generated EDFs can currently be evaluated only in tangent space.

Changing arguments of class-compiled materials at runtime


As explained in Instance-compilation and class-compilation, the resulting mi::neuraylib::ITarget_code object contains mi::neuraylib::ITarget_value_layout and mi::neuraylib::ITarget_argument_block objects for each material, when class-compiled materials are used for generating target code. Together with the corresponding mi::neuraylib::ICompiled_material, you can already get some information about the arguments:

The example uses this information to build a material editor GUI with the "Dear ImGui" framework (https://github.com/ocornut/imgui). The GUI controls are linked to the data of the target argument block using the offsets of the arguments. When ImGui reports any changes by the user, the target argument block is updated on the device for the current material.

When presenting material arguments to the user, additional information from parameter annotations may improve the user experience. To find the annotations for an argument of a compiled material, you have to look up the annotation block for a parameter of the corresponding mi::neuraylib::IFunction_definition with the same name as the argument.

Note
You will only find a parameter in a material definition with an exactly matching name for a compiled material argument, when non-struct constants were used as arguments during class-compilation. For other arguments you will get paths like "x.b" as parameter names.

In this example, the anno::hard_range annotation is used to determine the minimum and maximum values for value sliders, the anno::display_name annotation is used as a more user-friendly name for the arguments, and the anno::in_group annotation is used to group the arguments into categories.

Example Source

To compile the source code, you require CUDA, GLFW, and GLEW. For detailed instructions, please refer to the Getting Started section.

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cpp

/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/df_cuda/example_df_cuda.cpp
//
// Simple renderer using compiled BSDFs with a material parameter editor GUI.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
#include <list>
#include <map>
#include <memory>
#define _USE_MATH_DEFINES
#include <math.h>
// shared example helpers
#include "example_df_cuda.h"
#include "lpe.h"
// Enable this to dump the generated PTX code to files in the current directory.
// #define DUMP_PTX
#define OPENGL_INTEROP
#include "example_cuda_shared.h"
#include <imgui.h>
#include <imgui_impl_glfw.h>
#include <imgui_impl_opengl3.h>
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#define GL_DISPLAY_CUDA
#include "utils/gl_display.h"
#include "utils/profiling.h"
using namespace mi::examples::profiling;
#define terminate() \
do { \
glfwTerminate(); \
exit_failure(); \
} while (0)
#define WINDOW_TITLE "MDL SDK DF CUDA Example"
// Vector helper functions //
inline float length(const float3 &d)
{
return sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
}
inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f / length(d);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
inline float3 operator/(const float3& d, float s)
{
const float inv_s = 1.0f / s;
return make_float3(d.x * inv_s, d.y * inv_s, d.z * inv_s);
}
// OpenGL code //
// Initialize OpenGL and create a window with an associated OpenGL context.
static GLFWwindow *init_opengl(std::string& version_string, int res_x, int res_y)
{
// Initialize GLFW
check_success(glfwInit());
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
version_string = "#version 330 core"; // see top comments in 'imgui_impl_opengl3.cpp'
// Create an OpenGL window and a context
GLFWwindow *window = glfwCreateWindow(
res_x, res_y, WINDOW_TITLE, nullptr, nullptr);
if (!window) {
std::cerr << "Error creating OpenGL window!" << std::endl;
terminate();
}
// Attach context to window
glfwMakeContextCurrent(window);
// Initialize GLEW to get OpenGL extensions
GLenum res = glewInit();
if (res != GLEW_OK) {
std::cerr << "GLEW error: " << glewGetErrorString(res) << std::endl;
terminate();
}
// Disable VSync
glfwSwapInterval(0);
check_success(glGetError() == GL_NO_ERROR);
return window;
}
// Application logic //
// Context structure for window callback functions.
struct Window_context
{
bool mouse_event, key_event;
bool save_image;
int zoom;
int mouse_button; // button from callback event plus one (0 = no event)
int mouse_button_action; // action from mouse button callback event
int mouse_wheel_delta;
bool moving;
double move_start_x, move_start_y;
double move_dx, move_dy;
int material_index_delta;
bool save_result;
bool exposure_event;
float exposure;
};
static std::string to_string(Display_buffer_options option)
{
switch (option)
{
case DISPLAY_BUFFER_LPE: return "Selected LPE";
case DISPLAY_BUFFER_ALBEDO: return "Albedo";
case DISPLAY_BUFFER_NORMAL: return "Normal";
default: return "";
}
}
static std::string to_string(mi::neuraylib::Df_flags option)
{
switch (option)
{
case mi::neuraylib::DF_FLAGS_NONE: return "None";
case mi::neuraylib::DF_FLAGS_ALLOW_REFLECT: return "Reflect only";
case mi::neuraylib::DF_FLAGS_ALLOW_TRANSMIT: return "Transmit only";
case mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT: return "Reflect+Transmit";
default: return "";
}
}
// GLFW scroll callback
static void handle_scroll(GLFWwindow *window, double xoffset, double yoffset)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (yoffset > 0.0) {
ctx->mouse_wheel_delta = 1; ctx->mouse_event = true;
} else if (yoffset < 0.0) {
ctx->mouse_wheel_delta = -1; ctx->mouse_event = true;
}
ImGui_ImplGlfw_ScrollCallback(window, xoffset, yoffset);
}
// GLFW keyboard callback
static void handle_key(GLFWwindow *window, int key, int scancode, int action, int mods)
{
// Handle key press events
if (action == GLFW_PRESS) {
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
switch (key) {
// Escape closes the window
case GLFW_KEY_ESCAPE:
glfwSetWindowShouldClose(window, GLFW_TRUE);
break;
case GLFW_KEY_DOWN:
case GLFW_KEY_RIGHT:
case GLFW_KEY_PAGE_DOWN:
ctx->material_index_delta = 1;
ctx->key_event = true;
break;
case GLFW_KEY_UP:
case GLFW_KEY_LEFT:
case GLFW_KEY_PAGE_UP:
ctx->material_index_delta = -1;
ctx->key_event = true;
break;
case GLFW_KEY_ENTER:
ctx->save_result = true;
break;
case GLFW_KEY_KP_SUBTRACT:
ctx->exposure--;
ctx->exposure_event = true;
break;
case GLFW_KEY_KP_ADD:
ctx->exposure++;
ctx->exposure_event = true;
break;
default:
break;
}
}
ImGui_ImplGlfw_KeyCallback(window, key, scancode, action, mods);
}
// GLFW mouse button callback
static void handle_mouse_button(GLFWwindow *window, int button, int action, int mods)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
ctx->mouse_button = button + 1;
ctx->mouse_button_action = action;
ImGui_ImplGlfw_MouseButtonCallback(window, button, action, mods);
}
// GLFW mouse position callback
static void handle_mouse_pos(GLFWwindow *window, double xpos, double ypos)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->moving)
{
ctx->move_dx += xpos - ctx->move_start_x;
ctx->move_dy += ypos - ctx->move_start_y;
ctx->move_start_x = xpos;
ctx->move_start_y = ypos;
ctx->mouse_event = true;
}
}
// Resize CUDA buffers for a given resolution
static void resize_buffers(CUdeviceptr *buffer_cuda, int width, int height)
{
// Allocate CUDA buffer
if (*buffer_cuda)
check_cuda_success(cuMemFree(*buffer_cuda));
if (width == 0 || height == 0)
*buffer_cuda = 0;
else
check_cuda_success(cuMemAlloc(buffer_cuda, width * height * sizeof(float3)));
}
// Helper for create_environment()
static float build_alias_map(
const float *data,
const unsigned int size,
Env_accel *accel)
{
// create qs (normalized)
float sum = 0.0f;
for (unsigned int i = 0; i < size; ++i)
sum += data[i];
for (unsigned int i = 0; i < size; ++i)
accel[i].q = (static_cast<float>(size) * data[i] / sum);
// create partition table
unsigned int *partition_table = static_cast<unsigned int *>(
malloc(size * sizeof(unsigned int)));
unsigned int s = 0u, large = size;
for (unsigned int i = 0; i < size; ++i)
partition_table[(accel[i].q < 1.0f) ? (s++) : (--large)] = accel[i].alias = i;
// create alias map
for (s = 0; s < large && large < size; ++s)
{
const unsigned int j = partition_table[s], k = partition_table[large];
accel[j].alias = k;
accel[k].q += accel[j].q - 1.0f;
large = (accel[k].q < 1.0f) ? (large + 1u) : large;
}
free(partition_table);
return sum;
}
// Create environment map texture and acceleration data for importance sampling
static void create_environment(
cudaTextureObject_t *env_tex,
cudaArray_t *env_tex_data,
CUdeviceptr *env_accel,
uint2 *res,
const char *envmap_name)
{
// Load environment texture
transaction->create<mi::neuraylib::IImage>("Image"));
check_success(image->reset_file(envmap_name) == 0);
mi::base::Handle<const mi::neuraylib::ICanvas> canvas(image->get_canvas(0, 0, 0));
const mi::Uint32 rx = canvas->get_resolution_x();
const mi::Uint32 ry = canvas->get_resolution_y();
res->x = rx;
res->y = ry;
// Check, whether we need to convert the image
char const *image_type = image->get_type(0, 0);
if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0)
canvas = image_api->convert(canvas.get(), "Color");
// Copy the image data to a CUDA array
const cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
check_cuda_success(cudaMallocArray(env_tex_data, &channel_desc, rx, ry));
const float *pixels = static_cast<const float *>(tile->get_data());
check_cuda_success(cudaMemcpy2DToArray(
*env_tex_data, 0, 0, pixels,
rx * sizeof(float4), rx * sizeof(float4), ry, cudaMemcpyHostToDevice));
// Create a CUDA texture
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = *env_tex_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeWrap;
tex_desc.addressMode[1] = cudaAddressModeClamp; // don't sample beyond poles of env sphere
tex_desc.addressMode[2] = cudaAddressModeWrap;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
check_cuda_success(cudaCreateTextureObject(env_tex, &res_desc, &tex_desc, nullptr));
// Create importance sampling data
Env_accel *env_accel_host = static_cast<Env_accel *>(malloc(rx * ry * sizeof(Env_accel)));
float *importance_data = static_cast<float *>(malloc(rx * ry * sizeof(float)));
float cos_theta0 = 1.0f;
const float step_phi = float(2.0 * M_PI) / float(rx);
const float step_theta = float(M_PI) / float(ry);
for (unsigned int y = 0; y < ry; ++y)
{
const float theta1 = float(y + 1) * step_theta;
const float cos_theta1 = std::cos(theta1);
const float area = (cos_theta0 - cos_theta1) * step_phi;
cos_theta0 = cos_theta1;
for (unsigned int x = 0; x < rx; ++x) {
const unsigned int idx = y * rx + x;
const unsigned int idx4 = idx * 4;
importance_data[idx] =
area * std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2]));
}
}
const float inv_env_integral = 1.0f / build_alias_map(importance_data, rx * ry, env_accel_host);
free(importance_data);
for (unsigned int i = 0; i < rx * ry; ++i) {
const unsigned int idx4 = i * 4;
env_accel_host[i].pdf =
std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2])) * inv_env_integral;
}
*env_accel = gpu_mem_dup(env_accel_host, rx * ry * sizeof(Env_accel));
free(env_accel_host);
}
static void upload_lpe_state_machine(
Kernel_params& kernel_params,
LPE_state_machine& lpe_state_machine)
{
uint32_t num_trans = lpe_state_machine.get_transition_count();
uint32_t num_states = lpe_state_machine.get_state_count();
kernel_params.lpe_num_transitions = num_trans;
kernel_params.lpe_num_states = num_states;
// free old data
if (kernel_params.lpe_state_table)
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
if (kernel_params.lpe_final_mask)
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
// state table
CUdeviceptr state_table = 0;
check_cuda_success(cuMemAlloc(&state_table, num_states * num_trans * sizeof(uint32_t)));
check_cuda_success(cuMemcpyHtoD(state_table, lpe_state_machine.get_state_table().data(),
num_states * num_trans * sizeof(uint32_t)));
kernel_params.lpe_state_table = reinterpret_cast<uint32_t*>(state_table);
// final state masks
CUdeviceptr final_mask = 0;
check_cuda_success(cuMemAlloc(&final_mask, num_states * sizeof(uint32_t)));
check_cuda_success(cuMemcpyHtoD(final_mask, lpe_state_machine.get_final_state_masks().data(),
num_states * sizeof(uint32_t)));
kernel_params.lpe_final_mask = reinterpret_cast<uint32_t*>(final_mask);
// tag ID for light sources as they don't store tags in this examples
kernel_params.default_gtag = lpe_state_machine.handle_to_global_tag("");
kernel_params.point_light_gtag = lpe_state_machine.handle_to_global_tag("point_light");
kernel_params.env_gtag = lpe_state_machine.handle_to_global_tag("env");
}
// Save current result image to disk
static void save_result(
const CUdeviceptr cuda_buffer,
const unsigned int width,
const unsigned int height,
const std::string &filename,
{
image_api->create_canvas("Rgb_fp", width, height));
mi::base::Handle<mi::neuraylib::ITile> tile(canvas->get_tile());
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(data, cuda_buffer, width * height * sizeof(float3)));
mi::base::Handle<mi::IBoolean> option_force_default_gamma(factory->create<mi::IBoolean>());
option_force_default_gamma->set_value(true);
mi::base::Handle<mi::IMap> export_options(factory->create<mi::IMap>("Map<Interface>"));
export_options->insert("force_default_gamma", option_force_default_gamma.get());
mdl_impexp_api->export_canvas(filename.c_str(), canvas.get(), export_options.get());
}
// Application options
struct Options {
int cuda_device;
float gui_scale;
bool opengl;
bool use_class_compilation;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter;
#endif
bool no_aa;
bool enable_derivatives;
bool fold_ternary_on_df;
bool enable_auxiliary_output;
bool enable_pdf;
bool use_adapt_normal;
unsigned int res_x, res_y;
unsigned int iterations;
unsigned int samples_per_iteration;
unsigned int mdl_test_type;
unsigned int max_path_length;
float fov;
float exposure;
float3 cam_pos;
float3 light_pos;
float3 light_intensity;
bool enable_bsdf_flags;
mi::neuraylib::Df_flags allowed_scatter_mode;
std::string hdrfile;
float hdr_rot;
std::string outputfile;
std::vector<std::string> material_names;
// Default constructor, sets default values.
Options()
: cuda_device(0)
, gui_scale(1.0f)
, opengl(true)
, use_class_compilation(true)
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
, use_df_interpreter(false)
#endif
, no_aa(false)
, enable_derivatives(false)
, fold_ternary_on_df(false)
, enable_auxiliary_output(true)
, enable_pdf(true)
, use_adapt_normal(false)
, res_x(1024)
, res_y(1024)
, iterations(4096)
, samples_per_iteration(8)
, mdl_test_type(MDL_TEST_MIS)
, max_path_length(4)
, fov(96.0f)
, exposure(0.0f)
, cam_pos(make_float3(0, 0, 3))
, light_pos(make_float3(10, 0, 5))
, light_intensity(make_float3(0, 0, 0))
, enable_bsdf_flags(false)
, allowed_scatter_mode(mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT)
, hdrfile("nvidia/sdk_examples/resources/environment.hdr")
, hdr_rot(0.0f)
, outputfile("output.exr")
, material_names()
{}
};
// Possible enum values if any.
struct Enum_value {
std::string name;
int value;
Enum_value(const std::string &name, int value)
: name(name), value(value)
{
}
};
// Info for an enum type.
struct Enum_type_info {
std::vector<Enum_value> values;
// Adds a enum value and its integer value to the enum type info.
void add(const std::string &name, int value) {
values.push_back(Enum_value(name, value));
}
};
// Material parameter information structure.
class Param_info
{
public:
enum Param_kind
{
PK_UNKNOWN,
PK_FLOAT,
PK_FLOAT2,
PK_FLOAT3,
PK_COLOR,
PK_ARRAY,
PK_BOOL,
PK_INT,
PK_ENUM,
PK_STRING,
PK_TEXTURE,
PK_LIGHT_PROFILE,
PK_BSDF_MEASUREMENT
};
Param_info(
mi::Size index,
char const *name,
char const *display_name,
char const *group_name,
Param_kind kind,
Param_kind array_elem_kind,
mi::Size array_size,
mi::Size array_pitch,
char *data_ptr,
const Enum_type_info *enum_info = nullptr)
: m_index(index)
, m_name(name)
, m_display_name(display_name)
, m_group_name(group_name)
, m_kind(kind)
, m_array_elem_kind(array_elem_kind)
, m_array_size(array_size)
, m_array_pitch(array_pitch)
, m_data_ptr(data_ptr)
, m_range_min(-100), m_range_max(100)
, m_enum_info(enum_info)
{
}
// Get data as T&.
template<typename T>
T &data() { return *reinterpret_cast<T *>(m_data_ptr); }
// Get data as const T&.
template<typename T>
const T &data() const { return *reinterpret_cast<const T *>(m_data_ptr); }
const char * &display_name() { return m_display_name; }
const char *display_name() const { return m_display_name; }
const char * &group_name() { return m_group_name; }
const char *group_name() const { return m_group_name; }
Param_kind kind() const { return m_kind; }
Param_kind array_elem_kind() const { return m_array_elem_kind; }
mi::Size array_size() const { return m_array_size; }
mi::Size array_pitch() const { return m_array_pitch; }
float &range_min() { return m_range_min; }
float range_min() const { return m_range_min; }
float &range_max() { return m_range_max; }
float range_max() const { return m_range_max; }
const Enum_type_info *enum_info() const { return m_enum_info; }
private:
mi::Size m_index;
char const *m_name;
char const *m_display_name;
char const *m_group_name;
Param_kind m_kind;
Param_kind m_array_elem_kind;
mi::Size m_array_size;
mi::Size m_array_pitch; // the distance between two array elements
char *m_data_ptr;
float m_range_min, m_range_max;
const Enum_type_info *m_enum_info;
};
// Material information structure.
class Material_info
{
public:
Material_info(char const *name)
: m_name(name)
{}
// Add the parameter information as last entry of the corresponding group, or to the
// end of the list, if no group name is available.
void add_sorted_by_group(const Param_info &info) {
bool group_found = false;
if (info.group_name() != nullptr) {
for (std::list<Param_info>::iterator it = params().begin(); it != params().end(); ++it)
{
const bool same_group =
it->group_name() != nullptr && strcmp(it->group_name(), info.group_name()) == 0;
if (group_found && !same_group) {
m_params.insert(it, info);
return;
}
if (same_group)
group_found = true;
}
}
m_params.push_back(info);
}
// Add a new enum type to the list of used enum types.
void add_enum_type(const std::string name, std::shared_ptr<Enum_type_info> enum_info) {
enum_types[name] = enum_info;
}
// Lookup enum type info for a given enum type absolute MDL name.
const Enum_type_info *get_enum_type(const std::string name) {
Enum_type_map::const_iterator it = enum_types.find(name);
if (it != enum_types.end())
return it->second.get();
return nullptr;
}
// Get the name of the material.
char const *name() const { return m_name; }
// Get the parameters of this material.
std::list<Param_info> &params() { return m_params; }
private:
// name of the material
char const *m_name;
// parameters of the material
std::list<Param_info> m_params;
typedef std::map<std::string, std::shared_ptr<Enum_type_info> > Enum_type_map;
// used enum types of the material
Enum_type_map enum_types;
};
// Helper class to handle Resource tables of the target code.
class Resource_table
{
typedef std::map<std::string, unsigned> Resource_id_map;
public:
enum Kind {
RESOURCE_TEXTURE,
RESOURCE_LIGHT_PROFILE,
RESOURCE_BSDF_MEASUREMENT
};
// Constructor.
Resource_table(
Kind kind)
: m_max_len(0u)
{
read_resources(target_code, transaction, kind);
}
// Get the length of the longest URL in the resource table.
size_t get_max_length() const { return m_max_len; }
// Get all urls.
std::vector<std::string> const &get_urls() const { return m_urls; }
private:
void read_resources(
Kind kind)
{
m_urls.push_back("<unset>");
switch (kind) {
case RESOURCE_TEXTURE:
for (mi::Size i = 1, n = target_code->get_texture_count(); i < n; ++i) {
const char *s = target_code->get_texture(i);
transaction->access<mi::neuraylib::ITexture>(s));
char const *url = nullptr;
if (char const *img = tex->get_image()) {
transaction->access<mi::neuraylib::IImage>(img));
url = image->get_filename(0, 0);
}
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_LIGHT_PROFILE:
for (mi::Size i = 1, n = target_code->get_light_profile_count(); i < n; ++i) {
const char *s = target_code->get_light_profile(i);
char const *url = lp->get_filename();
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_BSDF_MEASUREMENT:
for (mi::Size i = 1, n = target_code->get_bsdf_measurement_count(); i < n; ++i) {
const char *s = target_code->get_bsdf_measurement(i);
char const *url = bm->get_filename();
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
}
}
private:
Resource_id_map m_resource_map;
std::vector<std::string> m_urls;
size_t m_max_len;
};
// Helper class to handle the string table of a target code.
class String_constant_table
{
typedef std::map<std::string, unsigned> String_map;
public:
// Constructor.
String_constant_table(mi::base::Handle<mi::neuraylib::ITarget_code const> target_code)
{
get_all_strings(target_code);
}
// Get the ID for a given string, return 0 if the string does not exist in the table.
unsigned get_id_for_string(const char *name) {
String_map::const_iterator it(m_string_constants_map.find(name));
if (it != m_string_constants_map.end())
return it->second;
// the user adds a string that is NOT in the code and we have not seen so far, add it
// and assign a new id
unsigned n_id = unsigned(m_string_constants_map.size() + 1);
m_string_constants_map[name] = n_id;
m_strings.reserve((n_id + 63) & ~63);
m_strings.push_back(name);
size_t l = strlen(name);
if (l > m_max_len)
m_max_len = l;
return n_id;
}
// Get the length of the longest string in the string constant table.
size_t get_max_length() const { return m_max_len; }
// Get the string for a given ID, or nullptr if this ID does not exist.
const char *get_string(unsigned id) {
if (id == 0 || id - 1 >= m_strings.size())
return nullptr;
return m_strings[id - 1].c_str();
}
private:
// Get all string constants used inside a target code and their maximum length.
void get_all_strings(
{
m_max_len = 0;
// ignore the 0, it is the "Not-a-known-string" entry
m_strings.reserve(target_code->get_string_constant_count());
for (mi::Size i = 1, n = target_code->get_string_constant_count(); i < n; ++i) {
const char *s = target_code->get_string_constant(i);
size_t l = strlen(s);
if (l > m_max_len)
m_max_len = l;
m_string_constants_map[s] = (unsigned)i;
m_strings.push_back(s);
}
}
private:
String_map m_string_constants_map;
std::vector<std::string> m_strings;
size_t m_max_len;
};
// Update the camera kernel parameters.
static void update_camera(
Kernel_params &kernel_params,
double phi,
double theta,
float base_dist,
int zoom)
{
kernel_params.cam_dir.x = float(-sin(phi) * sin(theta));
kernel_params.cam_dir.y = float(-cos(theta));
kernel_params.cam_dir.z = float(-cos(phi) * sin(theta));
kernel_params.cam_right.x = float(cos(phi));
kernel_params.cam_right.y = 0.0f;
kernel_params.cam_right.z = float(-sin(phi));
kernel_params.cam_up.x = float(-sin(phi) * cos(theta));
kernel_params.cam_up.y = float(sin(theta));
kernel_params.cam_up.z = float(-cos(phi) * cos(theta));
const float dist = float(base_dist * pow(0.95, double(zoom)));
kernel_params.cam_pos.x = -kernel_params.cam_dir.x * dist;
kernel_params.cam_pos.y = -kernel_params.cam_dir.y * dist;
kernel_params.cam_pos.z = -kernel_params.cam_dir.z * dist;
}
// Add a combobox for the given resource parameter to the GUI
static bool handle_resource(
Param_info &param,
Resource_table const &res_table)
{
bool changed = false;
std::vector<std::string> const &urls = res_table.get_urls();
int id = param.data<int>();
std::string cur_url = urls[id];
if (ImGui::BeginCombo(param.display_name(), cur_url.c_str())) {
for (size_t i = 0, n = urls.size(); i < n; ++i) {
const std::string &name = urls[i];
bool is_selected = (cur_url == name);
if (ImGui::Selectable(name.c_str(), is_selected)) {
param.data<int>() = int(i);
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
return changed;
}
static void launch_subframe(
mi::examples::mdl::GL_display* gl_display,
int width,
int height,
CUfunction cuda_function,
Kernel_params *kernel_params)
{
// Map diplay buffer, if present
if (gl_display) {
kernel_params->display_buffer =
reinterpret_cast<unsigned int*>(gl_display->map(0));
}
// Launch kernel
dim3 threads_per_block(16, 16);
dim3 num_blocks((width + 15) / 16, (height + 15) / 16);
void* params[] = { kernel_params };
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, params, nullptr));
kernel_params->iteration_start += kernel_params->iteration_num;
// Unmap again, if necessary
if (gl_display) {
gl_display->unmap(0);
}
// Make sure, any debug::print()s are written to the console
check_cuda_success(cuStreamSynchronize(0));
}
// Progressively render scene
static void render_scene(
const Options &options,
Material_compiler::Material_definition_list const &material_defs,
Material_compiler::Compiled_material_list const &compiled_materials,
std::vector<size_t> const &arg_block_indices,
std::vector<Df_cuda_material> const &material_bundle,
LPE_state_machine &lpe_state_machine)
{
Window_context window_context;
memset(&window_context, 0, sizeof(Window_context));
mi::examples::mdl::GL_display *gl_display = nullptr;
GLFWwindow *window = nullptr;
int width = -1;
int height = -1;
if (options.opengl) {
// Init OpenGL window
std::string version_string;
window = init_opengl(version_string, int(options.res_x), int(options.res_y));
glfwSetWindowUserPointer(window, &window_context);
glfwSetKeyCallback(window, handle_key);
glfwSetScrollCallback(window, handle_scroll);
glfwSetCursorPosCallback(window, handle_mouse_pos);
glfwSetMouseButtonCallback(window, handle_mouse_button);
glfwSetCharCallback(window, ImGui_ImplGlfw_CharCallback);
IMGUI_CHECKVERSION();
ImGui::CreateContext();
ImGui_ImplGlfw_InitForOpenGL(window, false);
ImGui_ImplOpenGL3_Init(version_string.c_str());
ImGui::GetIO().IniFilename = nullptr; // disable creating imgui.ini
ImGui::StyleColorsDark();
ImGui::GetStyle().Alpha = 0.7f;
ImGui::GetStyle().ScaleAllSizes(options.gui_scale);
gl_display = new mi::examples::mdl::GL_display(int(options.res_x), int(options.res_y));
}
// Initialize CUDA
CUcontext cuda_context = init_cuda(options.cuda_device, options.opengl);
CUdeviceptr accum_buffer = 0;
CUdeviceptr aux_albedo_buffer = 0; // buffer for auxiliary output
CUdeviceptr aux_normal_buffer = 0; //
if (!options.opengl) {
width = options.res_x;
height = options.res_y;
check_cuda_success(cuMemAlloc(&accum_buffer, width * height * sizeof(float3)));
check_cuda_success(cuMemAlloc(&aux_albedo_buffer, width * height * sizeof(float3)));
check_cuda_success(cuMemAlloc(&aux_normal_buffer, width * height * sizeof(float3)));
}
// Setup initial CUDA kernel parameters
Kernel_params kernel_params;
memset(&kernel_params, 0, sizeof(Kernel_params));
kernel_params.cam_focal = 1.0f / tanf(options.fov / 2 * float(2 * M_PI / 360));
kernel_params.light_pos = options.light_pos;
kernel_params.light_intensity = fmaxf(
options.light_intensity.x, fmaxf(options.light_intensity.y, options.light_intensity.z));
kernel_params.light_color = kernel_params.light_intensity > 0.0f
? options.light_intensity / kernel_params.light_intensity
: make_float3(1.0f, 0.9f, 0.5f);
kernel_params.env_intensity = 1.0f;
kernel_params.iteration_start = 0;
kernel_params.iteration_num = options.samples_per_iteration;
kernel_params.mdl_test_type = options.mdl_test_type;
kernel_params.max_path_length = options.max_path_length;
kernel_params.exposure_scale = powf(2.0f, options.exposure);
kernel_params.disable_aa = options.no_aa;
kernel_params.bsdf_data_flags = options.allowed_scatter_mode;
kernel_params.use_derivatives = options.enable_derivatives;
kernel_params.enable_auxiliary_output = options.enable_auxiliary_output;
kernel_params.display_buffer_index = 0;
kernel_params.lpe_ouput_expression = 0;
kernel_params.lpe_state_table = nullptr;
kernel_params.lpe_final_mask = nullptr;
kernel_params.current_material = 0;
kernel_params.geometry = material_bundle[kernel_params.current_material].contains_hair_bsdf ?
GT_HAIR : GT_SPHERE;
// Setup camera
float base_dist = length(options.cam_pos);
double theta, phi;
{
const float3 inv_dir = normalize(options.cam_pos);
phi = atan2(inv_dir.x, inv_dir.z);
theta = acos(inv_dir.y);
}
update_camera(kernel_params, phi, theta, base_dist, window_context.zoom);
// Build the full CUDA kernel with all the generated code
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > target_codes;
target_codes.push_back(target_code);
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_df_cuda_derivatives.ptx" : "example_df_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(mi::examples::io::get_executable_folder() + "/" + ptx_name).c_str(),
"render_scene_kernel",
&cuda_function);
// copy materials of the scene to the device
CUdeviceptr material_buffer = 0;
check_cuda_success(cuMemAlloc(&material_buffer,
material_bundle.size() * sizeof(Df_cuda_material)));
check_cuda_success(cuMemcpyHtoD(material_buffer, material_bundle.data(),
material_bundle.size() * sizeof(Df_cuda_material)));
kernel_params.material_buffer = reinterpret_cast<Df_cuda_material*>(material_buffer);
// Setup environment map and acceleration
CUdeviceptr env_accel;
cudaArray_t env_tex_data;
create_environment(
&kernel_params.env_tex, &env_tex_data, &env_accel, &kernel_params.env_size, transaction,
image_api, options.hdrfile.c_str());
kernel_params.env_accel = reinterpret_cast<Env_accel *>(env_accel);
kernel_params.env_rotation = options.hdr_rot / 180.0f * float(M_PI);
// Setup GPU runtime of the LPE state machine
upload_lpe_state_machine(kernel_params, lpe_state_machine);
// Setup file name for nogl mode
std::string next_filename_base;
std::string filename_base, filename_ext;
size_t dot_pos = options.outputfile.rfind('.');
if (dot_pos == std::string::npos) {
filename_base = options.outputfile;
} else {
filename_base = options.outputfile.substr(0, dot_pos);
filename_ext = options.outputfile.substr(dot_pos);
}
if (options.material_names.size() > 1)
next_filename_base = filename_base + "-0";
else
next_filename_base = filename_base;
// Scope for material context resources
{
// Prepare the needed data of all target codes for the GPU
Material_gpu_context material_gpu_context(options.enable_derivatives);
if (!material_gpu_context.prepare_target_code_data(
transaction.get(), image_api.get(), target_code.get(), arg_block_indices))
terminate();
kernel_params.tc_data = reinterpret_cast<Target_code_data *>(
material_gpu_context.get_device_target_code_data_list());
kernel_params.arg_block_list = reinterpret_cast<char const **>(
material_gpu_context.get_device_target_argument_block_list());
String_constant_table constant_table(target_code);
Resource_table texture_table(target_code, transaction, Resource_table::RESOURCE_TEXTURE);
Resource_table lp_table(target_code, transaction, Resource_table::RESOURCE_LIGHT_PROFILE);
Resource_table bm_table(
target_code, transaction, Resource_table::RESOURCE_BSDF_MEASUREMENT);
// Collect information about the arguments of the compiled materials
std::vector<Material_info> mat_infos;
for (size_t i = 0, num_mats = compiled_materials.size(); i < num_mats; ++i) {
// Get the compiled material and the parameter annotations
mi::neuraylib::ICompiled_material const *cur_mat = compiled_materials[i].get();
mi::neuraylib::IFunction_definition const *cur_def = material_defs[i].get();
// Get the target argument block and its layout
size_t arg_block_index = material_gpu_context.get_bsdf_argument_block_index(i);
material_gpu_context.get_argument_block_layout(arg_block_index));
material_gpu_context.get_argument_block(arg_block_index));
char *arg_block_data = arg_block != nullptr ? arg_block->get_data() : nullptr;
Material_info mat_info(cur_def->get_mdl_name());
for (mi::Size j = 0, num_params = cur_mat->get_parameter_count(); j < num_params; ++j) {
const char *name = cur_mat->get_parameter_name(j);
if (name == nullptr) continue;
// Determine the type of the argument
mi::neuraylib::IValue::Kind kind = arg->get_kind();
Param_info::Param_kind param_kind = Param_info::PK_UNKNOWN;
Param_info::Param_kind param_array_elem_kind = Param_info::PK_UNKNOWN;
mi::Size param_array_size = 0;
mi::Size param_array_pitch = 0;
const Enum_type_info *enum_type = nullptr;
switch (kind) {
param_kind = Param_info::PK_FLOAT;
break;
param_kind = Param_info::PK_COLOR;
break;
param_kind = Param_info::PK_BOOL;
break;
param_kind = Param_info::PK_INT;
break;
{
arg.get_interface<mi::neuraylib::IValue_vector const>());
val->get_type());
val_type->get_element_type());
if (elem_type->get_kind() == mi::neuraylib::IType::TK_FLOAT) {
switch (val_type->get_size()) {
case 2: param_kind = Param_info::PK_FLOAT2; break;
case 3: param_kind = Param_info::PK_FLOAT3; break;
default: assert(false && "Vector Size invalid or unhandled.");
}
}
}
break;
{
arg.get_interface<mi::neuraylib::IValue_array const>());
val->get_type());
val_type->get_element_type());
// we currently only support arrays of some values
switch (elem_type->get_kind()) {
param_array_elem_kind = Param_info::PK_FLOAT;
break;
param_array_elem_kind = Param_info::PK_COLOR;
break;
param_array_elem_kind = Param_info::PK_BOOL;
break;
param_array_elem_kind = Param_info::PK_INT;
break;
{
elem_type.get_interface<
val_type->get_element_type());
if (velem_type->get_kind() == mi::neuraylib::IType::TK_FLOAT) {
switch (val_type->get_size()) {
case 2:
param_array_elem_kind = Param_info::PK_FLOAT2;
break;
case 3:
param_array_elem_kind = Param_info::PK_FLOAT3;
break;
default:
assert(false && "Vector Size invalid or unhandled.");
}
}
}
break;
default:
assert(false && "Array element type invalid or unhandled.");
}
if (param_array_elem_kind != Param_info::PK_UNKNOWN) {
param_kind = Param_info::PK_ARRAY;
param_array_size = val_type->get_size();
// determine pitch of array if there are at least two elements
if (param_array_size > 1) {
layout->get_nested_state(j));
layout->get_nested_state(1, array_state));
mi::Size param_size;
mi::Size start_offset = layout->get_layout(
kind, param_size, array_state);
mi::Size next_offset = layout->get_layout(
kind, param_size, next_elem_state);
param_array_pitch = next_offset - start_offset;
}
}
}
break;
{
arg.get_interface<mi::neuraylib::IValue_enum const>());
val->get_type());
// prepare info for this enum type if not seen so far
const Enum_type_info *info = mat_info.get_enum_type(val_type->get_symbol());
if (info == nullptr) {
std::shared_ptr<Enum_type_info> p(new Enum_type_info());
for (mi::Size i = 0, n = val_type->get_size(); i < n; ++i) {
p->add(val_type->get_value_name(i), val_type->get_value_code(i));
}
mat_info.add_enum_type(val_type->get_symbol(), p);
info = p.get();
}
enum_type = info;
param_kind = Param_info::PK_ENUM;
}
break;
param_kind = Param_info::PK_STRING;
break;
param_kind = Param_info::PK_TEXTURE;
break;
param_kind = Param_info::PK_LIGHT_PROFILE;
break;
param_kind = Param_info::PK_BSDF_MEASUREMENT;
break;
default:
// Unsupported? -> skip
continue;
}
// Get the offset of the argument within the target argument block
mi::neuraylib::Target_value_layout_state state(layout->get_nested_state(j));
mi::Size param_size;
mi::Size offset = layout->get_layout(kind2, param_size, state);
check_success(kind == kind2);
Param_info param_info(
j,
name,
name,
/*group_name=*/ nullptr,
param_kind,
param_array_elem_kind,
param_array_size,
param_array_pitch,
arg_block_data + offset,
enum_type);
// Check for annotation info
anno_list->get_annotation_block(name));
if (anno_block) {
mi::neuraylib::Annotation_wrapper annos(anno_block.get());
mi::Size anno_index =
annos.get_annotation_index("::anno::soft_range(float,float)");
if (anno_index == mi::Size(-1)) {
anno_index = annos.get_annotation_index("::anno::hard_range(float,float)");
}
if (anno_index != mi::Size(-1)) {
annos.get_annotation_param_value(anno_index, 0, param_info.range_min());
annos.get_annotation_param_value(anno_index, 1, param_info.range_max());
}
anno_index = annos.get_annotation_index("::anno::display_name(string)");
if (anno_index != mi::Size(-1)) {
annos.get_annotation_param_value(anno_index, 0, param_info.display_name());
}
anno_index = annos.get_annotation_index("::anno::in_group(string)");
if (anno_index != mi::Size(-1)) {
annos.get_annotation_param_value(anno_index, 0, param_info.group_name());
}
}
mat_info.add_sorted_by_group(param_info);
}
mat_infos.push_back(mat_info);
}
if (!options.opengl)
{
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.accum_buffer = reinterpret_cast<float3*>(accum_buffer);
kernel_params.albedo_buffer = reinterpret_cast<float3*>(aux_albedo_buffer);
kernel_params.normal_buffer = reinterpret_cast<float3*>(aux_normal_buffer);
// render images for all materials
while (kernel_params.current_material < material_bundle.size()) {
// render scene
{
Timing timing("rendering");
while (kernel_params.iteration_start < options.iterations) {
launch_subframe(gl_display, width, height, cuda_function, &kernel_params);
}
}
save_result(
accum_buffer, width, height,
next_filename_base + filename_ext,
factory, image_api, mdl_impexp_api);
save_result(
aux_albedo_buffer, width, height,
next_filename_base + "_albedo" + filename_ext,
factory, image_api, mdl_impexp_api);
save_result(
aux_normal_buffer, width, height,
next_filename_base + "_normal" + filename_ext,
factory, image_api, mdl_impexp_api);
// All materials have been rendered? -> done
if (kernel_params.current_material + 1 >= material_bundle.size())
break;
if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
kernel_params.geometry = GT_SPHERE;
else
kernel_params.geometry = GT_HAIR;
// Start new image with next material
kernel_params.iteration_start = 0;
++kernel_params.current_material;
next_filename_base =
filename_base + "-" + to_string(kernel_params.current_material);
}
} else {
std::chrono::duration<double> state_update_time( 0.0 );
std::chrono::duration<double> render_time( 0.0 );
std::chrono::duration<double> display_time( 0.0 );
char stats_text[128];
int last_update_frames = -1;
auto last_update_time = std::chrono::steady_clock::now();
const std::chrono::duration<double> update_min_interval( 0.5 );
// Main render loop
while (true)
{
std::chrono::time_point<std::chrono::steady_clock> t0 =
std::chrono::steady_clock::now();
// Check for termination
if (glfwWindowShouldClose(window))
break;
// Poll for events and process them
glfwPollEvents();
// Check if buffers need to be resized
int nwidth, nheight;
glfwGetFramebufferSize(window, &nwidth, &nheight);
if (nwidth != width || nheight != height)
{
width = nwidth;
height = nheight;
gl_display->resize(width, height);
resize_buffers(
&accum_buffer, width, height);
kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
resize_buffers(&aux_albedo_buffer, width, height);
kernel_params.albedo_buffer = reinterpret_cast<float3 *>(aux_albedo_buffer);
resize_buffers(&aux_normal_buffer, width, height);
kernel_params.normal_buffer = reinterpret_cast<float3 *>(aux_normal_buffer);
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.iteration_start = 0;
}
// Don't render anything, if minimized
if (width == 0 || height == 0) {
// Wait until something happens
glfwWaitEvents();
continue;
}
ImGui_ImplOpenGL3_NewFrame();
ImGui_ImplGlfw_NewFrame();
ImGui::NewFrame();
// Create material parameter editor window
ImGui::SetNextWindowPos(ImVec2(10, 100), ImGuiCond_FirstUseEver);
ImGui::SetNextWindowSize(
ImVec2(360 * options.gui_scale, 600 * options.gui_scale),
ImGuiCond_FirstUseEver);
ImGui::Begin("Settings");
ImGui::SetWindowFontScale(options.gui_scale);
ImGui::PushItemWidth(-200 * options.gui_scale);
if (options.use_class_compilation)
ImGui::Text("CTRL + Click to manually enter numbers");
else
ImGui::Text("Parameter editing requires class compilation.");
if (kernel_params.enable_auxiliary_output)
{
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Display options");
ImGui::Separator();
std::string current_lpe_name = lpe_state_machine.get_expression_name(
kernel_params.lpe_ouput_expression);
if (ImGui::BeginCombo("LPE", current_lpe_name.c_str()))
{
for (uint32_t i = 0; i < lpe_state_machine.get_expression_count(); ++i)
{
const char* name = lpe_state_machine.get_expression_name(i);
bool is_selected = (i == kernel_params.lpe_ouput_expression);
if (ImGui::Selectable(name, is_selected))
{
kernel_params.lpe_ouput_expression = i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
std::string current_display_buffer =
to_string((Display_buffer_options) kernel_params.display_buffer_index);
if (ImGui::BeginCombo("buffer", current_display_buffer.c_str()))
{
for (unsigned i = 0; i < (unsigned) DISPLAY_BUFFER_COUNT; ++i)
{
const std::string &name = to_string((Display_buffer_options) i);
bool is_selected = (current_display_buffer == name);
if (ImGui::Selectable(name.c_str(), is_selected))
{
kernel_params.display_buffer_index = i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
if (options.enable_bsdf_flags)
{
std::string current_allow_mode = to_string(kernel_params.bsdf_data_flags);
if (ImGui::BeginCombo("BSDF flags", current_allow_mode.c_str()))
{
for (unsigned i = 0;
i <= (unsigned) mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT; ++i)
{
const std::string &name = to_string((mi::neuraylib::Df_flags) i);
bool is_selected = (current_allow_mode == name);
if (ImGui::Selectable(name.c_str(), is_selected))
{
kernel_params.bsdf_data_flags = (mi::neuraylib::Df_flags) i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Light parameters");
ImGui::Separator();
if (ImGui::ColorEdit3("Point Light Color", &kernel_params.light_color.x))
kernel_params.iteration_start = 0;
if (ImGui::SliderFloat("Point Light Intensity",
&kernel_params.light_intensity, 0.0f, 50000.0f))
kernel_params.iteration_start = 0;
if (ImGui::SliderFloat("Environment Intensity Scale",
&kernel_params.env_intensity, 0.0f, 10.0f))
kernel_params.iteration_start = 0;
float env_rot_degree = kernel_params.env_rotation / float(M_PI) * 180.0f;
if (ImGui::SliderFloat("Environment Rotation",
&env_rot_degree, 0.0f, 360.0f))
{
// wrap in case of negative input
// we don't want fmodf behavior for negative values
env_rot_degree -= floorf(env_rot_degree / 360.0f) * 360.f;
kernel_params.env_rotation = fmodf(env_rot_degree, 360.0f) / 180.0f * float(M_PI);
kernel_params.iteration_start = 0;
}
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Material parameters");
ImGui::Separator();
Material_info &mat_info = mat_infos[
material_bundle[kernel_params.current_material].compiled_material_index];
// Print material name
ImGui::Text("%s", mat_info.name());
bool changed = false;
const char *group_name = nullptr;
int id = 0;
for (std::list<Param_info>::iterator it = mat_info.params().begin(),
end = mat_info.params().end(); it != end; ++it, ++id)
{
Param_info &param = *it;
// Ensure unique ID even for parameters with same display names
ImGui::PushID(id);
// Group name changed? -> Start new group with new header
if ((!param.group_name() != !group_name) ||
(param.group_name() &&
(!group_name || strcmp(group_name, param.group_name()) != 0)))
{
ImGui::Separator();
if (param.group_name() != nullptr)
ImGui::Text("%s", param.group_name());
group_name = param.group_name();
}
// Choose proper edit control depending on the parameter kind
switch (param.kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
param.display_name(),
&param.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
param.display_name(),
&param.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
param.display_name(),
&param.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
param.display_name(),
&param.data<float>());
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
&param.data<bool>());
break;
case Param_info::PK_INT:
changed |= ImGui::SliderInt(
param.display_name(),
&param.data<int>(),
int(param.range_min()),
int(param.range_max()));
break;
case Param_info::PK_ARRAY:
{
ImGui::Text("%s", param.display_name());
ImGui::Indent(16.0f * options.gui_scale);
char *ptr = &param.data<char>();
for (mi::Size i = 0, n = param.array_size(); i < n; ++i) {
std::string idx_str = to_string(i);
switch (param.array_elem_kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
idx_str.c_str(),
reinterpret_cast<float *>(ptr));
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
reinterpret_cast<bool *>(ptr));
break;
case Param_info::PK_INT:
changed |= ImGui::SliderInt(
param.display_name(),
reinterpret_cast<int *>(ptr),
int(param.range_min()),
int(param.range_max()));
break;
default:
assert(false && "Array element type invalid or unhandled.");
}
ptr += param.array_pitch();
}
ImGui::Unindent(16.0f * options.gui_scale);
}
break;
case Param_info::PK_ENUM:
{
int value = param.data<int>();
std::string curr_value;
const Enum_type_info *info = param.enum_info();
for (size_t i = 0, n = info->values.size(); i < n; ++i) {
if (info->values[i].value == value) {
curr_value = info->values[i].name;
break;
}
}
if (ImGui::BeginCombo(param.display_name(), curr_value.c_str())) {
for (size_t i = 0, n = info->values.size(); i < n; ++i) {
const std::string &name = info->values[i].name;
bool is_selected = (curr_value == name);
if (ImGui::Selectable(
info->values[i].name.c_str(), is_selected)) {
param.data<int>() = info->values[i].value;
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
break;
case Param_info::PK_STRING:
{
std::vector<char> buf;
size_t max_len = constant_table.get_max_length();
max_len = max_len > 63 ? max_len + 1 : 64;
buf.resize(max_len);
// fill the current value
unsigned curr_index = param.data<unsigned>();
const char *opt = constant_table.get_string(curr_index);
strcpy(buf.data(), opt != nullptr ? opt : "");
if (ImGui::InputText(
param.display_name(),
buf.data(), buf.size(),
ImGuiInputTextFlags_EnterReturnsTrue))
{
unsigned id = constant_table.get_id_for_string(buf.data());
param.data<unsigned>() = id;
changed = true;
}
}
break;
case Param_info::PK_TEXTURE:
changed |= handle_resource(param, texture_table);
break;
case Param_info::PK_LIGHT_PROFILE:
changed |= handle_resource(param, lp_table);
break;
case Param_info::PK_BSDF_MEASUREMENT:
changed |= handle_resource(param, bm_table);
break;
case Param_info::PK_UNKNOWN:
break;
}
ImGui::PopID();
}
if (options.enable_derivatives) {
ImGui::Separator();
bool b = kernel_params.use_derivatives != 0;
if (ImGui::Checkbox("Use derivatives", &b)) {
kernel_params.iteration_start = 0;
kernel_params.use_derivatives = b;
}
}
ImGui::PopItemWidth();
ImGui::End();
// If any material argument changed, update the target argument block on the device
if (changed) {
material_gpu_context.update_device_argument_block(
material_bundle[kernel_params.current_material].argument_block_index);
kernel_params.iteration_start = 0;
}
// Handle events
Window_context *ctx =
static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->save_result && !ImGui::GetIO().WantCaptureKeyboard) {
save_result(
accum_buffer,
width, height,
options.outputfile,
factory, image_api, mdl_impexp_api);
save_result(
aux_albedo_buffer,
width, height,
filename_base + "_albedo" + filename_ext,
factory, image_api, mdl_impexp_api);
save_result(
aux_normal_buffer,
width, height,
filename_base + "_normal" + filename_ext,
factory, image_api, mdl_impexp_api);
}
if (ctx->exposure_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.exposure_scale = powf(2.0f, ctx->exposure);
}
if (ctx->key_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.iteration_start = 0;
// Update change material
const unsigned num_materials = unsigned(material_bundle.size());
kernel_params.current_material = (kernel_params.current_material +
ctx->material_index_delta + num_materials) % num_materials;
ctx->material_index_delta = 0;
if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
kernel_params.geometry = GT_SPHERE;
else
kernel_params.geometry = GT_HAIR;
}
if (ctx->mouse_button - 1 == GLFW_MOUSE_BUTTON_LEFT) {
// Only accept button press when not hovering GUI window
if (ctx->mouse_button_action == GLFW_PRESS &&
!ImGui::GetIO().WantCaptureMouse) {
ctx->moving = true;
glfwGetCursorPos(window, &ctx->move_start_x, &ctx->move_start_y);
}
else
ctx->moving = false;
}
if (ctx->mouse_wheel_delta && !ImGui::GetIO().WantCaptureMouse) {
ctx->zoom += ctx->mouse_wheel_delta;
}
if (ctx->mouse_event && !ImGui::GetIO().WantCaptureMouse) {
kernel_params.iteration_start = 0;
// Update camera
phi -= ctx->move_dx * 0.001 * M_PI;
theta -= ctx->move_dy * 0.001 * M_PI;
theta = std::max(theta, 0.00 * M_PI);
theta = std::min(theta, 1.00 * M_PI);
ctx->move_dx = ctx->move_dy = 0.0;
update_camera(kernel_params, phi, theta, base_dist, ctx->zoom);
}
// Clear all events
ctx->save_result = false;
ctx->key_event = false;
ctx->mouse_event = false;
ctx->exposure_event = false;
ctx->mouse_wheel_delta = 0;
ctx->mouse_button = 0;
auto t1 = std::chrono::steady_clock::now();
state_update_time += t1 - t0;
t0 = t1;
launch_subframe(gl_display, width, height, cuda_function, &kernel_params);
t1 = std::chrono::steady_clock::now();
render_time += t1 - t0;
t0 = t1;
// Render GL buffer
gl_display->update_display();
t1 = std::chrono::steady_clock::now();
display_time += t1 - t0;
// Render stats window
ImGui::SetNextWindowPos(ImVec2(10, 10));
ImGui::Begin("##notitle", nullptr,
ImGuiWindowFlags_NoDecoration |
ImGuiWindowFlags_AlwaysAutoResize |
ImGuiWindowFlags_NoSavedSettings |
ImGuiWindowFlags_NoFocusOnAppearing |
ImGuiWindowFlags_NoNav);
// Update stats only every 0.5s
++last_update_frames;
if (t1 - last_update_time > update_min_interval || last_update_frames == 0) {
typedef std::chrono::duration<double, std::milli> durationMs;
snprintf(stats_text, sizeof(stats_text),
"%5.1f fps\n\n"
"state update: %8.1f ms\n"
"render: %8.1f ms\n"
"display: %8.1f ms\n",
last_update_frames / std::chrono::duration<double>(
t1 - last_update_time).count(),
(durationMs(state_update_time) / last_update_frames).count(),
(durationMs(render_time) / last_update_frames).count(),
(durationMs(display_time) / last_update_frames).count());
last_update_time = t1;
last_update_frames = 0;
state_update_time = render_time = display_time =
std::chrono::duration<double>::zero();
}
ImGui::TextUnformatted(stats_text);
ImGui::End();
// Show the GUI
ImGui::Render();
ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
// Swap front and back buffers
glfwSwapBuffers(window);
}
}
}
// Cleanup CUDA
check_cuda_success(cudaDestroyTextureObject(kernel_params.env_tex));
check_cuda_success(cudaFreeArray(env_tex_data));
check_cuda_success(cuMemFree(env_accel));
check_cuda_success(cuMemFree(accum_buffer));
check_cuda_success(cuMemFree(aux_albedo_buffer));
check_cuda_success(cuMemFree(aux_normal_buffer));
check_cuda_success(cuMemFree(material_buffer));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
check_cuda_success(cuModuleUnload(cuda_module));
uninit_cuda(cuda_context);
// Cleanup OpenGL
if (options.opengl) {
delete gl_display;
gl_display = nullptr;
ImGui_ImplOpenGL3_Shutdown();
ImGui_ImplGlfw_Shutdown();
ImGui::DestroyContext();
glfwDestroyWindow(window);
glfwTerminate();
}
}
// Returns true, if the string str starts with the given prefix, false otherwise.
bool starts_with(std::string const &str, std::string const &prefix)
{
return str.size() >= prefix.size() && str.compare(0, prefix.size(), prefix) == 0;
}
// Create application material representation for use in our CUDA kernel
Df_cuda_material create_cuda_material(
const mi::neuraylib::ICompiled_material* compiled_material,
size_t target_code_index,
size_t compiled_material_index,
std::vector<mi::neuraylib::Target_function_description> const& descs,
bool use_hair_bsdf)
{
Df_cuda_material mat;
// shared by all generated functions of the same material
// used here to alter the materials parameter set
mat.compiled_material_index = static_cast<unsigned int>(compiled_material_index);
// Note: the same argument_block_index is filled into all function descriptions of a
// material, if any function uses it
mat.argument_block_index = static_cast<unsigned int>(descs[0].argument_block_index);
mat.init.x = static_cast<unsigned int>(target_code_index);
mat.init.y = static_cast<unsigned int>(descs[0].function_index);
if (!use_hair_bsdf)
{
// identify the BSDF function by target_code_index (i'th link unit)
// and the function_index inside this target_code.
// same for the EDF and the intensity expression.
// has material a constant thin_walled property?
compiled_material->lookup_sub_expression("thin_walled"));
bool has_constant_thin_walled = false;
bool is_thin_walled = false;
if (thin_walled->get_kind() == mi::neuraylib::IExpression::EK_CONSTANT)
{
thin_walled->get_interface<mi::neuraylib::IExpression_constant const>());
thin_walled_const->get_value<mi::neuraylib::IValue_bool>());
has_constant_thin_walled = true;
is_thin_walled = thin_walled_bool->get_value();
}
// back faces could be different for thin walled materials
bool need_backface_bsdf = false;
bool need_backface_edf = false;
bool need_backface_emission_intensity = false;
if (!has_constant_thin_walled || is_thin_walled)
{
// first, backfaces dfs are only considered for thin_walled materials
// second, we only need to generate new code if surface and backface are different
need_backface_bsdf =
need_backface_edf =
need_backface_emission_intensity =
// third, either the bsdf or the edf need to be non-default (black)
compiled_material->lookup_sub_expression("backface.scattering"));
compiled_material->lookup_sub_expression("backface.emission.emission"));
if (scattering_expr->get_kind() == mi::neuraylib::IExpression::EK_CONSTANT &&
emission_expr->get_kind() == mi::neuraylib::IExpression::EK_CONSTANT)
{
scattering_expr->get_interface<mi::neuraylib::IExpression_constant>());
scattering_expr_constant->get_value());
emission_expr->get_interface<mi::neuraylib::IExpression_constant>());
emission_expr_constant->get_value());
if (scattering_value->get_kind() == mi::neuraylib::IValue::VK_INVALID_DF &&
emission_value->get_kind() == mi::neuraylib::IValue::VK_INVALID_DF)
{
need_backface_bsdf = false;
need_backface_edf = false;
need_backface_emission_intensity = false;
}
}
}
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[1].function_index);
mat.edf.x = static_cast<unsigned int>(target_code_index);
mat.edf.y = static_cast<unsigned int>(descs[2].function_index);
mat.emission_intensity.x = static_cast<unsigned int>(target_code_index);
mat.emission_intensity.y = static_cast<unsigned int>(descs[3].function_index);
mat.backface_bsdf.x = static_cast<unsigned int>(target_code_index);
mat.backface_bsdf.y = static_cast<unsigned int>(
need_backface_bsdf ? descs[8].function_index : descs[1].function_index);
mat.backface_edf.x = static_cast<unsigned int>(target_code_index);
mat.backface_edf.y = static_cast<unsigned int>(
need_backface_edf ? descs[9].function_index : descs[2].function_index);
mat.backface_emission_intensity.x = static_cast<unsigned int>(target_code_index);
mat.backface_emission_intensity.y = static_cast<unsigned int>(
need_backface_emission_intensity ? descs[10].function_index : descs[3].function_index);
mat.volume_absorption.x = static_cast<unsigned int>(target_code_index);
mat.volume_absorption.y = static_cast<unsigned int>(descs[4].function_index);
mat.thin_walled.x = static_cast<unsigned int>(target_code_index);
mat.thin_walled.y = static_cast<unsigned int>(descs[5].function_index);
mat.cutout_opacity.x = static_cast<unsigned int>(target_code_index);
mat.cutout_opacity.y = static_cast<unsigned int>(descs[6].function_index);
}
else
{
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[7].function_index);
mat.contains_hair_bsdf = 1;
}
// init tag maps with zeros (optional)
memset(mat.bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.backface_bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.backface_edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
return mat;
}
void create_cuda_material_handles(
Df_cuda_material& mat,
const mi::neuraylib::ITarget_code* target_code,
LPE_state_machine& lpe_state_machine)
{
// fill tag ID list.
// allows to map from local per material Tag IDs to global per scene Tag IDs
// Note, calling 'LPE_state_machine::handle_to_global_tag(...)' registers the string handles
// present in the MDL in our 'scene'
mat.bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
target_code->get_callable_function_df_handle_count(mat.bsdf.y));
for (mi::Size i = 0; i < mat.bsdf_mtag_to_gtag_map_size; ++i)
mat.bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
target_code->get_callable_function_df_handle(mat.bsdf.y, i));
// same for all other distribution functions
mat.edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
target_code->get_callable_function_df_handle_count(mat.edf.y));
for (mi::Size i = 0; i < mat.edf_mtag_to_gtag_map_size; ++i)
mat.edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
target_code->get_callable_function_df_handle(mat.edf.y, i));
mat.backface_bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
target_code->get_callable_function_df_handle_count(mat.backface_bsdf.y));
for (mi::Size i = 0; i < mat.backface_bsdf_mtag_to_gtag_map_size; ++i)
mat.backface_bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
target_code->get_callable_function_df_handle(mat.backface_bsdf.y, i));
mat.backface_edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
target_code->get_callable_function_df_handle_count(mat.backface_edf.y));
for (mi::Size i = 0; i < mat.backface_edf_mtag_to_gtag_map_size; ++i)
mat.backface_edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
target_code->get_callable_function_df_handle(mat.backface_edf.y, i));
}
// checks if a compiled material contains none-invalid hair BSDF
bool contains_hair_bsdf(const mi::neuraylib::ICompiled_material* compiled_material)
{
compiled_material->get_body());
for (mi::Size i = 0, n = body_args->get_size(); i < n; ++i)
{
const char* name = body_args->get_name(i);
if (strcmp(name, "hair") == 0)
{
body_args->get_expression(i));
if (hair_exp->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
return true;
hair_exp->get_interface<const mi::neuraylib::IExpression_constant>());
hair_exp_const->get_value());
return hair_exp_const_value->get_kind() != mi::neuraylib::IValue::VK_INVALID_DF;
}
}
return true;
}
static void usage(const char *name)
{
std::cout
<< "usage: " << name << " [options] [<material_name1|full_mdle_path1> ...]\n"
<< "-h|--help print this text and exit\n"
<< "-v|--version print the MDL SDK version string and exit\n"
<< "--device <id> run on CUDA device <id> (default: 0)\n"
<< "--nogl don't open interactive display\n"
<< "--nocc don't use class-compilation\n"
<< "--noaux don't generate code for albedo and normal buffers\n"
<< "--nopdf don't generate code for pdf\n"
<< " (for simplicity implies --noaux)\n"
<< "--an use adapt normal function\n"
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
<< "--interpreter use the DF interpreter (if possible)\n"
#endif
<< "--gui_scale <factor> GUI scaling factor (default: 1.0)\n"
<< "--res <res_x> <res_y> resolution (default: 1024x1024)\n"
<< "--hdr <filename> HDR environment map "
"(default: nvidia/sdk_examples/resources/environment.hdr)\n"
<< "--hdr_rot <degrees> rotation of the environment in degrees (default: 0.0)\n"
<< "-o <outputfile> image file to write result to (default: output.exr).\n"
<< " With multiple materials \"-<material index>\" will be\n"
<< " added in front of the extension\n"
<< "--spp <num> samples per pixel, only active for --nogl (default: 4096)\n"
<< "--spi <num> samples per render call (default: 8)\n"
<< "-t <type> 0: eval, 1: sample, 2: mis, 3: mis + pdf, 4: no env\n"
<< " (default: 2)\n"
<< "-e <exposure> exposure for interactive display (default: 0.0)\n"
<< "-f <fov> the camera field of view in degree (default: 96.0)\n"
<< "--cam <x> <y> <z> set the camera position (default 0 0 3).\n"
<< " The camera will always look towards (0, 0, 0).\n"
<< "-l <x> <y> <z> <r> <g> <b> add an isotropic point light with given coordinates and "
"intensity (flux)\n"
<< "-p|--mdl_path <path> MDL search path, can occur multiple times.\n"
<< "--max_path_length <num> maximum path length, default 4 (up to one total internal\n"
<< " reflection), clamped to 2..100\n"
<< "--noaa disable pixel oversampling\n"
<< "-d enable use of derivatives\n"
<< "--fold_ternary_on_df fold all ternary operators on *df types (default: false)\n"
<< "--allowed_scatter_mode <m> limits the allowed scatter mode to \"none\", \"reflect\", "
<< "\"transmit\" or \"reflect_and_transmit\" (default: restriction disabled)\n"
<< "\n"
<< "Note: material names can end with an '*' as a wildcard\n"
<< " and alternatively, full MDLE file paths can be passed as material name\n";
exit(EXIT_FAILURE);
}
int MAIN_UTF8(int argc, char* argv[])
{
// Parse commandline options
Options options;
mi::examples::mdl::Configure_options configure_options;
bool print_version_and_exit = false;
for (int i = 1; i < argc; ++i) {
const char *opt = argv[i];
if (opt[0] == '-') {
if (strcmp(opt, "--device") == 0 && i < argc - 1) {
options.cuda_device = atoi(argv[++i]);
} else if (strcmp(opt, "--nogl") == 0) {
options.opengl = false;
} else if (strcmp(opt, "--nocc") == 0) {
options.use_class_compilation = false;
} else if (strcmp(opt, "--noaux") == 0) {
options.enable_auxiliary_output = false;
} else if (strcmp(opt, "--nopdf") == 0) {
options.enable_pdf = false;
// also disable aux to avoid function indices depending on command line options
options.enable_auxiliary_output = false;
} else if (strcmp(opt, "--an") == 0) {
options.use_adapt_normal = false;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
} else if (strcmp(opt, "--interpreter") == 0) {
options.use_df_interpreter = true;
#endif
} else if (strcmp(opt, "--gui_scale") == 0 && i < argc - 1) {
options.gui_scale = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "--res") == 0 && i < argc - 2) {
options.res_x = std::max(atoi(argv[++i]), 1);
options.res_y = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--hdr") == 0 && i < argc - 1) {
options.hdrfile = argv[++i];
} else if (strcmp(opt, "--hdr_rot") == 0 && i < argc - 1) {
options.hdr_rot = static_cast<float>(atof(argv[++i]));
// wrap in case of negative input
// we don't want fmodf behavior for negative values
options.hdr_rot -= floorf(options.hdr_rot / 360.0f) * 360.f;
options.hdr_rot = fmodf(options.hdr_rot, 360.0f);
} else if (strcmp(opt, "-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
} else if (strcmp(opt, "--spp") == 0 && i < argc - 1) {
options.iterations = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--spi") == 0 && i < argc - 1) {
options.samples_per_iteration = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "-t") == 0 && i < argc - 1) {
const int type = atoi(argv[++i]);
if (type < 0 || type >= MDL_TEST_COUNT) {
std::cout << "Invalid type for \"-t\" option!" << std::endl;
usage(argv[0]);
}
options.mdl_test_type = type;
} else if (strcmp(opt, "-e") == 0 && i < argc - 1) {
options.exposure = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-f") == 0 && i < argc - 1) {
options.fov = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "--cam") == 0 && i < argc - 3) {
options.cam_pos.x = static_cast<float>(atof(argv[++i]));
options.cam_pos.y = static_cast<float>(atof(argv[++i]));
options.cam_pos.z = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-l") == 0 && i < argc - 6) {
options.light_pos.x = static_cast<float>(atof(argv[++i]));
options.light_pos.y = static_cast<float>(atof(argv[++i]));
options.light_pos.z = static_cast<float>(atof(argv[++i]));
options.light_intensity.x = static_cast<float>(atof(argv[++i]));
options.light_intensity.y = static_cast<float>(atof(argv[++i]));
options.light_intensity.z = static_cast<float>(atof(argv[++i]));
} else if ((strcmp(opt, "-p") == 0 || strcmp(opt, "--mdl_path") == 0) && i < argc - 1) {
configure_options.additional_mdl_paths.push_back(argv[++i]);
} else if (strcmp(opt, "--max_path_length") == 0 && i < argc - 1) {
options.max_path_length = std::min(std::max(atoi(argv[++i]), 2), 100);
} else if (strcmp(opt, "--noaa") == 0) {
options.no_aa = true;
} else if (strcmp(opt, "-d") == 0) {
options.enable_derivatives = true;
} else if (strcmp(opt, "--fold_ternary_on_df") == 0) {
options.fold_ternary_on_df = true;
} else if (strcmp(opt, "--allowed_scatter_mode") == 0 && i < argc - 1) {
options.enable_bsdf_flags = true;
char const *mode = argv[++i];
if (strcmp(mode, "none") == 0) {
options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_NONE;
} else if (strcmp(mode, "reflect") == 0) {
options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_ALLOW_REFLECT;
} else if (strcmp(mode, "transmit") == 0) {
options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_ALLOW_TRANSMIT;
} else if (strcmp(mode, "reflect_and_transmit") == 0) {
options.allowed_scatter_mode =
mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT;
} else {
std::cout << "Unknown allowed_scatter_mode: \"" << mode << "\"" << std::endl;
usage(argv[0]);
}
} else if (strcmp(opt, "-v") == 0 || strcmp(opt, "--version") == 0) {
print_version_and_exit = true;
} else {
if (strcmp(opt, "-h") != 0 && strcmp(opt, "--help") != 0)
std::cout << "Unknown option: \"" << opt << "\"" << std::endl;
usage(argv[0]);
}
}
else
options.material_names.push_back(std::string(opt));
}
if (options.mdl_test_type == 3 && !options.enable_pdf)
exit_failure("Cannot use \"mis + pdf\" test type when pdf is disabled.");
// Use default material, if none was provided via command line
if (options.material_names.empty())
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_df");
// Access the MDL SDK
mi::base::Handle<mi::neuraylib::INeuray> neuray(mi::examples::mdl::load_and_get_ineuray());
if (!neuray.is_valid_interface())
exit_failure("Failed to load the SDK.");
// Handle the --version flag
if (print_version_and_exit) {
// print library version information.
neuray->get_api_component<const mi::neuraylib::IVersion>());
fprintf(stdout, "%s\n", version->get_string());
// free the handles and unload the MDL SDK
version = nullptr;
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
// Configure the MDL SDK
if (!mi::examples::mdl::configure(neuray.get(), configure_options))
exit_failure("Failed to initialize the SDK.");
// Start the MDL SDK
mi::Sint32 ret = neuray->start();
if (ret != 0)
exit_failure("Failed to initialize the SDK. Result code: %d", ret);
// LPE state machine for rendering into multiple buffers
LPE_state_machine lpe_state_machine;
lpe_state_machine.handle_to_global_tag("point_light"); // register handles before building
lpe_state_machine.handle_to_global_tag("env"); // the state machine
// register other handles in the scene, e.g.: for object instances
lpe_state_machine.handle_to_global_tag("sphere"); // for illustration, not used currently
// Add some common and custom LPEs
lpe_state_machine.add_expression("Beauty", LPE::create_common(LPE::Common::Beauty));
lpe_state_machine.add_expression("Diffuse", LPE::create_common(LPE::Common::Diffuse));
lpe_state_machine.add_expression("Glossy", LPE::create_common(LPE::Common::Glossy));
lpe_state_machine.add_expression("Specular", LPE::create_common(LPE::Common::Specular));
lpe_state_machine.add_expression("SSS", LPE::create_common(LPE::Common::SSS));
lpe_state_machine.add_expression("Transmission", LPE::create_common(LPE::Common::Transmission));
lpe_state_machine.add_expression("Beauty-Env", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::light("env") })); // only light with the name 'env'
lpe_state_machine.add_expression("Beauty-PointLight", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::light("point_light") })); // only light with the name 'point_light'
lpe_state_machine.add_expression("Beauty-Emission", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::emission() })); // only emission
lpe_state_machine.add_expression("Beauty-Base", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("base")),
LPE::light()})); // no emission
lpe_state_machine.add_expression("Beauty-Coat", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("coat")),
LPE::light()})); // no emission
lpe_state_machine.add_expression("Beauty-^Coat", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("coat", false)),
LPE::any_light()})); // emission or light source
{
// Create a transaction
neuray->get_api_component<mi::neuraylib::IDatabase>());
mi::base::Handle<mi::neuraylib::IScope> scope(database->get_global_scope());
mi::base::Handle<mi::neuraylib::ITransaction> transaction(scope->create_transaction());
// Access needed API components
neuray->get_api_component<mi::neuraylib::IMdl_factory>());
neuray->get_api_component<mi::neuraylib::IMdl_impexp_api>());
neuray->get_api_component<mi::neuraylib::IMdl_backend_api>());
{
// Initialize the material compiler with 16 result buffer slots ("texture results")
Material_compiler mc(
mdl_impexp_api.get(),
mdl_backend_api.get(),
mdl_factory.get(),
transaction.get(),
16,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
options.use_df_interpreter,
#endif
options.enable_derivatives,
options.fold_ternary_on_df,
options.enable_auxiliary_output,
options.enable_pdf,
options.use_adapt_normal,
options.enable_bsdf_flags,
/*df_handle_mode=*/ "pointer",
/*lambda_return_mode=*/ "value");
// List of materials in the scene
std::vector<Df_cuda_material> material_bundle;
// Select the functions to translate
std::vector<mi::neuraylib::Target_function_description> descs;
descs.push_back(
descs.push_back(
descs.push_back(
mi::neuraylib::Target_function_description("surface.emission.emission"));
descs.push_back(
mi::neuraylib::Target_function_description("surface.emission.intensity"));
descs.push_back(
mi::neuraylib::Target_function_description("volume.absorption_coefficient"));
descs.push_back(
descs.push_back(
mi::neuraylib::Target_function_description("geometry.cutout_opacity"));
descs.push_back(
descs.push_back(
descs.push_back(
mi::neuraylib::Target_function_description("backface.emission.emission"));
descs.push_back(
mi::neuraylib::Target_function_description("backface.emission.intensity"));
Timing timing_compile("Compile MDL to PTX");
// Generate code for all materials
std::vector<std::string> used_material_names;
for (size_t i = 0; i < options.material_names.size(); ++i) {
std::string& opt_material_name = options.material_names[i];
// split module and material name
std::string module_qualified_name, material_simple_name;
if (!mi::examples::mdl::parse_cmd_argument_material_name(
opt_material_name, module_qualified_name, material_simple_name, true))
exit_failure("Provided material name '%s' is invalid.",
opt_material_name.c_str());
// Is this a material name pattern? (not applicable to mdle)
if (!mi::examples::strings::ends_with(module_qualified_name, ".mdle") &&
opt_material_name.size() > 1 &&
opt_material_name.back() == '*') {
// prepare the pattern for matching
std::string pattern = opt_material_name.substr(0, opt_material_name.size() - 1);
if (!starts_with(pattern, "::"))
pattern = "::" + pattern;
// load the module
std::string module_db_name = mc.load_module(module_qualified_name);
// iterate over all materials in that module
transaction->access<const mi::neuraylib::IModule>(module_db_name.c_str()));
for (mi::Size j = 0, n = loaded_module->get_material_count(); j < n; ++j) {
// get the j`th material
const char* material_db_name = loaded_module->get_material(j);
material_db_name));
// make sure the material name starts with the pattern
std::string material_qualified_name = mat_def->get_mdl_name();
if (!mi::examples::strings::starts_with(material_qualified_name, pattern))
continue;
std::cout << "Adding material \"" << material_qualified_name << "\"" << std::endl;
// Add functions of the material to the link unit
check_success(mc.add_material(
module_qualified_name, material_db_name,
descs.data(), descs.size(),
options.use_class_compilation));
mc.get_compiled_materials().back());
// Create application material representation
material_bundle.push_back(create_cuda_material(
compiled_material.get(),
0, material_bundle.size(), descs,
contains_hair_bsdf(compiled_material.get())));
used_material_names.push_back(material_qualified_name);
}
} else {
// Load the module
std::string module_db_name = mc.load_module(module_qualified_name);
transaction->access<mi::neuraylib::IModule>(module_db_name.c_str()));
if (!module)
exit_failure("Failed to access the loaded module.");
// Construct the material name
std::string material_db_name
= module_db_name + "::" + material_simple_name;
material_db_name = mi::examples::mdl::add_missing_material_signature(
module.get(), material_db_name);
if (material_db_name.empty())
exit_failure("Failed to find the material %s in the module %s.",
material_simple_name.c_str(), module_qualified_name.c_str());
std::cout << "Adding material \"" << material_db_name << "\"" << std::endl;
// Add functions of the material to the link unit
check_success(mc.add_material(
module_qualified_name, material_db_name,
descs.data(), descs.size(),
options.use_class_compilation));
mc.get_compiled_materials().back());
// Create application material representation
material_bundle.push_back(create_cuda_material(
compiled_material.get(),
0, material_bundle.size(), descs,
contains_hair_bsdf(compiled_material.get())));
used_material_names.push_back(material_db_name);
}
}
// Update the material names with the actually used names
options.material_names = used_material_names;
// Generate the CUDA PTX code for the link unit
mc.generate_cuda_ptx());
timing_compile.stop();
// convert handles to tag IDs
for (auto& mat : material_bundle)
create_cuda_material_handles(mat, target_code.get(), lpe_state_machine);
neuray->get_api_component<mi::neuraylib::IFactory>());
// Acquire image API needed to prepare the textures
neuray->get_api_component<mi::neuraylib::IImage_api>());
// when all scene elements that have handles are loaded and all handles as well as
// light path expressions are registered, the state machine can be constructed.
lpe_state_machine.build();
// Render
render_scene(
options,
transaction,
factory,
image_api,
mdl_impexp_api,
target_code,
mc.get_material_defs(),
mc.get_compiled_materials(),
mc.get_argument_block_indices(),
material_bundle,
lpe_state_machine);
}
transaction->commit();
}
// Shut down the MDL SDK
if (neuray->shutdown() != 0)
exit_failure("Failed to shutdown the SDK.");
// Unload the MDL SDK
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
// Convert command line arguments to UTF8 on Windows
COMMANDLINE_TO_UTF8
This interface represents bool.
Definition: inumber.h:122
This interface represents maps, i.e., a key-value based data structure.
Definition: imap.h:41
A wrapper around the interfaces for MDL annotations.
Definition: annotation_wrapper.h:37
A scene element that stores measured BSDF data.
Definition: ibsdf_measurement.h:39
This interface represents a compiled material.
Definition: icompiled_material.h:97
virtual const IExpression * lookup_sub_expression(const char *path) const =0
Looks up a sub-expression of the compiled material.
virtual const IValue * get_argument(Size index) const =0
Returns the value of an argument.
virtual Size get_parameter_count() const =0
Returns the number of parameters used by this compiled material.
virtual const IExpression_direct_call * get_body() const =0
Returns the body (or material root) of the compiled material.
virtual base::Uuid get_slot_hash(Material_slot slot) const =0
Returns the hash of a particular material slot.
virtual const char * get_parameter_name(Size index) const =0
Returns the name of a parameter.
This interface is used to interact with the distributed database.
Definition: idatabase.h:289
A constant expression.
Definition: iexpression.h:96
@ EK_CONSTANT
A constant expression. See mi::neuraylib::IExpression_constant.
Definition: iexpression.h:55
This API component allows the creation, assignment, and cloning of instances of types.
Definition: ifactory.h:35
This interface represents a function definition.
Definition: ifunction_definition.h:44
virtual const IAnnotation_list * get_parameter_annotations() const =0
Returns the annotations of all parameters.
virtual const char * get_mdl_name() const =0
Returns the MDL name of the function definition.
This interface provides various utilities related to canvases and buffers.
Definition: iimage_api.h:72
This interface represents a pixel image file.
Definition: iimage.h:66
This interface represents light profiles.
Definition: ilightprofile.h:73
This interface can be used to obtain the MDL backends.
Definition: imdl_backend_api.h:56
Factory for various MDL interfaces and functions.
Definition: imdl_factory.h:53
API component for MDL related import and export operations.
Definition: imdl_impexp_api.h:43
This interface represents an MDL module.
Definition: imodule.h:634
Represents target code of an MDL backend.
Definition: imdl_backend.h:783
Textures add image processing options to images.
Definition: itexture.h:68
virtual const base::IInterface * access(const char *name)=0
Retrieves an element from the database.
virtual base::IInterface * create(const char *type_name, Uint32 argc=0, const base::IInterface *argv[]=0)=0
Creates an object of the type type_name.
virtual Sint32 commit()=0
Commits the transaction.
The type of kind vector.
Definition: itype.h:399
@ TK_BOOL
The boolean type. See mi::neuraylib::IType_bool.
Definition: itype.h:158
@ TK_COLOR
The color type. See mi::neuraylib::IType_color.
Definition: itype.h:174
@ TK_VECTOR
A vector type. See mi::neuraylib::IType_vector.
Definition: itype.h:170
@ TK_INT
The integer type. See mi::neuraylib::IType_int.
Definition: itype.h:160
@ TK_FLOAT
The float type. See mi::neuraylib::IType_float.
Definition: itype.h:164
A value of type array.
Definition: ivalue.h:403
A value of type boolean.
Definition: ivalue.h:106
A value of type enum.
Definition: ivalue.h:144
A value of type vector.
Definition: ivalue.h:309
Kind
The possible kinds of values.
Definition: ivalue.h:36
@ VK_TEXTURE
A texture value. See mi::neuraylib::IValue_texture.
Definition: ivalue.h:62
@ VK_LIGHT_PROFILE
A light_profile value. See mi::neuraylib::IValue_light_profile.
Definition: ivalue.h:64
@ VK_ARRAY
An array value. See mi::neuraylib::IValue_array.
Definition: ivalue.h:56
@ VK_FLOAT
A float value. See mi::neuraylib::IValue_float.
Definition: ivalue.h:44
@ VK_BSDF_MEASUREMENT
A bsdf_measurement value. See mi::neuraylib::IValue_bsdf_measurement.
Definition: ivalue.h:66
@ VK_ENUM
An enum value. See mi::neuraylib::IValue_enum.
Definition: ivalue.h:42
@ VK_INT
An integer value. See mi::neuraylib::IValue_int.
Definition: ivalue.h:40
@ VK_VECTOR
A vector value. See mi::neuraylib::IValue_vector.
Definition: ivalue.h:50
@ VK_INVALID_DF
An invalid distribution function value. See mi::neuraylib::IValue_invalid_df.
Definition: ivalue.h:60
@ VK_BOOL
A boolean value. See mi::neuraylib::IValue_bool.
Definition: ivalue.h:38
@ VK_STRING
A string value. See mi::neuraylib::IValue_string.
Definition: ivalue.h:48
@ VK_COLOR
A color value. See mi::neuraylib::IValue_color.
Definition: ivalue.h:54
Abstract interface for accessing version information.
Definition: iversion.h:19
Interface * get() const
Access to the interface. Returns 0 for an invalid interface.
Definition: handle.h:294
std::basic_ostream<C, T> & info(std::basic_ostream<C, T> &ostream)
Manipulator for mi::base::Log_stream.
Definition: ilogger.h:580
unsigned int Uint32
32-bit unsigned integer.
Definition: types.h:49
Uint64 Size
Unsigned integral type that is large enough to hold the size of all types.
Definition: types.h:112
signed int Sint32
32-bit signed integer.
Definition: types.h:46
Float32 length(Float32 a)
Returns the Euclidean norm of the scalar a (its absolute value).
Definition: function.h:1107
Bbox<T, DIM> operator/(const Bbox<T, DIM> &bbox, T divisor)
Returns a bounding box that is a version of bbox divided by divisor, i.e., bbox.max and bbox....
Definition: bbox.h:518
Color acos(const Color &c)
Returns a color with the elementwise arc cosine of the color c.
Definition: color.h:477
Color atan2(const Color &c, const Color &d)
Returns a color with the elementwise arc tangent of the color c / d.
Definition: color.h:509
Color sin(const Color &c)
Returns a color with the elementwise sine of the color c.
Definition: color.h:761
Color cos(const Color &c)
Returns a color with the elementwise cosine of the color c.
Definition: color.h:558
Color pow(const Color &a, const Color &b)
Returns the color a elementwise to the power of b.
Definition: color.h:719
virtual const char * get_callable_function_df_handle(Size func_index, Size handle_index) const =0
Get the name of a distribution function handle referenced by a callable function.
Df_flags
Flags controlling the calculation of DF results.
Definition: target_code_types.h:761
virtual Size get_callable_function_df_handle_count(Size func_index) const =0
Get the number of distribution function handles referenced by a callable function.
@ DF_FLAGS_NONE
allows nothing -> black
Definition: target_code_types.h:762
@ SLOT_BACKFACE_EMISSION_INTENSITY
Slot "backface.emission.intensity".
Definition: icompiled_material.h:34
@ SLOT_BACKFACE_SCATTERING
Slot "backface.scattering".
Definition: icompiled_material.h:32
@ SLOT_SURFACE_EMISSION_INTENSITY
Slot "surface.emission.intensity".
Definition: icompiled_material.h:30
@ SLOT_SURFACE_SCATTERING
Slot "surface.scattering".
Definition: icompiled_material.h:28
@ SLOT_BACKFACE_EMISSION_EDF_EMISSION
Slot "backface.emission.emission".
Definition: icompiled_material.h:33
@ SLOT_SURFACE_EMISSION_EDF_EMISSION
Slot "surface.emission.emission".
Definition: icompiled_material.h:29
Math API.
Common namespace for APIs of NVIDIA Advanced Rendering Center GmbH.
Definition: example_derivatives.dox:5
Description of target function.
Definition: imdl_backend.h:1764
Structure representing the state during traversal of the nested layout.
Definition: imdl_backend.h:693

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.h

/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
#ifndef EXAMPLE_DF_CUDA_H
#define EXAMPLE_DF_CUDA_H
#include <cstdint>
#include <vector_types.h>
#include <texture_types.h>
struct Target_code_data;
enum Mdl_test_type {
MDL_TEST_EVAL = 0, // only use BSDF evaluation
MDL_TEST_SAMPLE = 1, // only use BSDF sampling
MDL_TEST_MIS = 2, // multiple importance sampling
MDL_TEST_MIS_PDF = 3, // multiple importance sampling, but use BSDF explicit pdf computation
MDL_TEST_NO_ENV = 4, // no environment sampling
MDL_TEST_COUNT
};
const unsigned MAX_DF_HANDLES = 8;
struct Env_accel {
unsigned int alias;
float q;
float pdf;
};
namespace
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
inline uint2 make_invalid()
{
uint2 index_pair;
index_pair.x = ~0;
index_pair.y = ~0;
return index_pair;
}
}
struct Df_cuda_material
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
Df_cuda_material()
: compiled_material_index(0)
, argument_block_index(~0)
, init(make_invalid())
, bsdf(make_invalid())
, edf(make_invalid())
, emission_intensity(make_invalid())
, backface_bsdf(make_invalid())
, backface_edf(make_invalid())
, backface_emission_intensity(make_invalid())
, volume_absorption(make_invalid())
, thin_walled(make_invalid())
, cutout_opacity(make_invalid())
, contains_hair_bsdf(0)
{
}
// used on host side only
unsigned int compiled_material_index;
// the argument block index of this material (~0 if not used)
unsigned int argument_block_index;
// pair of target_code_index and function_index to identify the init function
uint2 init;
// pair of target_code_index and function_index to identify the bsdf
uint2 bsdf;
// pair of target_code_index and function_index to identify the edf
uint2 edf;
// pair of target_code_index and function_index for intensity
uint2 emission_intensity;
// pair of target_code_index and function_index to identify the bsdf
uint2 backface_bsdf;
// pair of target_code_index and function_index to identify the edf
uint2 backface_edf;
// pair of target_code_index and function_index for intensity
uint2 backface_emission_intensity;
// pair of target_code_index and function_index for volume absorption
uint2 volume_absorption;
// pair of target_code_index and function_index for thin_walled
uint2 thin_walled;
// pair of target_code_index and function_index for cutout_opacity
uint2 cutout_opacity;
// maps 'material tags' to 'global tags' for the surface scattering distribution function
unsigned int bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int bsdf_mtag_to_gtag_map_size;
// maps 'material tags' to 'global tags' for the emission distribution function
unsigned int edf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int edf_mtag_to_gtag_map_size;
// maps 'material tags' to 'global tags' for the backface scattering distribution function
unsigned int backface_bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int backface_bsdf_mtag_to_gtag_map_size;
// maps 'material tags' to 'global tags' for the backface emission distribution function
unsigned int backface_edf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int backface_edf_mtag_to_gtag_map_size;
unsigned int contains_hair_bsdf;
};
enum Geometry_type
{
GT_SPHERE = 0, // Intersect a sphere with unit radius located at the (0,0,0)
GT_HAIR = 1, // Intersect an infinite cylinder at (0,0,0) oriented in y-direction
};
struct Kernel_params {
// display
uint2 resolution;
float exposure_scale;
unsigned int *display_buffer;
float3 *accum_buffer;
float3 *albedo_buffer;
float3 *normal_buffer;
bool enable_auxiliary_output;
unsigned display_buffer_index;
// parameters
unsigned int iteration_start;
unsigned int iteration_num;
unsigned int mdl_test_type;
unsigned int max_path_length;
unsigned int use_derivatives;
unsigned int disable_aa;
mi::neuraylib::Df_flags bsdf_data_flags;
// camera
float3 cam_pos;
float3 cam_dir;
float3 cam_right;
float3 cam_up;
float cam_focal;
// geometry
unsigned int geometry;
// environment
uint2 env_size;
cudaTextureObject_t env_tex;
Env_accel *env_accel;
float env_intensity; // scaling factor
uint32_t env_gtag; // global light group tag for handle 'env'
float env_rotation; // rotation of the environment
// point light
float3 light_pos;
float3 light_color;
float light_intensity;
uint32_t point_light_gtag; // global light group tag for handle 'point_light'
// material data
Target_code_data *tc_data;
char const **arg_block_list;
unsigned int current_material;
Df_cuda_material *material_buffer;
// LPE state machine
uint32_t lpe_num_states; // number of states in the state machine
uint32_t lpe_num_transitions; // number of possible transitions between 2 states
uint32_t *lpe_state_table; // actual machine; size: #states x #transitions
uint32_t *lpe_final_mask; // encodes final states; size: #states
uint32_t default_gtag; // tag ID for the empty string
uint32_t lpe_ouput_expression; // the LPE evaluated for output
// only one here, but additional one analogously
};
enum Display_buffer_options
{
DISPLAY_BUFFER_LPE = 0,
DISPLAY_BUFFER_ALBEDO,
DISPLAY_BUFFER_NORMAL,
DISPLAY_BUFFER_COUNT
};
#endif // EXAMPLE_DF_CUDA_H
Types required for execution of generated native and CUDA code.

Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h

/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/shared/texture_support_cuda.h
//
// This file contains the implementations and the vtables of the texture access functions.
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#include <math.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
typedef mi::neuraylib::Texture_handler_base Texture_handler_base;
// Custom structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
explicit Texture()
: filtered_object(0)
, unfiltered_object(0)
, size(make_uint3(0, 0, 0))
, inv_size(make_float3(0.0f, 0.0f, 0.0f))
{}
explicit Texture(
cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object; // uses filter mode cudaFilterModeLinear
cudaTextureObject_t unfiltered_object; // uses filter mode cudaFilterModePoint
uint3 size; // size of the texture, needed for texel access
float3 inv_size; // the inverse values of the size of the texture
};
// Custom structure representing an MDL BSDF measurement.
struct Mbsdf
{
unsigned has_data[2]; // true if there is a measurement for this part
cudaTextureObject_t eval_data[2]; // uses filter mode cudaFilterModeLinear
float max_albedo[2]; // max albedo used to limit the multiplier
float* sample_data[2]; // CDFs for sampling a BSDF measurement
float* albedo_data[2]; // max albedo for each theta (isotropic)
uint2 angular_resolution[2]; // size of the dataset, needed for texel access
float2 inv_angular_resolution[2]; // the inverse values of the size of the dataset
unsigned num_channels[2]; // number of color channels (1 or 3)
};
// Structure representing a Light Profile
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, inv_angular_resolution(make_float2(0.0f, 0.0f))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution; // angular resolution of the grid
float2 inv_angular_resolution; // inverse angular resolution of the grid
float2 theta_phi_start; // start of the grid
float2 theta_phi_delta; // angular step size
float2 theta_phi_inv_delta; // inverse step size
float candela_multiplier; // factor to rescale the normalized data
float total_power;
cudaTextureObject_t eval_data; // normalized data sampled on grid
float* cdf_data; // CDFs for sampling a light profile
};
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler : Texture_handler_base {
// additional data for the texture access functions can be provided here
size_t num_textures; // the number of textures used by the material
// (without the invalid texture)
Texture const *textures; // the textures used by the material
// (without the invalid texture)
size_t num_mbsdfs; // the number of mbsdfs used by the material
// (without the invalid mbsdf)
Mbsdf const *mbsdfs; // the mbsdfs used by the material
// (without the invalid mbsdf)
size_t num_lightprofiles; // number of elements in the lightprofiles field
// (without the invalid light profile)
Lightprofile const *lightprofiles; // a device pointer to a list of mbsdfs objects, if used
// (without the invalid light profile)
};
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler_deriv : mi::neuraylib::Texture_handler_deriv_base {
// additional data for the texture access functions can be provided here
size_t num_textures; // the number of textures used by the material
// (without the invalid texture)
Texture const *textures; // the textures used by the material
// (without the invalid texture)
size_t num_mbsdfs; // the number of mbsdfs used by the material
// (without the invalid texture)
Mbsdf const *mbsdfs; // the mbsdfs used by the material
// (without the invalid texture)
size_t num_lightprofiles; // number of elements in the lightprofiles field
// (without the invalid light profile)
Lightprofile const *lightprofiles; // a device pointer to a list of mbsdfs objects, if used
// (without the invalid light profile)
};
#if defined(__CUDACC__)
// Stores a float4 in a float[4] array.
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
// Stores a float in all elements of a float[4] array.
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
// Stores the given float values in a float[4] array.
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
// Stores a float3 in a float[3] array.
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
// Stores a float4 in a float[3] array, ignoring v.w.
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
// Stores a float in all elements of a float[3] array.
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
// Stores the given float values in a float[3] array.
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
// ------------------------------------------------------------------------------------------------
// Textures
// ------------------------------------------------------------------------------------------------
// Applies wrapping and cropping to the given coordinate.
// Note: This macro returns if wrap mode is clip and the coordinate is out of range.
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
/* Do nothing, use texture sampler default behavior */ \
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
// Modify texture coordinates to get better texture filtering,
// see http://www.iquilezles.org/www/articles/texture/texture.htm
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
// Implementation of tex::texel_float4() for a texture_2d texture.
// Note: uvtile and/or animated textures are not supported
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const /*uv_tile*/[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
// Implementation of tex::lookup_float4() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
// Implementation of tex::lookup_float3() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
// Implementation of tex::texel_float4() for a texture_3d texture.
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
// Implementation of tex::lookup_float4() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
// Implementation of tex::lookup_float3() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
// Implementation of resolution_2d function needed by generated code.
// Note: uvtile and/or animated textures are not supported
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const /*uv_tile*/[2],
float /*frame*/)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
// Implementation of resolution_3d function needed by generated code.
extern "C" __device__ void tex_resolution_3d(
int result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float /*frame*/)
{
Texture_handler const* self = static_cast<Texture_handler const*>(self_base);
if (texture_idx == 0 || texture_idx - 1 >= self->num_textures) {
// invalid texture returns zero
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Texture const& tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
result[2] = tex.size.z;
}
// Implementation of texture_isvalid().
extern "C" __device__ bool tex_texture_isvalid(
Texture_handler_base const *self_base,
unsigned texture_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return texture_idx != 0 && texture_idx - 1 < self->num_textures;
}
// Implementation of frame function needed by generated code.
// Note: frames are not implemented for the MDL SDK examples.
extern "C" __device__ void tex_frame(
int result[2],
Texture_handler_base const */*self_base*/,
unsigned /*texture_idx*/)
{
result[0] = 0;
result[1] = 0;
}
// ------------------------------------------------------------------------------------------------
// Light Profiles
// ------------------------------------------------------------------------------------------------
// Implementation of light_profile_power() for a light profile.
extern "C" __device__ float df_light_profile_power(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.total_power;
}
// Implementation of light_profile_maximum() for a light profile.
extern "C" __device__ float df_light_profile_maximum(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.candela_multiplier;
}
// Implementation of light_profile_isvalid() for a light profile.
extern "C" __device__ bool df_light_profile_isvalid(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return light_profile_idx != 0 && light_profile_idx - 1 < self->num_lightprofiles;
}
// binary search through CDF
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
// Implementation of df::light_profile_evaluate() for a light profile.
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
// map theta to 0..1 range
float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x * lp.inv_angular_resolution.x;
// converting input phi from -pi..pi to 0..2pi
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
// floorf wraps phi range into 0..2pi
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
// (phi < 0.0f) is no problem, this is handled by the (black) border
// since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
float v = phi * lp.theta_phi_inv_delta.y * lp.inv_angular_resolution.y;
// half pixel offset
// see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#linear-filtering
u += 0.5f * lp.inv_angular_resolution.x;
v += 0.5f * lp.inv_angular_resolution.y;
// wrap_mode: border black would be an alternative (but it produces artifacts at low res)
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
// Implementation of df::light_profile_sample() for a light profile.
extern "C" __device__ void df_light_profile_sample(
float result[3], // output: theta, phi, pdf
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const xi[3]) // uniform random values
{
result[0] = -1.0f; // negative theta means no emission
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
uint2 res = lp.angular_resolution;
if (res.x <= 2 || res.y <= 2)
return; // invalid resolution
// sample theta_out
//-------------------------------------------
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data; // CDF theta
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0); // binary search
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta; // rescale for re-usage
// sample phi_out
//-------------------------------------------
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1) // CDF theta block
+ (idx_theta * (res.y - 1)); // selected CDF for phi
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1); // binary search
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi; // rescale for re-usage
// compute theta and phi
//-------------------------------------------
// sample uniformly within the patch (grid cell)
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
// n = \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
// = 1 / (\cos{\theta_0} - \cos{\theta_1})
//
// \xi = n * \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
// => \cos{\theta} = (1 - \xi) \cos{\theta_0} + \xi \cos{\theta_1}
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
// align phi
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI); // wrap
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1]; // to [-pi, pi]
// compute pdf
//-------------------------------------------
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
// Implementation of df::light_profile_pdf() for a light profile.
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
// map theta to 0..1 range
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
// converting input phi from -pi..pi to 0..2pi
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
// floorf wraps phi range into 0..2pi
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
// (phi < 0.0f) is no problem, this is handled by the (black) border
// since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
// wrap_mode: border black would be an alternative (but it produces artifacts at low res)
if (idx_theta < 0 || idx_theta > res.x - 2 || idx_phi < 0 || idx_phi > res.y - 2)
return 0.0f;
// get probability for theta
//-------------------------------------------
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
// get probability for phi
//-------------------------------------------
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1) // CDF theta block
+ (idx_theta * (res.y - 1)); // selected CDF for phi
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
// compute probability to select a position in the sphere patch
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
// ------------------------------------------------------------------------------------------------
// BSDF Measurements
// ------------------------------------------------------------------------------------------------
// Implementation of bsdf_measurement_isvalid() for an MBSDF.
extern "C" __device__ bool df_bsdf_measurement_isvalid(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return bsdf_measurement_idx != 0 && bsdf_measurement_idx - 1 < self->num_mbsdfs;
}
// Implementation of df::bsdf_measurement_resolution() function needed by generated code,
// which retrieves the angular and chromatic resolution of the given MBSDF.
// The returned triple consists of: number of equi-spaced steps of theta_i and theta_o,
// number of equi-spaced steps of phi, and number of color channels (1 or 3).
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
{
// invalid MBSDF returns zero
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
// check for the part
if (part_idx > 1 || bm.has_data[part_idx] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
// pass out the information
result[0] = bm.angular_resolution[part_idx].x;
result[1] = bm.angular_resolution[part_idx].y;
result[2] = bm.num_channels[part_idx];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
// assuming each phi is between -pi and pi
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= float(M_ONE_OVER_PI);
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
// 3D volume on the GPU (phi_delta x theta_out x theta_in)
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
// Implementation of df::bsdf_measurement_evaluate() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
{
// invalid MBSDF returns zero
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
// check for the parts
if (part_idx > 1 || bm.has_data[part_idx] == 0)
{
store_result3(result, 0.0f);
return;
}
// handle channels
if (bm.num_channels[part_idx] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
// Implementation of df::bsdf_measurement_sample() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3], // output: theta, phi, pdf
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_out[2],
float const xi[3], // uniform random values
Mbsdf_part part)
{
result[0] = -1.0f; // negative theta means absorption
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return; // invalid MBSDFs returns zero
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return; // check for the part
// CDF data
uint2 res = bm.angular_resolution[part_idx];
const float* sample_data = bm.sample_data[part_idx];
if (res.x < 1 || res.y < 1)
return; // invalid resolution
unsigned idx_theta_out = unsigned(theta_phi_out[0] * float(M_ONE_OVER_PI * 2.0f) * float(res.x));
idx_theta_out = min(idx_theta_out, res.x - 1);
// sample theta_in
//-------------------------------------------
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_out * res.x;
unsigned idx_theta_in = sample_cdf(cdf_theta, res.x, xi0); // binary search
float prob_theta = cdf_theta[idx_theta_in];
if (idx_theta_in > 0)
{
const float tmp = cdf_theta[idx_theta_in - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta; // rescale for re-usage
// sample phi
//-------------------------------------------
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) + // CDF theta block
(idx_theta_out * res.x + idx_theta_in) * res.y; // selected CDF phi
// select which half-circle to choose with probability 0.5
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi = sample_cdf(cdf_phi, res.y, xi1); // binary search
float prob_phi = cdf_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi; // rescale for re-usage
// compute direction
//-------------------------------------------
const float2 inv_res = bm.inv_angular_resolution[part_idx];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_in) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1]; // phi \in [0, 2pi]
// align phi
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1]; // to [-pi, pi]
// compute pdf
//-------------------------------------------
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
// Implementation of df::bsdf_measurement_pdf() for an MBSDF.
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return 0.0f; // invalid MBSDF returns zero
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
unsigned part_idx = static_cast<unsigned>(part);
// check for the part
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return 0.0f;
// CDF data and resolution
const float* sample_data = bm.sample_data[part_idx];
uint2 res = bm.angular_resolution[part_idx];
// compute indices in the CDF data
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out); // phi_delta, theta_out, theta_in
unsigned idx_theta_in = unsigned(uvw.z * float(res.x));
unsigned idx_theta_out = unsigned(uvw.y * float(res.x));
unsigned idx_phi = unsigned(uvw.x * float(res.y));
idx_theta_in = min(idx_theta_in, res.x - 1);
idx_theta_out = min(idx_theta_out, res.x - 1);
idx_phi = min(idx_phi, res.y - 1);
// get probability to select theta_in
const float* cdf_theta = sample_data + idx_theta_out * res.x;
float prob_theta = cdf_theta[idx_theta_in];
if (idx_theta_in > 0)
{
const float tmp = cdf_theta[idx_theta_in - 1];
prob_theta -= tmp;
}
// get probability to select phi
const float* cdf_phi = sample_data +
(res.x * res.x) + // CDF theta block
(idx_theta_out * res.x + idx_theta_in) * res.y; // selected CDF phi
float prob_phi = cdf_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_phi[idx_phi - 1];
prob_phi -= tmp;
}
// compute probability to select a position in the sphere patch
float2 inv_res = bm.inv_angular_resolution[part_idx];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_in) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2], // output: max (in case of color) albedo
// for the selected direction ([0]) and
// global ([1])
Texture_handler const *self,
unsigned bsdf_measurement_idx,
float const theta_phi[2],
Mbsdf_part part)
{
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
// check for the part
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return;
const uint2 res = bm.angular_resolution[part_idx];
if (res.x < 1)
return;
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta = min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_idx][idx_theta];
result[1] = bm.max_albedo[part_idx];
}
// Implementation of df::bsdf_measurement_albedos() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4], // output: [0] albedo refl. for theta_phi
// [1] max albedo refl. global
// [2] albedo trans. for theta_phi
// [3] max albedo trans. global
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return; // invalid MBSDF returns zero
df_bsdf_measurement_albedo(
&result[0],
self,
bsdf_measurement_idx,
theta_phi,
df_bsdf_measurement_albedo(
&result[2],
self,
bsdf_measurement_idx,
theta_phi,
}
// ------------------------------------------------------------------------------------------------
// Normal adaption (dummy functions)
//
// Can be enabled via backend option "use_renderer_adapt_normal".
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_DUMMY_ADAPTNORMAL
// Implementation of adapt_normal().
extern "C" __device__ void adapt_normal(
float result[3],
Texture_handler_base const *self_base,
float const normal[3])
{
// just return original normal
result[0] = normal[0];
result[1] = normal[1];
result[2] = normal[2];
}
#endif // TEX_SUPPORT_NO_DUMMY_ADAPTNORMAL
// ------------------------------------------------------------------------------------------------
// Scene data (dummy functions)
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_DUMMY_SCENEDATA
// Implementation of scene_data_isvalid().
extern "C" __device__ bool scene_data_isvalid(
Texture_handler_base const *self_base,
unsigned scene_data_id)
{
return false;
}
// Implementation of scene_data_lookup_float4().
extern "C" __device__ void scene_data_lookup_float4(
float result[4],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[4],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
// Implementation of scene_data_lookup_float3().
extern "C" __device__ void scene_data_lookup_float3(
float result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_color().
extern "C" __device__ void scene_data_lookup_color(
float result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_float2().
extern "C" __device__ void scene_data_lookup_float2(
float result[2],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[2],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
}
// Implementation of scene_data_lookup_float().
extern "C" __device__ float scene_data_lookup_float(
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value,
bool uniform_lookup)
{
// just return default value
return default_value;
}
// Implementation of scene_data_lookup_int4().
extern "C" __device__ void scene_data_lookup_int4(
int result[4],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[4],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
// Implementation of scene_data_lookup_int3().
extern "C" __device__ void scene_data_lookup_int3(
int result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_int2().
extern "C" __device__ void scene_data_lookup_int2(
int result[2],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[2],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
}
// Implementation of scene_data_lookup_int().
extern "C" __device__ int scene_data_lookup_int(
Texture_handler_base const *self_base,
unsigned scene_data_id,
int default_value,
bool uniform_lookup)
{
// just return default value
return default_value;
}
// Implementation of scene_data_lookup_float4x4().
extern "C" __device__ void scene_data_lookup_float4x4(
float result[16],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[16],
bool uniform_lookup)
{
// just return default value
for (int i = 0; i < 16; ++i)
result[i] = default_value[i];
}
// Implementation of scene_data_lookup_float4() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float4(
Texture_handler_base const *self_base,
unsigned scene_data_id,
tct_deriv_arr_float_4 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float3() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float3(
Texture_handler_base const *self_base,
unsigned scene_data_id,
tct_deriv_arr_float_3 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_color() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_color(
Texture_handler_base const *self_base,
unsigned scene_data_id,
tct_deriv_arr_float_3 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float2() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float2(
Texture_handler_base const *self_base,
unsigned scene_data_id,
tct_deriv_arr_float_2 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float(
tct_deriv_float *result,
Texture_handler_base const *self_base,
unsigned scene_data_id,
tct_deriv_float const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
#endif // TEX_SUPPORT_NO_DUMMY_SCENEDATA
// ------------------------------------------------------------------------------------------------
// Vtables
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_VTABLES
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode.
__device__ mi::neuraylib::Texture_handler_vtable tex_vtable = {
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
tex_frame,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
adapt_normal,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_float4x4,
};
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode with derivatives.
__device__ mi::neuraylib::Texture_handler_deriv_vtable tex_deriv_vtable = {
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
tex_frame,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
adapt_normal,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_float4x4,
scene_data_lookup_deriv_float,
scene_data_lookup_deriv_float2,
scene_data_lookup_deriv_float3,
scene_data_lookup_deriv_float4,
scene_data_lookup_deriv_color,
};
#endif // TEX_SUPPORT_NO_VTABLES
#endif // __CUDACC__
#endif // TEXTURE_SUPPORT_CUDA_H
tct_traits<true>::tct_derivable_float2 tct_deriv_float2
A float2 with derivatives.
Definition: target_code_types.h:129
Mbsdf_part
MBSDFs can consist of two parts, which can be selected using this enumeration.
Definition: target_code_types.h:328
tct_deriv< float[4]> tct_deriv_arr_float_4
A float[4] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:144
tct_traits<true>::tct_derivable_float tct_deriv_float
A float with derivatives.
Definition: target_code_types.h:126
Tex_wrap_mode
The texture wrap modes as defined by tex::wrap_mode in the MDL specification.
Definition: target_code_types.h:309
struct Shading_state_material_impl<true> Shading_state_material_with_derivs
The MDL material state structure with derivatives for the texture coordinates.
Definition: target_code_types.h:303
tct_deriv< float[2]> tct_deriv_arr_float_2
A float[2] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:138
tct_deriv< float[3]> tct_deriv_arr_float_3
A float[3] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:141
@ MBSDF_DATA_TRANSMISSION
the bidirectional transmission distribution function (BTDF)
Definition: target_code_types.h:333
@ MBSDF_DATA_REFLECTION
the bidirectional reflection distribution function (BRDF)
Definition: target_code_types.h:330
The MDL material state structure inside the MDL SDK is a representation of the renderer state as defi...
Definition: target_code_types.h:210
The texture handler structure that is passed to the texturing functions.
Definition: target_code_types.h:712
The texture handler structure that is passed to the texturing functions with derivative support.
Definition: target_code_types.h:721
The runtime for bitmap texture access for the generated target code can optionally be implemented in ...
Definition: target_code_types.h:344
A template struct with derivatives.
Definition: target_code_types.h:97

Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h

/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// Code shared by CUDA MDL SDK examples
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include <math.h>
#include "example_shared.h"
#include "compiled_material_traverser_base.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
#include "utils/profiling.h"
// Structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object; // uses filter mode cudaFilterModeLinear
cudaTextureObject_t unfiltered_object; // uses filter mode cudaFilterModePoint
uint3 size; // size of the texture, needed for texel access
float3 inv_size; // the inverse values of the size of the texture
};
// Structure representing an MDL bsdf measurement.
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2]; // true if there is a measurement for this part
cudaTextureObject_t eval_data[2]; // uses filter mode cudaFilterModeLinear
float max_albedo[2]; // max albedo used to limit the multiplier
float* sample_data[2]; // CDFs for sampling a BSDF measurement
float* albedo_data[2]; // max albedo for each theta (isotropic)
uint2 angular_resolution[2]; // size of the dataset, needed for texel access
float2 inv_angular_resolution[2]; // the inverse values of the size of the dataset
unsigned num_channels[2]; // number of color channels (1 or 3)
};
// Structure representing a Light Profile
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, inv_angular_resolution(make_float2(
1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y)))
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution; // angular resolution of the grid
float2 inv_angular_resolution; // inverse angular resolution of the grid
float2 theta_phi_start; // start of the grid
float2 theta_phi_delta; // angular step size
float2 theta_phi_inv_delta; // inverse step size
float candela_multiplier; // factor to rescale the normalized data
float total_power;
cudaTextureObject_t eval_data; // normalized data sampled on grid
float* cdf_data; // CDFs for sampling a light profile
};
// Structure representing the resources used by the generated code of a target code.
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures; // number of elements in the textures field
CUdeviceptr textures; // a device pointer to a list of Texture objects, if used
size_t num_mbsdfs; // number of elements in the mbsdfs field
CUdeviceptr mbsdfs; // a device pointer to a list of mbsdfs objects, if used
size_t num_lightprofiles; // number of elements in the lightprofiles field
CUdeviceptr lightprofiles; // a device pointer to a list of mbsdfs objects, if used
CUdeviceptr ro_data_segment; // a device pointer to the read-only data segment, if used
};
//------------------------------------------------------------------------------
//
// Helper functions
//
//------------------------------------------------------------------------------
// Return a textual representation of the given value.
template <typename T>
std::string to_string(T val)
{
std::ostringstream stream;
stream << val;
return stream.str();
}
// Collects the handles in a compiled material
class Handle_collector : public Compiled_material_traverser_base
{
public:
// add all handle appearing in the provided material to the collectors handle list.
explicit Handle_collector(
: Compiled_material_traverser_base()
{
traverse(material, transaction);
}
// get the collected handles.
const std::vector<std::string>& get_handles() const { return m_handles; }
private:
// Called when the traversal reaches a new element.
void visit_begin(const mi::neuraylib::ICompiled_material* material,
const Compiled_material_traverser_base::Traversal_element& element,
void* context) override
{
// look for direct calls
if (!element.expression ||
element.expression->get_kind() != mi::neuraylib::IExpression::EK_DIRECT_CALL)
return;
// check if it is a distribution function
auto transaction = static_cast<mi::neuraylib::ITransaction*>(context);
element.expression->get_interface<const mi::neuraylib::IExpression_direct_call
>());
expr_dcall->get_arguments());
expr_dcall->get_definition()));
get_semantic();
if (semantic < mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_FIRST
|| semantic > mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_LAST)
return;
// check if the last argument is a handle
expr_dcall->get_arguments());
mi::Size arg_count = arguments->get_size();
const char* name = arguments->get_name(arg_count - 1);
if (strcmp(name, "handle") != 0)
return;
// get the handle value
arguments->get_expression(arg_count - 1));
if (expr->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
return; // is an error if 'handle' is a reserved parameter name
expr->get_interface<const mi::neuraylib::IExpression_constant>());
const mi::base::Handle<const mi::neuraylib::IValue> value(expr_const->get_value());
if (value->get_kind() != mi::neuraylib::IValue::VK_STRING)
return;
value->get_interface<const mi::neuraylib::IValue_string>());
std::string handle_value = handle->get_value() ? std::string(handle->get_value()) : "";
if (std::find(m_handles.begin(), m_handles.end(), handle_value) == m_handles.end())
m_handles.push_back(handle_value);
}
std::vector<std::string> m_handles;
};
//------------------------------------------------------------------------------
//
// CUDA helper functions
//
//------------------------------------------------------------------------------
// Helper macro. Checks whether the expression is cudaSuccess and if not prints a message and
// resets the device and exits.
#ifdef ENABLE_DEPRECATED_UTILIY_FUNCTIONS
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
#else
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
cudaDeviceReset(); \
exit_failure( "Error in file %s, line %u: \"%s\".\n", __FILE__, __LINE__, #expr); \
} \
} while (false)
#endif
// Initialize CUDA.
CUcontext init_cuda(
int ordinal
#ifdef OPENGL_INTEROP
, const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
// Use first device used by OpenGL context
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
{
// Use given device
check_cuda_success(cuDeviceGet(&cu_device, ordinal));
}
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
// For this example, increase printf CUDA buffer size to support a larger number
// of MDL debug::print() calls per CUDA kernel launch
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
// Uninitialize CUDA.
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
/*compile error*/
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
// No copy possible.
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
T &r = *it;
deleter(r);
}
}
C &operator*() { return m_cont; }
C const &operator*() const { return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
// No copy possible.
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
// Allocate memory on GPU and copy the given data to the allocated memory.
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T>
CUdeviceptr gpu_mem_dup(std::vector<T> const &data)
{
return gpu_mem_dup(&data[0], data.size() * sizeof(T));
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
//------------------------------------------------------------------------------
//
// Material_gpu_context class
//
//------------------------------------------------------------------------------
// Helper class responsible for making textures and read-only data available to the GPU
// by generating and managing a list of Target_code_data objects.
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
// Use first entry as "not-used" block
m_target_argument_block_list->push_back(0);
}
// Prepare the needed data of the given target code.
bool prepare_target_code_data(
mi::neuraylib::ITarget_code const *target_code,
std::vector<size_t> const &arg_block_indices);
// Get a device pointer to the target code data list.
CUdeviceptr get_device_target_code_data_list();
// Get a device pointer to the target argument block list.
CUdeviceptr get_device_target_argument_block_list();
// Get a device pointer to the i'th target argument block.
CUdeviceptr get_device_target_argument_block(size_t i)
{
// First entry is the "not-used" block, so start at index 1.
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
// Get the number of target argument blocks.
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
// Get the argument block of the i'th BSDF.
// If the BSDF has no target argument block, size_t(~0) is returned.
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
// Get a writable copy of the i'th target argument block.
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
// Get the layout of the i'th target argument block.
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void update_device_argument_block(size_t i);
private:
// Copy the image data of a canvas to a CUDA array.
void copy_canvas_to_cuda_array(cudaArray_t device_array, mi::neuraylib::ICanvas const *canvas);
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool prepare_texture(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size texture_index,
std::vector<Texture> &textures);
// Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
// functions on the GPU.
bool prepare_mbsdf(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size mbsdf_index,
std::vector<Mbsdf> &mbsdfs);
// Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
// functions on the GPU.
bool prepare_lightprofile(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size lightprofile_index,
std::vector<Lightprofile> &lightprofiles);
// If true, mipmaps will be generated for all 2D textures.
bool m_enable_derivatives;
// The device pointer of the target code data list.
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
// List of all target code data objects owned by this context.
Resource_container<Target_code_data> m_target_code_data_list;
// The device pointer of the target argument block list.
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
// List of all target argument blocks owned by this context.
Resource_container<CUdeviceptr> m_target_argument_block_list;
// List of all local, writable copies of the target argument blocks.
std::vector<mi::base::Handle<mi::neuraylib::ITarget_argument_block> > m_own_arg_blocks;
// List of argument block indices per material BSDF.
std::vector<size_t> m_bsdf_arg_block_indices;
// List of all target argument block layouts.
std::vector<mi::base::Handle<mi::neuraylib::ITarget_value_layout const> > m_arg_block_layouts;
// List of all Texture objects owned by this context.
Resource_container<Texture> m_all_textures;
// List of all MBSDFs objects owned by this context.
Resource_container<Mbsdf> m_all_mbsdfs;
// List of all Light profiles objects owned by this context.
Resource_container<Lightprofile> m_all_lightprofiles;
// List of all CUDA arrays owned by this context.
Resource_container<cudaArray_t> m_all_texture_arrays;
// List of all CUDA mipmapped arrays owned by this context.
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
// Get a device pointer to the target code data list.
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
// Get a device pointer to the target argument block list.
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
// Copy the image data of a canvas to a CUDA array.
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
mi::neuraylib::ICanvas const *canvas)
{
mi::Float32 const *data = static_cast<mi::Float32 const *>(tile->get_data());
check_cuda_success(cudaMemcpy2DToArray(
device_array, 0, 0, data,
canvas->get_resolution_x() * sizeof(float) * 4,
canvas->get_resolution_x() * sizeof(float) * 4,
canvas->get_resolution_y(),
cudaMemcpyHostToDevice));
}
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool Material_gpu_context::prepare_texture(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size texture_index,
std::vector<Texture> &textures)
{
// Get access to the texture data by the texture database name from the target code.
transaction->access<mi::neuraylib::ITexture>(code_ptx->get_texture(texture_index)));
transaction->access<mi::neuraylib::IImage>(texture->get_image()));
mi::base::Handle<const mi::neuraylib::ICanvas> canvas(image->get_canvas(0, 0, 0));
mi::Uint32 tex_width = canvas->get_resolution_x();
mi::Uint32 tex_height = canvas->get_resolution_y();
mi::Uint32 tex_layers = canvas->get_layers_size();
char const *image_type = image->get_type(0, 0);
if (image->is_uvtile() || image->is_animated()) {
std::cerr << "The example does not support uvtile and/or animated textures!" << std::endl;
return false;
}
// For simplicity, the texture access functions are only implemented for float4 and gamma
// is pre-applied here (all images are converted to linear space).
// Convert to linear color space if necessary
if (texture->get_effective_gamma(0, 0) != 1.0f) {
// Copy/convert to float4 canvas and adjust gamma from "effective gamma" to 1.
image_api->convert(canvas.get(), "Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma(0, 0));
image_api->adjust_gamma(gamma_canvas.get(), 1.0f);
canvas = gamma_canvas;
} else if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0) {
// Convert to expected format
canvas = image_api->convert(canvas.get(), "Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
// Copy image data to GPU array depending on texture shape
code_ptx->get_texture_shape(texture_index);
// Cubemap and 3D texture objects require 3D CUDA arrays
tex_layers != 6) {
std::cerr << "Invalid number of layers (" << tex_layers
<< "), cubemaps must have 6 layers!" << std::endl;
return false;
}
// Allocate a 3D array on the GPU
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
cudaArrayCubemap : 0));
// Prepare the memcpy parameter structure
cudaMemcpy3DParms copy_params;
memset(&copy_params, 0, sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
// Copy the image data of all layers (the layers are not consecutive in memory)
for (mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
canvas->get_tile(layer));
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(&copy_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
// mipmapped textures use CUDA mipmapped arrays
mi::Uint32 num_levels = image->get_levels(0, 0);
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
// create all mipmap levels and copy them to the CUDA arrays in the mipmapped array
mi::base::Handle<mi::IArray> mipmaps(image_api->create_mipmap(canvas.get(), 1.0f));
for (mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
mi::base::Handle<mi::IPointer> mipmap_ptr(mipmaps->get_element<mi::IPointer>(level - 1));
level_canvas = mipmap_ptr->get_pointer<mi::neuraylib::ICanvas>();
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
// 2D texture objects use CUDA arrays
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
// For cube maps we need clamped address mode to avoid artifacts in the corners
cudaTextureAddressMode addr_mode =
? cudaAddressModeClamp
: cudaAddressModeWrap;
// Create filtered texture object
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f; // default value in OpenGL
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
// Create unfiltered texture object if necessary (cube textures have no texel functions)
cudaTextureObject_t tex_obj_unfilt = 0;
// Use a black border for access outside of the texture
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
// Store texture infos in result vector
textures.push_back(Texture(
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.back());
return true;
}
namespace
{
bool prepare_mbsdfs_part(mi::neuraylib::Mbsdf_part part, Mbsdf& mbsdf_cuda_representation,
const mi::neuraylib::IBsdf_measurement* bsdf_measurement)
{
switch (part)
{
dataset = bsdf_measurement->get_reflection<mi::neuraylib::Bsdf_isotropic_data>();
break;
dataset = bsdf_measurement->get_transmission<mi::neuraylib::Bsdf_isotropic_data>();
break;
}
// no data, fine
if (!dataset)
return true;
// get dimensions
uint2 res;
res.x = dataset->get_resolution_theta();
res.y = dataset->get_resolution_phi();
unsigned num_channels = dataset->get_type() == mi::neuraylib::BSDF_SCALAR ? 1 : 3;
mbsdf_cuda_representation.Add(part, res, num_channels);
// get data
mi::base::Handle<const mi::neuraylib::IBsdf_buffer> buffer(dataset->get_bsdf_buffer());
// {1,3} * (index_theta_in * (res_phi * res_theta) + index_theta_out * res_phi + index_phi)
const mi::Float32* src_data = buffer->get_data();
// ----------------------------------------------------------------------------------------
// prepare importance sampling data:
// - for theta_in we will be able to perform a two stage CDF, first to select theta_out,
// and second to select phi_out
// - maximum component is used to "probability" in case of colored measurements
// CDF of the probability to select a certain theta_out for a given theta_in
const unsigned int cdf_theta_size = res.x * res.x;
// for each of theta_in x theta_out combination, a CDF of the probabilities to select a
// a certain theta_out is stored
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x]; // albedo for sampling reflection and transmission
float* sample_data_theta = sample_data; // begin of the first (theta) CDF
float* sample_data_phi = sample_data + cdf_theta_size; // begin of the second (phi) CDFs
const float s_theta = (float) (M_PI * 0.5) / float(res.x); // step size
const float s_phi = (float) (M_PI) / float(res.y); // step size
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
// BSDFs are symmetric: f(w_in, w_out) = f(w_out, w_in)
// take the average of both measurements
// area of two the surface elements (the ones we are averaging)
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
// offset for both the thetas into the measurement data (select row in the volume)
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
// build CDF for phi
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value = fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+ fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else /* num_channels == 1 */
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
// normalize CDF for phi
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
// build CDF for theta
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
// normalize CDF for theta
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
// copy entire CDF data buffer to GPU
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
// ----------------------------------------------------------------------------------------
// prepare evaluation data:
// - simply store the measured data in a volume texture
// - in case of color data, we store each sample in a vector4 to get texture support
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
// make lookup data symmetric
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
// Copy data to GPU array
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>() // float3 is not supported
: cudaCreateChannelDesc<float>());
// Allocate a 3D array on the GPU (phi_delta x theta_out x theta_in)
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
// prepare and copy
cudaMemcpy3DParms copy_params;
memset(&copy_params, 0, sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data), // base pointer
res.y * lookup_channels * sizeof(float), // row pitch
res.y, // width of slice
res.x); // height of slice
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(&copy_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size mbsdf_index,
std::vector<Mbsdf> &mbsdfs)
{
// Get access to the texture data by the texture database name from the target code.
code_ptx->get_bsdf_measurement(mbsdf_index)));
Mbsdf mbsdf_cuda;
// handle reflection and transmission
if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_REFLECTION, mbsdf_cuda, mbsdf.get()))
return false;
if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_TRANSMISSION, mbsdf_cuda, mbsdf.get()))
return false;
mbsdfs.push_back(mbsdf_cuda);
m_all_mbsdfs->push_back(mbsdfs.back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size lightprofile_index,
std::vector<Lightprofile> &lightprofiles)
{
// Get access to the texture data by the texture database name from the target code.
code_ptx->get_light_profile(lightprofile_index)));
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
// phi-mayor: [res.x x res.y]
const float* data = lprof_nr->get_data();
// --------------------------------------------------------------------------------------------
// compute total power
// compute inverse CDF data for sampling
// sampling will work on cells rather than grid nodes (used for evaluation)
// first (res.x-1) for the cdf for sampling theta
// rest (rex.x-1) * (res.y-1) for the individual cdfs for sampling phi (after theta)
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
// area of the patch (grid cell)
// \mu = int_{theta0}^{theta1} sin{theta} \delta theta
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
// build CDF for phi
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
// the probability to select a patch corresponds to the value times area
// the value of a cell is the average of the corners
// omit the *1/4 as we normalize in the end
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
// normalize CDF for phi
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
// build CDF for theta
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
// normalize CDF for theta
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
// copy entire CDF data buffer to GPU
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
// --------------------------------------------------------------------------------------------
// prepare evaluation data
// - use a 2d texture that allows bilinear interpolation
// Copy data to GPU array
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
// 2D texture objects use CUDA arrays
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpy2DToArray(
device_lightprofile_data, 0, 0, data,
res.x * sizeof(float), res.x * sizeof(float), res.y, cudaMemcpyHostToDevice));
// Create filtered texture object
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
lightprofiles.push_back(lprof);
m_all_lightprofiles->push_back(lightprofiles.back());
return true;
}
// Prepare the needed target code data of the given target code.
bool Material_gpu_context::prepare_target_code_data(
mi::neuraylib::ITarget_code const *target_code,
std::vector<size_t> const &arg_block_indices)
{
// Target code data list may not have been retrieved already
check_success(m_device_target_code_data_list.get() == 0);
// Handle the read-only data segments if necessary.
// They are only created, if the "enable_ro_segment" backend option was set to "on".
CUdeviceptr device_ro_data = 0;
if (target_code->get_ro_data_segment_count() > 0) {
device_ro_data = gpu_mem_dup(
target_code->get_ro_data_segment_data(0),
target_code->get_ro_data_segment_size(0));
}
// Copy textures to GPU if the code has more than just the invalid texture
CUdeviceptr device_textures = 0;
mi::Size num_textures = target_code->get_texture_count();
if (num_textures > 1) {
std::vector<Texture> textures;
// Loop over all textures skipping the first texture,
// which is always the invalid texture
for (mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
// Copy texture list to GPU
device_textures = gpu_mem_dup(textures);
}
// Copy MBSDFs to GPU if the code has more than just the invalid mbsdf
CUdeviceptr device_mbsdfs = 0;
mi::Size num_mbsdfs = target_code->get_bsdf_measurement_count();
if (num_mbsdfs > 1) {
std::vector<Mbsdf> mbsdfs;
// Loop over all mbsdfs skipping the first mbsdf,
// which is always the invalid mbsdf
for (mi::Size i = 1; i < num_mbsdfs; ++i) {
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
// Copy mbsdf list to GPU
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
// Copy light profiles to GPU if the code has more than just the invalid light profile
CUdeviceptr device_lightprofiles = 0;
mi::Size num_lightprofiles = target_code->get_light_profile_count();
if (num_lightprofiles > 1) {
std::vector<Lightprofile> lightprofiles;
// Loop over all profiles skipping the first profile,
// which is always the invalid profile
for (mi::Size i = 1; i < num_lightprofiles; ++i) {
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
// Copy light profile list to GPU
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
for (mi::Size i = 0, num = target_code->get_argument_block_count(); i < num; ++i) {
target_code->get_argument_block(i));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_own_arg_blocks.push_back(mi::base::make_handle(arg_block->clone()));
m_arg_block_layouts.push_back(
}
for (size_t arg_block_index : arg_block_indices) {
m_bsdf_arg_block_indices.push_back(arg_block_index);
}
return true;
}
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
//------------------------------------------------------------------------------
//
// MDL material compilation code
//
//------------------------------------------------------------------------------
class Material_compiler {
public:
// Constructor.
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
bool enable_pdf,
bool use_adapt_normal,
bool enable_bsdf_flags,
const std::string& df_handle_mode,
const std::string& lambda_return_mode);
// Loads an MDL module and returns the module DB.
std::string load_module(const std::string& mdl_module_name);
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
// If class_compilation is true, the material will use class compilation.
bool add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* fname,
bool class_compilation=false);
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
// If class_compilation is true, the material will use class compilation.
bool add_material_df(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* base_fname,
bool class_compilation=false);
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_descriptions. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool add_material(
const std::string& qualified_module_name,
const std::string& material_db_name,
mi::Size description_count,
bool class_compilation);
// Generates CUDA PTX target code for the current link unit.
typedef std::vector<mi::base::Handle<mi::neuraylib::IFunction_definition const> >
Material_definition_list;
// Get the list of used material definitions.
// There will be one entry per add_* call.
Material_definition_list const &get_material_defs()
{
return m_material_defs;
}
typedef std::vector<mi::base::Handle<mi::neuraylib::ICompiled_material const> >
Compiled_material_list;
// Get the list of compiled materials.
// There will be one entry per add_* call.
Compiled_material_list const &get_compiled_materials()
{
return m_compiled_materials;
}
std::vector<size_t> const &get_argument_block_indices() const {
return m_arg_block_indexes;
}
const std::vector<std::string>& get_handles() const {
return m_handles;
}
private:
// Creates an instance of the given material.
mi::neuraylib::IFunction_call* create_material_instance(
const std::string& qualified_module_name,
const std::string& material_db_name);
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material* compile_material_instance(
mi::neuraylib::IFunction_call* material_instance,
bool class_compilation);
private:
Material_definition_list m_material_defs;
Compiled_material_list m_compiled_materials;
std::vector<size_t> m_arg_block_indexes;
std::vector<std::string> m_handles;
};
// Constructor.
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
bool enable_pdf,
bool use_adapt_normal,
bool enable_bsdf_flags,
const std::string& df_handle_mode,
const std::string& lambda_return_mode)
: m_mdl_impexp_api(mdl_impexp_api, mi::base::DUP_INTERFACE)
, m_be_cuda_ptx(mdl_backend_api->get_backend(mi::neuraylib::IMdl_backend_api::MB_CUDA_PTX))
, m_mdl_factory(mdl_factory, mi::base::DUP_INTERFACE)
, m_transaction(transaction, mi::base::DUP_INTERFACE)
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
// Option "enable_ro_segment": Default is disabled.
// If you have a lot of big arrays, enabling this might speed up compilation.
// check_success(m_be_cuda_ptx->set_option("enable_ro_segment", "on") == 0);
if (enable_derivatives) {
// Option "texture_runtime_with_derivs": Default is disabled.
// We enable it to get coordinates with derivatives for texture lookup functions.
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
// Option "tex_lookup_call_mode": Default mode is vtable mode.
// You can switch to the slower vtable mode by commenting out the next line.
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
// Option "num_texture_results": Default is 0.
// Set the size of a renderer provided array for texture results in the MDL SDK state in number
// of float4 elements processed by the init() function.
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
if (enable_auxiliary) {
// Option "enable_auxiliary": Default is disabled.
// We enable it to create an additional 'auxiliary' function that can be called on each
// distribution function to fill an albedo and normal buffer e.g. for denoising.
check_success(m_be_cuda_ptx->set_option("enable_auxiliary", "on") == 0);
}
if (!enable_pdf) {
// Option "enable_pdf": Default is enabled.
check_success(m_be_cuda_ptx->set_option("enable_pdf", "off") == 0);
}
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
// Option "enable_df_interpreter": Default is disabled.
// Using the interpreter allows to reuse the same code for multiple materials
// reducing code divergence, if your scene shows many materials at the same time.
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
// Option "df_handle_slot_mode": Default is "none".
// When using light path expressions, individual parts of the distribution functions can be
// selected using "handles". The contribution of each of those parts has to be evaluated during
// rendering. This option controls how many parts are evaluated with each call into the
// generated "evaluate" and "auxiliary" functions and how the data is passed.
// The CUDA backend supports pointers, which means an externally managed buffer of arbitrary
// size is used to transport the contributions of each part.
check_success(m_be_cuda_ptx->set_option("df_handle_slot_mode", df_handle_mode.c_str()) == 0);
// Option "lambda_return_mode": Default is "default".
// Selects how generated lambda functions return their results. For PTX, the default
// is equivalent to "sret" mode, where all values are returned in a buffer provided as
// first argument. In "value" mode, base types and vector types are directly returned
// by the functions, other types are returned as for the "sret" mode.
check_success(m_be_cuda_ptx->set_option("lambda_return_mode", lambda_return_mode.c_str()) == 0);
// Option "scene_data_names": Default is "".
// Uncomment the line below to enable calling the scene data runtime functions
// for any scene data names or specify a comma-separated list of names for which
// you may provide scene data. The example runtime functions always return the
// default values, which is the same as not supporting any scene data.
// m_be_cuda_ptx->set_option("scene_data_names", "*");
if (use_adapt_normal) {
// Option "use_renderer_adapt_normal": Default is "off".
// If enabled, the renderer can adapt the normal of BSDFs before use.
check_success(m_be_cuda_ptx->set_option("use_renderer_adapt_normal", "on") == 0);
}
if (enable_bsdf_flags) {
// Option "libbsdf_flags_in_bsdf_data": Default is "off".
// If enabled, the generated code will use the optional "flags" field
// in the BSDF data structures.
check_success(m_be_cuda_ptx->set_option("libbsdf_flags_in_bsdf_data", "on") == 0);
}
// force experimental to true for now
m_context->set_option("experimental", true);
m_context->set_option("fold_ternary_on_df", fold_ternary_on_df);
// After we set the options, we can create the link unit
m_link_unit = mi::base::make_handle(m_be_cuda_ptx->create_link_unit(transaction, m_context.get()));
}
std::string Material_compiler::load_module(const std::string& mdl_module_name)
{
// load module
m_mdl_impexp_api->load_module(m_transaction.get(), mdl_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get()))
exit_failure("Failed to load module: %s", mdl_module_name.c_str());
// get and return the DB name
m_mdl_factory->get_db_module_name(mdl_module_name.c_str()));
return db_module_name->get_c_str();
}
// Creates an instance of the given material.
mi::neuraylib::IFunction_call* Material_compiler::create_material_instance(
const std::string& qualified_module_name,
const std::string& material_db_name)
{
// Create a material instance from the material definition
// with the default arguments.
m_transaction->access<mi::neuraylib::IFunction_definition>(
material_db_name.c_str()));
if (!material_definition) {
// material with given name does not exist
print_message(
(
"Material '" +
material_db_name +
"' does not exist in '" +
qualified_module_name + "'").c_str());
return nullptr;
}
m_material_defs.push_back(material_definition);
mi::Sint32 result;
material_definition->create_function_call(0, &result));
check_success(result == 0);
material_instance->retain();
return material_instance.get();
}
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material *Material_compiler::compile_material_instance(
mi::neuraylib::IFunction_call* material_instance,
bool class_compilation)
{
// convert to target type SID_MATERIAL
m_mdl_factory->create_type_factory(m_transaction.get()));
tf->get_predefined_struct(mi::neuraylib::IType_struct::SID_MATERIAL));
m_context->set_option("target_type", standard_material_type.get());
mi::Uint32 flags = class_compilation
material_instance->get_interface<const mi::neuraylib::IMaterial_instance>());
material_instance2->create_compiled_material(flags, m_context.get()));
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
compiled_material->retain();
return compiled_material.get();
}
// Generates CUDA PTX target code for the current link unit.
mi::base::Handle<const mi::neuraylib::ITarget_code> Material_compiler::generate_cuda_ptx()
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
FILE *file = fopen("target_code.ptx", "wt");
if (file)
{
fwrite(code_cuda_ptx->get_code(), code_cuda_ptx->get_code_size(), 1, file);
fclose(file);
}
#endif
return code_cuda_ptx;
}
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* fname,
bool class_compilation)
{
desc.path = path;
desc.base_fname = fname;
add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
return desc.return_code == 0;
}
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_df(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* base_fname,
bool class_compilation)
{
desc.path = path;
desc.base_fname = base_fname;
add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
return desc.return_code == 0;
}
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_description. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool Material_compiler::add_material(
const std::string& qualified_module_name,
const std::string& material_db_name,
mi::Size description_count,
bool class_compilation)
{
if (description_count == 0)
return false;
// Load the given module and create a material instance
create_material_instance(qualified_module_name, material_db_name));
if (!material_instance)
return false;
// Compile the material instance in instance compilation mode
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
// Note: the same argument_block_index is filled into all function descriptions of a
// material, if any function uses it
m_arg_block_indexes.push_back(function_descriptions[0].argument_block_index);
return print_messages(m_context.get());
}
//------------------------------------------------------------------------------
//
// Material execution code
//
//------------------------------------------------------------------------------
// Helper function to create PTX source code for a non-empty 32-bit value array.
void print_array_u32(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
// PTX does not allow empty arrays, so use a dummy entry
str += "1] = { 0 };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
// Helper function to create PTX source code for a non-empty function pointer array.
void print_array_func(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
// PTX does not allow empty arrays, so use a dummy entry
str += "1] = { dummy_func };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
// Generate PTX array containing the references to all generated functions.
std::string generate_func_array_ptx(
const std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > &target_codes)
{
// Create PTX header and mdl_expr_functions_count constant
std::string src =
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
// Workaround needed to let CUDA linker resolve the function pointers in the arrays.
// Also used for "empty" function arrays.
src += ".func dummy_func() { ret; }\n";
std::string tc_offsets;
std::string function_names;
std::string tc_indices;
std::string ab_indices;
unsigned f_count = 0;
// Iterate over all target codes
for (size_t tc_index = 0, num = target_codes.size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
// in case of multiple target codes, we need to address the functions by a pair of
// target_code_index and function_index.
// the elements in the resulting function array can then be index by offset + func_index.
if(!tc_offsets.empty())
tc_offsets += ", ";
tc_offsets += to_string(f_count);
// Collect all names and prototypes of callable functions within the current target code
for (size_t func_index = 0, func_count = target_code->get_callable_function_count();
func_index < func_count; ++func_index)
{
// add to function list
if (!tc_indices.empty())
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
// target code index in case of multiple link units
tc_indices += to_string(tc_index);
// name of the function
function_names += target_code->get_callable_function(func_index);
// Get argument block index and translate to 1 based list index (-> 0 = not-used)
mi::Size ab_index = target_code->get_callable_function_argument_block_index(func_index);
ab_indices += to_string(ab_index == mi::Size(~0) ? 0 : (ab_index + 1));
f_count++;
// Add prototype declaration
src += target_code->get_callable_function_prototype(
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
// infos per target code (link unit)
src += std::string(".visible .const .align 4 .u32 mdl_target_code_count = ")
+ to_string(target_codes.size()) + ";\n";
print_array_u32(
src, std::string("mdl_target_code_offsets"), unsigned(target_codes.size()), tc_offsets);
// infos per function
src += std::string(".visible .const .align 4 .u32 mdl_functions_count = ")
+ to_string(f_count) + ";\n";
print_array_func(src, std::string("mdl_functions"), f_count, function_names);
print_array_u32(src, std::string("mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src, std::string("mdl_target_code_indices"), f_count, tc_indices);
return src;
}
// Build a linked CUDA kernel containing our kernel and all the generated code, making it
// available to the kernel via an added "mdl_expr_functions" array.
CUmodule build_linked_kernel(
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > const &target_codes,
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
// Generate PTX array containing the references to all generated functions.
// The linker will resolve them to addresses.
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
FILE *file = fopen("func_array.ptx", "wt");
if (file)
{
fwrite(ptx_func_array_src.c_str(), ptx_func_array_src.size(), 1, file);
fclose(file);
}
#endif
// Link all generated code, our generated PTX array and our kernel together
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
mi::examples::profiling::Timing timing("PTX to SASS");
// Setup the linker
// Pass a buffer for info messages
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
// Pass the size of the info buffer
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
// Pass a buffer for error messages
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
// Pass the size of the error buffer
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
// Add all code generated by the MDL PTX backend
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
// Add the "mdl_expr_functions" array PTX module
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.c_str()),
ptx_func_array_src.size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
// Add our kernel
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
// Link everything to a cubin
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
std::cerr << "PTX linker error:\n" << error_log << std::endl;
check_cuda_success(link_result);
}
timing.stop();
std::cout << "CUDA link completed." << std::endl;
if (info_log[0])
std::cout << "Linker output:\n" << info_log << std::endl;
#ifdef DUMP_PTX
file = fopen("target_code.cubin", "wb");
if (file)
{
fwrite(linked_cubin, linked_cubin_size, 1, file);
fclose(file);
}
#endif
// Load the result and get the entrypoint of our kernel
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(&regs, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
std::cout << "Kernel uses " << regs << " registers and " << lmem << " lmem and has a size of "
<< linked_cubin_size << " bytes." << std::endl;
// Cleanup
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif // EXAMPLE_CUDA_SHARED_H
This interface represents mutable pointers.
Definition: ipointer.h:43
Example implementation of the abstract interface mi::neuraylib::IBsdf_isotropic_data.
Definition: bsdf_isotropic_data.h:60
virtual const base::IInterface * get_reflection() const =0
Returns the BSDF data for the reflection.
virtual const base::IInterface * get_transmission() const =0
Returns the BSDF data for transmission.
virtual Uint32 get_layers_size() const =0
Returns the number of layers this canvas has.
virtual Uint32 get_resolution_y() const =0
Returns the resolution of the canvas in y direction.
virtual Uint32 get_resolution_x() const =0
Returns the resolution of the canvas in x direction.
Abstract interface for a canvas represented by a rectangular array of tiles.
Definition: icanvas.h:89
virtual const ITile * get_tile(Uint32 layer=0) const =0
Returns the tile for the given layer.
A direct call expression.
Definition: iexpression.h:241
@ EK_DIRECT_CALL
A direct call expression. See mi::neuraylib::IExpression_direct_call.
Definition: iexpression.h:61
This interface represents a function call.
Definition: ifunction_call.h:52
Semantics
All known semantics of functions definitions.
Definition: ifunction_definition.h:54
virtual ITile * convert(const ITile *tile, const char *pixel_type) const =0
Converts a tile to a different pixel type.
virtual void adjust_gamma(ITile *tile, Float32 old_gamma, Float32 new_gamma) const =0
Sets the gamma value of a tile and adjusts the pixel data accordingly.
virtual IArray * create_mipmap(const ICanvas *canvas, Float32 gamma_override=0.0f) const =0
Creates a mipmap from the given canvas.
This interface represents a material instance.
Definition: imaterial_instance.h:34
@ CLASS_COMPILATION
Selects class compilation instead of instance compilation.
Definition: imaterial_instance.h:41
@ DEFAULT_OPTIONS
Default compilation options (e.g., instance compilation).
Definition: imaterial_instance.h:40
@ MSG_COMPILER_DAG
MDL Core DAG generator message.
Definition: imdl_execution_context.h:40
A transaction provides a consistent view on the database.
Definition: itransaction.h:82
@ SID_MATERIAL
The "::material" struct type.
Definition: itype.h:484
A value of type string.
Definition: ivalue.h:221
virtual const IInterface * get_interface(const Uuid &interface_id) const =0
Acquires a const interface from another.
static const Dup_interface DUP_INTERFACE
Symbolic constant to trigger a special constructor in the Handle class.
Definition: handle.h:37
Handle<Interface> make_handle(Interface *iptr)
Returns a handle that holds the interface pointer passed in as argument.
Definition: handle.h:428
@ MESSAGE_SEVERITY_ERROR
An error has occurred.
Definition: enums.h:35
float Float32
32-bit float.
Definition: types.h:51
Bbox<T, DIM> operator*(const Bbox<T, DIM> &bbox, T factor)
Returns a bounding box that is a version of bbox scaled by factor, i.e., bbox.max and bbox....
Definition: bbox.h:502
virtual const char * get_texture(Size index) const =0
Returns the name of a texture resource used by the target code.
const char * path
The path from the material root to the expression that should be translated, e.g.,...
Definition: imdl_backend.h:1779
virtual Size get_bsdf_measurement_count() const =0
Returns the number of bsdf measurement resources used by the target code.
virtual Texture_shape get_texture_shape(Size index) const =0
Returns the texture shape of a given texture resource used by the target code.
virtual Size get_ro_data_segment_count() const =0
Returns the number of constant data initializers.
virtual Size get_ro_data_segment_size(Size index) const =0
Returns the size of the constant data segment at the given index.
virtual const char * get_ro_data_segment_data(Size index) const =0
Returns the data of the constant data segment at the given index.
virtual Size get_texture_count() const =0
Returns the number of texture resources used by the target code.
virtual Size get_argument_block_count() const =0
Returns the number of target argument blocks.
Texture_shape
Definition: imdl_backend.h:811
virtual const ITarget_argument_block * get_argument_block(Size index) const =0
Get a target argument block if available.
const char * base_fname
The base name of the generated functions.
Definition: imdl_backend.h:1785
virtual const ITarget_value_layout * get_argument_block_layout(Size index) const =0
Get a captured arguments block layout if available.
Sint32 return_code
A return code.
Definition: imdl_backend.h:1838
virtual const char * get_light_profile(Size index) const =0
Returns the name of a light profile resource used by the target code.
virtual const char * get_bsdf_measurement(Size index) const =0
Returns the name of a bsdf measurement resource used by the target code.
virtual Size get_light_profile_count() const =0
Returns the number of light profile resources used by the target code.
@ Texture_shape_bsdf_data
Three-dimensional texture representing a BSDF data table.
Definition: imdl_backend.h:817
@ Texture_shape_cube
Cube map texture.
Definition: imdl_backend.h:815
@ Texture_shape_3d
Three-dimensional texture.
Definition: imdl_backend.h:814
@ BSDF_SCALAR
One scalar per grid value.
Definition: ibsdf_isotropic_data.h:24

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cu

/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#define _USE_MATH_DEFINES
#include <math.h>
#include "example_df_cuda.h"
#include "texture_support_cuda.h"
// To reuse this sample code for the MDL SDK and MDL Core the corresponding namespaces are used.
// when this CUDA code is used in the context of an SDK sample.
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
// when this CUDA code is used in the context of an Core sample.
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
// for LPE support there different options for the renderer, for CUDA a renderer provided buffer
// can be used to retrieve the contributions of the individual handles (named lobes)
// Note, a real renderer would go for one specific option only
#define DF_HSM_POINTER -2
#define DF_HSM_NONE -1
#define DF_HSM_FIXED_1 1
#define DF_HSM_FIXED_2 2
#define DF_HSM_FIXED_4 4
#define DF_HSM_FIXED_8 8
// this is the one that is used,
// Note, this has to match with code backend option "df_handle_slot_mode"
#define DF_HANDLE_SLOTS DF_HSM_POINTER
// If enabled, math::DX(state::texture_coordinates(0).xy) = float2(1, 0) and
// math::DY(state::texture_coordinates(0).xy) = float2(0, 1) will be used.
// #define USE_FAKE_DERIVATIVES
#ifdef ENABLE_DERIVATIVES
typedef Material_expr_function_with_derivs Mat_expr_func;
typedef Bsdf_init_function_with_derivs Bsdf_init_func;
typedef Bsdf_sample_function_with_derivs Bsdf_sample_func;
typedef Bsdf_evaluate_function_with_derivs Bsdf_evaluate_func;
typedef Bsdf_pdf_function_with_derivs Bsdf_pdf_func;
typedef Bsdf_auxiliary_function_with_derivs Bsdf_auxiliary_func;
typedef Edf_init_function_with_derivs Edf_init_func;
typedef Edf_sample_function_with_derivs Edf_sample_func;
typedef Edf_evaluate_function_with_derivs Edf_evaluate_func;
typedef Edf_pdf_function_with_derivs Edf_pdf_func;
typedef Edf_auxiliary_function_with_derivs Edf_auxiliary_func;
typedef Shading_state_material_with_derivs Mdl_state;
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Material_expr_function Mat_expr_func;
typedef Bsdf_init_function Bsdf_init_func;
typedef Bsdf_sample_function Bsdf_sample_func;
typedef Bsdf_evaluate_function Bsdf_evaluate_func;
typedef Bsdf_pdf_function Bsdf_pdf_func;
typedef Bsdf_auxiliary_function Bsdf_auxiliary_func;
typedef Edf_init_function Edf_init_func;
typedef Edf_sample_function Edf_sample_func;
typedef Edf_evaluate_function Edf_evaluate_func;
typedef Edf_pdf_function Edf_pdf_func;
typedef Edf_auxiliary_function Edf_auxiliary_func;
typedef Shading_state_material Mdl_state;
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
// Custom structure representing the resources used by the generated code of a target code object.
struct Target_code_data
{
size_t num_textures; // number of elements in the textures field
Texture *textures; // a list of Texture objects, if used
size_t num_mbsdfs; // number of elements in the mbsdfs field
Mbsdf *mbsdfs; // a list of mbsdfs objects, if used
size_t num_lightprofiles; // number of elements in the lightprofiles field
Lightprofile *lightprofiles; // a list of lightprofiles objects, if used
char const *ro_data_segment; // the read-only data segment, if used
};
// all function types
union Mdl_function_ptr
{
Mat_expr_func *expression;
Bsdf_init_func *bsdf_init;
Bsdf_sample_func *bsdf_sample;
Bsdf_evaluate_func *bsdf_evaluate;
Bsdf_pdf_func *bsdf_pdf;
Bsdf_auxiliary_func *bsdf_auxiliary;
Edf_init_func *edf_init;
Edf_sample_func *edf_sample;
Edf_evaluate_func *edf_evaluate;
Edf_pdf_func *edf_pdf;
Edf_auxiliary_func *edf_auxiliary;
};
// function index offset depending on the target code
extern __constant__ unsigned int mdl_target_code_offsets[];
// number of generated functions
extern __constant__ unsigned int mdl_functions_count;
// the following arrays are indexed by an mdl_function_index
extern __constant__ Mdl_function_ptr mdl_functions[];
extern __constant__ unsigned int mdl_arg_block_indices[];
// Identity matrix.
// The last row is always implied to be (0, 0, 0, 1).
__constant__ const float4 identity[3] = {
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
// the material provides pairs for each generated function to evaluate
// the functions and arg blocks array are indexed by:
// mdl_target_code_offsets[target_code_index] + function_index
typedef uint3 Mdl_function_index;
__device__ inline Mdl_function_index get_mdl_function_index(const uint2& index_pair)
{
return make_uint3(
index_pair.x, // target_code_index
index_pair.y, // function_index inside target code
mdl_target_code_offsets[index_pair.x] + index_pair.y); // global function index
}
// resource handler for accessing textures and other data
// depends on the target code (link unit)
struct Mdl_resource_handler
{
__device__ Mdl_resource_handler()
{
m_tex_handler.vtable = &;TEX_VTABLE; // only required in 'vtable' mode, otherwise NULL
data.shared_data = NULL;
data.texture_handler = reinterpret_cast<Texture_handler_base *>(&m_tex_handler);
}
// reuse the handler with a different target code index
__device__ inline void set_target_code_index(
const Kernel_params& params, const Mdl_function_index& index)
{
m_tex_handler.num_textures = params.tc_data[index.x].num_textures;
m_tex_handler.textures = params.tc_data[index.x].textures;
m_tex_handler.num_mbsdfs = params.tc_data[index.x].num_mbsdfs;
m_tex_handler.mbsdfs = params.tc_data[index.x].mbsdfs;
m_tex_handler.num_lightprofiles = params.tc_data[index.x].num_lightprofiles;
m_tex_handler.lightprofiles = params.tc_data[index.x].lightprofiles;
}
// a pointer to this data is passed to all generated functions
Resource_data data;
private:
Tex_handler m_tex_handler;
};
// checks if the indexed function can be evaluated or not
__device__ inline bool is_valid(const Mdl_function_index& index)
{
return index.y != 0xFFFFFFFFu;
}
// get a pointer to the material parameters which is passed to all generated functions
__device__ inline const char* get_arg_block(
const Kernel_params& params,
const Mdl_function_index& index)
{
return params.arg_block_list[mdl_arg_block_indices[index.z]];
}
// Init function
__device__ inline Bsdf_init_func* as_init(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].bsdf_init;
}
// Expression functions
__device__ inline Mat_expr_func* as_expression(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].expression;
}
// Expression functions
#ifdef ENABLE_DERIVATIVES
template <typename T>
__device__ inline typename Material_function<T>::Type_with_derivs *as_expression_typed(
const Mdl_function_index& index)
{
return reinterpret_cast<typename Material_function<T>::Type_with_derivs *>(
mdl_functions[index.z + 0].expression);
}
#else
template <typename T>
__device__ inline typename Material_function<T>::Type *as_expression_typed(
const Mdl_function_index& index)
{
return reinterpret_cast<typename Material_function<T>::Type *>(
mdl_functions[index.z + 0].expression);
}
#endif
// BSDF functions
__device__ inline Bsdf_sample_func* as_bsdf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].bsdf_sample;
}
__device__ inline Bsdf_evaluate_func* as_bsdf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].bsdf_evaluate;
}
__device__ inline Bsdf_pdf_func* as_bsdf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].bsdf_pdf;
}
__device__ inline Bsdf_auxiliary_func* as_bsdf_auxiliary(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].bsdf_auxiliary;
}
// EDF functions
__device__ inline Edf_sample_func* as_edf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].edf_sample;
}
__device__ inline Edf_evaluate_func* as_edf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].edf_evaluate;
}
__device__ inline Edf_pdf_func* as_edf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].edf_pdf;
}
__device__ inline Edf_auxiliary_func* as_edf_auxiliary(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].edf_auxiliary;
}
// 3d vector math utilities
__device__ inline float3 operator-(const float3& a)
{
return make_float3(-a.x, -a.y, -a.z);
}
__device__ inline float3 operator+(const float3& a, const float3& b)
{
return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
__device__ inline float3 operator-(const float3& a, const float3& b)
{
return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__device__ inline float3 operator*(const float3& a, const float3& b)
{
return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
__device__ inline float3 operator*(const float3& a, const float s)
{
return make_float3(a.x * s, a.y * s, a.z * s);
}
__device__ inline float3 operator/(const float3& a, const float s)
{
return make_float3(a.x / s, a.y / s, a.z / s);
}
__device__ inline void operator+=(float3& a, const float3& b)
{
a.x += b.x; a.y += b.y; a.z += b.z;
}
__device__ inline void operator-=(float3& a, const float3& b)
{
a.x -= b.x; a.y -= b.y; a.z -= b.z;
}
__device__ inline void operator*=(float3& a, const float3& b)
{
a.x *= b.x; a.y *= b.y; a.z *= b.z;
}
__device__ inline void operator*=(float3& a, const float& s)
{
a.x *= s; a.y *= s; a.z *= s;
}
__device__ inline float squared_length(const float3 &d)
{
return d.x * d.x + d.y * d.y + d.z * d.z;
}
__device__ inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f / sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
__device__ inline float dot(const float3 &u, const float3 &v)
{
return u.x * v.x + u.y * v.y + u.z * v.z;
}
__device__ inline float3 cross(const float3 &u, const float3 &v)
{
return make_float3(
u.y * v.z - u.z * v.y,
u.z * v.x - u.x * v.z,
u.x * v.y - u.y * v.x);
}
// Random number generator based on the OptiX SDK
template<uint32_t N>
static __forceinline__ __device__ uint32_t tea(uint32_t v0, uint32_t v1)
{
uint32_t s0 = 0;
for (uint32_t n = 0; n < N; n++)
{
s0 += 0x9e3779b9;
v0 += ((v1 << 4) + 0xa341316c) ^ (v1 + s0) ^ ((v1 >> 5) + 0xc8013ea4);
v1 += ((v0 << 4) + 0xad90777d) ^ (v0 + s0) ^ ((v0 >> 5) + 0x7e95761e);
}
return v0;
}
// Generate random uint32_t in [0, 2^24)
static __forceinline__ __device__ uint32_t lcg(uint32_t& prev)
{
const uint32_t LCG_A = 1664525u;
const uint32_t LCG_C = 1013904223u;
prev = (LCG_A * prev + LCG_C);
return prev & 0x00FFFFFF;
}
// Generate random float in [0, 1)
static __forceinline__ __device__ float rnd(uint32_t& prev)
{
return ((float)lcg(prev) / (float)0x01000000);
}
// direction to environment map texture coordinates
__device__ inline float2 environment_coords(const float3 &dir, const Kernel_params& params)
{
const float u = atan2f(dir.z, dir.x) * (float)(0.5 / M_PI) + 0.5f;
const float v = acosf(fmax(fminf(-dir.y, 1.0f), -1.0f)) * (float)(1.0 / M_PI);
return make_float2(fmodf(u + params.env_rotation * (float)(0.5 * M_1_PI), 1.0f), v);
}
// importance sample the environment
__device__ inline float3 environment_sample(
float3 &dir,
float &pdf,
const float3 &xi,
const Kernel_params &params)
{
// importance sample an envmap pixel using an alias map
const unsigned int size = params.env_size.x * params.env_size.y;
const unsigned int idx = min((unsigned int)(xi.x * (float)size), size - 1);
unsigned int env_idx;
float xi_y = xi.y;
if (xi_y < params.env_accel[idx].q) {
env_idx = idx ;
xi_y /= params.env_accel[idx].q;
} else {
env_idx = params.env_accel[idx].alias;
xi_y = (xi_y - params.env_accel[idx].q) / (1.0f - params.env_accel[idx].q);
}
const unsigned int py = env_idx / params.env_size.x;
const unsigned int px = env_idx % params.env_size.x;
pdf = params.env_accel[env_idx].pdf;
// uniformly sample spherical area of pixel
const float u = (float)(px + xi_y) / (float)params.env_size.x;
const float phi = u * (float)(2.0 * M_PI) - (float)M_PI - params.env_rotation;
float sin_phi, cos_phi;
sincosf(phi > float(-M_PI) ? phi : (phi + (float)(2.0 * M_PI)), &sin_phi, &cos_phi);
const float step_theta = (float)M_PI / (float)params.env_size.y;
const float theta0 = (float)(py) * step_theta;
const float cos_theta = cosf(theta0) * (1.0f - xi.z) + cosf(theta0 + step_theta) * xi.z;
const float theta = acosf(cos_theta);
const float sin_theta = sinf(theta);
dir = make_float3(cos_phi * sin_theta, -cos_theta, sin_phi * sin_theta);
// lookup filtered beauty
const float v = theta * (float)(1.0 / M_PI);
const float4 t = tex2D<float4>(params.env_tex, u, v);
return make_float3(t.x, t.y, t.z) * params.env_intensity / pdf;
}
// evaluate the environment
__device__ inline float3 environment_eval(
float &pdf,
const float3 &dir,
const Kernel_params &params)
{
const float2 uv = environment_coords(dir, params);
const unsigned int x =
min((unsigned int)(uv.x * (float)params.env_size.x), params.env_size.x - 1);
const unsigned int y =
min((unsigned int)(uv.y * (float)params.env_size.y), params.env_size.y - 1);
pdf = params.env_accel[y * params.env_size.x + x].pdf;
const float4 t = tex2D<float4>(params.env_tex, uv.x, uv.y) ;
return make_float3(t.x, t.y, t.z) * params.env_intensity;
}
//-------------------------------------------------------------------------------------------------
struct auxiliary_data
{
float3 albedo;
float3 normal;
int num; // multiple elements can contribute to the aux buffer with equal weight
__device__ inline auxiliary_data& operator+=(const auxiliary_data& b)
{
albedo += b.albedo;
normal += b.normal;
num += b.num;
return *this;
}
};
__device__ inline static void clear(auxiliary_data& data)
{
data.albedo = make_float3(0.0f, 0.0f, 0.0f);
data.normal = make_float3(0.0f, 0.0f, 0.0f);
data.num = 0;
}
__device__ inline void normalize(auxiliary_data& data)
{
data.albedo = data.albedo / fmaxf(1.0f, float(data.num));
if (dot(data.normal, data.normal) > 0.0f)
data.normal = normalize(data.normal);
data.num = min(1, data.num);
}
//-------------------------------------------------------------------------------------------------
struct Ray_state
{
float3 contribution;
float3 weight;
float3 pos, pos_rx, pos_ry;
float3 dir, dir_rx, dir_ry;
bool inside;
bool inside_cutout;
int intersection;
uint32_t lpe_current_state;
auxiliary_data* aux;
};
struct Ray_hit_info
{
float distance;
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 position;
#else
float3 position;
#endif
float3 normal;
float3 tangent_u;
float3 tangent_v;
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 texture_coords[1];
#else
tct_float3 texture_coords[1];
#endif
};
#define GT_SPHERE_RADIUS 1.0f
__device__ inline bool intersect_sphere(
const Ray_state &ray_state,
const Kernel_params &params,
Ray_hit_info& out_hit)
{
const float r = GT_SPHERE_RADIUS;
const float b = 2.0f * dot(ray_state.dir, ray_state.pos);
const float c = dot(ray_state.pos, ray_state.pos) - r * r;
float tmp = b * b - 4.0f * c;
if (tmp < 0.0f)
return false;
tmp = sqrtf(tmp);
const float t0 = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t1 = c / t0;
const float m = fminf(t0, t1);
out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
if (out_hit.distance < 0.0f)
return false;
// compute geometry state
#ifdef ENABLE_DERIVATIVES
out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
const float3 &posval = out_hit.position.val;
#else
out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
const float3 &posval = out_hit.position;
#endif
out_hit.normal = normalize(posval);
const float phi = atan2f(out_hit.normal.x, out_hit.normal.z);
const float theta = acosf(out_hit.normal.y);
const float3 uvw = make_float3(
(phi * (float) (0.5 / M_PI) + 0.5f) * 2.0f,
1.0f - theta * (float) (1.0 / M_PI),
0.0f);
// compute surface derivatives
float sp, cp;
sincosf(phi, &sp, &cp);
const float st = sinf(theta);
out_hit.tangent_u = make_float3(cp * st, 0.0f, -sp * st) * (float) M_PI * r;
out_hit.tangent_v = make_float3(sp * out_hit.normal.y, -st, cp * out_hit.normal.y) * (float) (-M_PI) * r;
#ifdef ENABLE_DERIVATIVES
out_hit.texture_coords[0].val = uvw;
out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
#else
out_hit.texture_coords[0] = uvw;
#endif
return true;
}
#define GT_HAIR_RADIUS 0.35f
#define GT_HAIR_LENGTH 3.0f
__device__ inline bool intersect_hair(
const Ray_state &ray_state,
const Kernel_params &params,
Ray_hit_info& out_hit)
{
const float r = GT_HAIR_RADIUS;
const float a = ray_state.dir.x * ray_state.dir.x + ray_state.dir.z * ray_state.dir.z;
const float b = 2.0f * (ray_state.dir.x * ray_state.pos.x + ray_state.dir.z * ray_state.pos.z);
const float c = ray_state.pos.x * ray_state.pos.x + ray_state.pos.z * ray_state.pos.z - r * r;
float tmp = b * b - 4.0f * a * c;
if (tmp < 0.0f)
return false;
tmp = sqrtf(tmp);
const float q = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t0 = q / a;
const float t1 = c / q;
const float m = fminf(t0, t1);
out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
if (out_hit.distance < 0.0f)
return false;
// compute geometry state
#ifdef ENABLE_DERIVATIVES
out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
const float3 &posval = out_hit.position.val;
#else
out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
const float3 &posval = out_hit.position;
#endif
out_hit.normal = normalize(make_float3(posval.x, 0.0f, posval.z));
if (fabsf(posval.y) > GT_HAIR_LENGTH * 0.5f)
return false;
const float phi = atan2f(posval.z, posval.x);
const float3 uvw = make_float3(
(posval.y + GT_HAIR_LENGTH * 0.5f) / GT_HAIR_LENGTH, // position along the hair
phi * (float) (0.5f / M_PI) + 0.5f, // position around the hair in the range [0, 1]
2.0f * GT_HAIR_RADIUS); // thickness of the hair
// compute surface derivatives
out_hit.tangent_u = make_float3(0.0, 1.0, 0.0);
out_hit.tangent_v = cross(out_hit.normal, out_hit.tangent_u);
#ifdef ENABLE_DERIVATIVES
out_hit.texture_coords[0].val = uvw;
out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
#else
out_hit.texture_coords[0] = uvw;
#endif
return true;
}
__device__ inline bool intersect_geometry(
Ray_state &ray_state,
const Kernel_params &params,
Ray_hit_info& out_hit)
{
switch (Geometry_type(params.geometry))
{
case GT_SPHERE:
if (!intersect_sphere(ray_state, params, out_hit))
return false;
break;
case GT_HAIR:
if (!intersect_hair(ray_state, params, out_hit))
return false;
break;
default:
return false;
}
#ifndef ENABLE_DERIVATIVES
ray_state.pos = out_hit.position;
#else
ray_state.pos = out_hit.position.val;
if (params.use_derivatives && ray_state.intersection == 0)
{
#ifdef USE_FAKE_DERIVATIVES
out_hit.position.dx = make_float3(1.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 1.0f, 0.0f);
out_hit.texture_coords[0].dx = make_float3(1.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 1.0f, 0.0f);
#else
// compute ray differential for one-pixel offset rays
// ("Physically Based Rendering", 3rd edition, chapter 10.1.1)
const float d = dot(out_hit.normal, ray_state.pos);
const float tx = (d - dot(out_hit.normal, ray_state.pos_rx)) / dot(out_hit.normal, ray_state.dir_rx);
const float ty = (d - dot(out_hit.normal, ray_state.pos_ry)) / dot(out_hit.normal, ray_state.dir_ry);
ray_state.pos_rx += ray_state.dir_rx * tx;
ray_state.pos_ry += ray_state.dir_ry * ty;
out_hit.position.dx = ray_state.pos_rx - ray_state.pos;
out_hit.position.dy = ray_state.pos_ry - ray_state.pos;
float4 A;
float2 B_x, B_y;
if (fabsf(out_hit.normal.x) > fabsf(out_hit.normal.y) && fabsf(out_hit.normal.x) > fabsf(out_hit.normal.z))
{
B_x = make_float2(
ray_state.pos_rx.y - ray_state.pos.y,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.y - ray_state.pos.y,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
out_hit.tangent_u.y, out_hit.tangent_u.z, out_hit.tangent_v.y, out_hit.tangent_v.z);
}
else if (fabsf(out_hit.normal.y) > fabsf(out_hit.normal.z))
{
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
out_hit.tangent_u.x, out_hit.tangent_u.z, out_hit.tangent_v.x, out_hit.tangent_v.z);
}
else
{
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.y - ray_state.pos.y);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.y - ray_state.pos.y);
A = make_float4(
out_hit.tangent_u.x, out_hit.tangent_u.y, out_hit.tangent_v.x, out_hit.tangent_v.y);
}
const float det = A.x * A.w - A.y * A.z;
if (fabsf(det) > 1e-10f)
{
const float inv_det = 1.0f / det;
out_hit.texture_coords[0].dx.x = inv_det * (A.w * B_x.x - A.z * B_x.y);
out_hit.texture_coords[0].dx.y = inv_det * (A.x * B_x.y - A.y * B_x.x);
out_hit.texture_coords[0].dy.x = inv_det * (A.w * B_y.x - A.z * B_y.y);
out_hit.texture_coords[0].dy.y = inv_det * (A.x * B_y.y - A.y * B_y.x);
}
#endif
}
#endif
out_hit.tangent_u = normalize(out_hit.tangent_u);
out_hit.tangent_v = normalize(out_hit.tangent_v);
return true;
}
__device__ bool cull_point_light(
const Kernel_params &params,
const float3 &light_position,
const float3 &light_direction /*to light*/,
const float3 &normal)
{
switch (params.geometry)
{
case GT_SPHERE:
{
// same as default, but allow lights inside the sphere
const float inside = (squared_length(light_position) < GT_SPHERE_RADIUS) ? -1.f : 1.f;
return (dot(light_direction, normal) * inside) <= 0.0f;
}
case GT_HAIR:
// ignore light sources within the volume
return (light_position.x * light_position.x +
light_position.z * light_position.z) < GT_SPHERE_RADIUS;
default:
// ignore light from behind
return dot(light_direction, normal) <= 0.0f;
}
}
__device__ bool cull_env_light(
const Kernel_params &params,
const float3 &light_direction /*to light*/,
const float3 &normal)
{
switch (params.geometry)
{
case GT_HAIR:
// allow light from behind
return false;
case GT_SPHERE:
default:
// ignore light from behind
return dot(light_direction, normal) <= 0.0f;
}
}
__device__ void continue_ray(
Ray_state& ray_state,
const Ray_hit_info &hit_indo,
unsigned int event_type,
const Kernel_params &params)
{
switch (params.geometry)
{
case GT_HAIR:
if (event_type == BSDF_EVENT_GLOSSY_TRANSMISSION)
{
// conservative
ray_state.pos += ray_state.dir * 2.0f * GT_HAIR_RADIUS;
ray_state.inside = false;
}
break;
default:
return;
}
}
//-------------------------------------------------------------------------------------------------
// events that are define a transition between states, along with tag IDs
enum Transition_type
{
TRANSITION_CAMERA = 0,
TRANSITION_LIGHT,
TRANSITION_EMISSION,
TRANSITION_SCATTER_DR,
TRANSITION_SCATTER_DT,
TRANSITION_SCATTER_GR,
TRANSITION_SCATTER_GT,
TRANSITION_SCATTER_SR,
TRANSITION_SCATTER_ST,
TRANSITION_COUNT,
};
// go to the next state, given the current state and a transition token.
__device__ inline uint32_t lpe_transition(
uint32_t current_state,
Transition_type event,
uint32_t global_tag_id,
const Kernel_params &params)
{
if(current_state == static_cast<uint32_t>(-1))
return static_cast<uint32_t>(-1);
return params.lpe_state_table[
current_state * params.lpe_num_transitions +
static_cast<uint32_t>(TRANSITION_COUNT) * global_tag_id +
static_cast<uint32_t>(event)];
}
// add direct contribution, e.g., for emission, direct light hits
__device__ inline void accumulate_contribution(
Transition_type light_event,
uint32_t light_global_tag_id,
const float3& contrib,
Ray_state &ray_state,
const Kernel_params &params)
{
// check if there is a valid transition to that light source
uint32_t next_state = lpe_transition(
ray_state.lpe_current_state, light_event, light_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// add contribution the when the reached state is a final state for the selected LPE
// here we only have one LPE buffer, but more can be added easily by checking different LPEs
if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
ray_state.contribution += contrib;
}
// add contribution for next event estimations
__device__ inline void accumulate_next_event_contribution(
Transition_type scatter_event, uint32_t material_global_tag_id,
Transition_type light_event, uint32_t light_global_tag_id,
const float3& contrib,
Ray_state &ray_state,
const Kernel_params &params)
{
// transition following the scatter event
uint32_t next_state = lpe_transition(
ray_state.lpe_current_state, scatter_event, material_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// check if there is a valid transition to the light source
next_state = lpe_transition(
next_state, light_event, light_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// add contribution the when the reached state is a final state for the selected LPE
// here we only have one LPE buffer, but more can be added easily by checking different LPEs
if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
ray_state.contribution += contrib;
}
// evaluate if certain shading point (pos) is visible to a light direction by casting a shadow ray.
__device__ inline bool trace_shadow(
const float3 &pos,
const float3 &to_light,
uint32_t &seed,
Ray_state &ray_state,
const Kernel_params &params)
{
Ray_state ray_shadow = ray_state;
ray_shadow.pos = pos;
ray_shadow.dir = normalize(to_light);
Ray_hit_info shadow_hit;
bool in_shadow = intersect_geometry(ray_shadow, params, shadow_hit);
if (in_shadow)
{
float4 texture_results[16];
// material of the current object
Df_cuda_material material = params.material_buffer[params.current_material];
Mdl_function_index func_idx;
func_idx = get_mdl_function_index(material.init);
if (!is_valid(func_idx))
return false;
// create state
Mdl_state state = {
shadow_hit.normal,
shadow_hit.normal,
shadow_hit.position,
0.0f,
shadow_hit.texture_coords,
&shadow_hit.tangent_u,
&shadow_hit.tangent_v,
texture_results,
params.tc_data[func_idx.x].ro_data_segment,
identity,
identity,
0,
1.0f
};
// access textures and other resource data
// expect that the target code index is the same for all functions of a material
Mdl_resource_handler mdl_resources;
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
// initialize the state
as_init(func_idx)(&state, &mdl_resources.data, arg_block);
// handle cutouts be treating the opacity as chance to hit the surface
// if we don't hit it, the ray will continue with the same direction
func_idx = get_mdl_function_index(material.cutout_opacity);
if (is_valid(func_idx))
{
float opacity = as_expression_typed<float>(func_idx)(&state, &mdl_resources.data, arg_block);
const float x_anyhit = rnd(seed);
in_shadow = (x_anyhit <= opacity);
}
}
return in_shadow;
}
//-------------------------------------------------------------------------------------------------
__device__ inline bool trace_scene(
uint32_t &seed,
Ray_state &ray_state,
const Kernel_params &params)
{
// stop at invalid states
if (ray_state.lpe_current_state == static_cast<uint32_t>(-1))
return false;
// intersect with geometry
Ray_hit_info hit;
if (!intersect_geometry(ray_state, params, hit)) {
if (ray_state.intersection == 0 && params.mdl_test_type != MDL_TEST_NO_ENV) {
// primary ray miss, add environment contribution
const float2 uv = environment_coords(ray_state.dir, params);
const float4 texval = tex2D<float4>(params.env_tex, uv.x, uv.y);
// add contribution, if `CL` is a valid path
accumulate_contribution(
TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
make_float3(texval.x, texval.y, texval.z) * params.env_intensity,
ray_state, params);
}
return false;
}
float4 texture_results[16];
// material of the current object
Df_cuda_material material = params.material_buffer[params.current_material];
Mdl_function_index func_idx;
func_idx = get_mdl_function_index(material.init);
if (!is_valid(func_idx))
return false;
// create state
Mdl_state state = {
hit.normal,
hit.normal,
hit.position,
0.0f,
hit.texture_coords,
&hit.tangent_u,
&hit.tangent_v,
texture_results,
params.tc_data[func_idx.x].ro_data_segment,
identity,
identity,
0,
1.0f
};
// access textures and other resource data
// expect that the target code index is the same for all functions of a material
Mdl_resource_handler mdl_resources;
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
// initialize the state
as_init(func_idx)(&state, &mdl_resources.data, arg_block);
// handle cutouts be treating the opacity as chance to hit the surface
// if we don't hit it, the ray will continue with the same direction
func_idx = get_mdl_function_index(material.cutout_opacity);
if (is_valid(func_idx))
{
float opacity = as_expression_typed<float>(func_idx)(&state, &mdl_resources.data, arg_block);
const float x_anyhit = rnd(seed);
if (x_anyhit > opacity)
{
// decrease to see the environment though front and back face cutouts
ray_state.intersection--;
// change the side
ray_state.inside_cutout = !ray_state.inside_cutout;
// avoid self-intersections
ray_state.pos += hit.normal * (ray_state.inside_cutout ? -0.001f : 0.001f);
return true;
}
}
// for evaluating parts of the BSDF individually, e.g. for implementing LPEs
// the MDL SDK provides several options to pass out the BSDF, EDF, and auxiliary data
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
// application provided memory
// the data structs will get only a pointer to a buffer, along with size and offset
const unsigned df_eval_slots = 4; // number of handles (parts) that can be evaluated
// at once. 4 is an arbitrary choice. However, it
// has to match eval_data.handle_count and
// aux_data.handle_count)
float3 result_buffer_0[df_eval_slots]; // used for bsdf_diffuse, albedo_diffue, and edf
float3 result_buffer_1[df_eval_slots]; // used for bsdf_glossy, albedo_glossy
float3 result_buffer_2[df_eval_slots]; // used for normal
float3 result_buffer_3[df_eval_slots]; // used for roughness
#elif DF_HANDLE_SLOTS == DF_HSM_NONE
// handles are ignored, all parts of the BSDF are returned at once without loops (fastest)
const unsigned df_eval_slots = 1;
#else
// eval_data and auxiliary_data have a fixed size array to pass the data. Only an offset
// is required if there are more handles (parts) than slots.
const unsigned df_eval_slots = DF_HANDLE_SLOTS;
#endif
// apply volume attenuation after first bounce
// (assuming uniform absorption coefficient and ignoring scattering coefficient)
if (ray_state.intersection > 0)
{
func_idx = get_mdl_function_index(material.volume_absorption);
if (is_valid(func_idx))
{
float3 abs_coeff = as_expression_typed<float3>(func_idx)(&state, &mdl_resources.data, arg_block);
ray_state.weight.x *= abs_coeff.x > 0.0f ? expf(-abs_coeff.x * hit.distance) : 1.0f;
ray_state.weight.y *= abs_coeff.y > 0.0f ? expf(-abs_coeff.y * hit.distance) : 1.0f;
ray_state.weight.z *= abs_coeff.z > 0.0f ? expf(-abs_coeff.z * hit.distance) : 1.0f;
}
}
// for thin_walled materials there is no 'inside'
bool thin_walled = false;
Mdl_function_index thin_walled_func_idx = get_mdl_function_index(material.thin_walled);
if (is_valid(thin_walled_func_idx))
thin_walled = as_expression_typed<bool>(thin_walled_func_idx)(
&state, &mdl_resources.data, arg_block);
// add emission
func_idx = get_mdl_function_index((thin_walled && ray_state.inside_cutout) ? material.backface_edf : material.edf);
if (is_valid(func_idx))
{
// evaluate intensity expression
float3 emission_intensity = make_float3(0.0, 0.0, 0.0);
Mdl_function_index intensity_func_idx = get_mdl_function_index(
(thin_walled && ray_state.inside_cutout) ? material.backface_emission_intensity : material.emission_intensity);
if (is_valid(intensity_func_idx))
{
emission_intensity = as_expression_typed<float3>(intensity_func_idx)(
&state, &mdl_resources.data, arg_block);
}
// evaluate EDF
Edf_evaluate_data<(Df_handle_slot_mode) DF_HANDLE_SLOTS> eval_data;
eval_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.edf = result_buffer_0;
eval_data.handle_count = df_eval_slots;
#endif
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
unsigned edf_mtag_to_gtag_map_size = (thin_walled && ray_state.inside_cutout) ?
material.backface_edf_mtag_to_gtag_map_size : material.edf_mtag_to_gtag_map_size;
unsigned* edf_mtag_to_gtag_map = (thin_walled && ray_state.inside_cutout) ?
&material.backface_edf_mtag_to_gtag_map[0] : &material.edf_mtag_to_gtag_map[0];
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < edf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials EDF
as_edf_evaluate(func_idx)(&eval_data, &state, &mdl_resources.data, arg_block);
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < edf_mtag_to_gtag_map_size); ++lobe)
{
// add emission contribution
accumulate_contribution(
TRANSITION_EMISSION, edf_mtag_to_gtag_map[offset + lobe],
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.edf * emission_intensity,
#else
eval_data.edf[lobe] * emission_intensity,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
func_idx = get_mdl_function_index(
(thin_walled && ray_state.inside_cutout) ? material.backface_bsdf : material.bsdf);
unsigned bsdf_mtag_to_gtag_map_size = (thin_walled && ray_state.inside_cutout) ?
material.backface_bsdf_mtag_to_gtag_map_size : material.bsdf_mtag_to_gtag_map_size;
unsigned* bsdf_mtag_to_gtag_map = (thin_walled && ray_state.inside_cutout) ?
&material.backface_bsdf_mtag_to_gtag_map[0] : &material.bsdf_mtag_to_gtag_map[0];
if (is_valid(func_idx))
{
// reuse memory for function data
union
{
Bsdf_sample_data sample_data;
Bsdf_evaluate_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS> eval_data;
Bsdf_pdf_data pdf_data;
Bsdf_auxiliary_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS> aux_data;
};
// initialize shared fields
if (ray_state.inside && !thin_walled)
{
sample_data.ior1.x = BSDF_USE_MATERIAL_IOR;
sample_data.ior2 = make_float3(1.0f, 1.0f, 1.0f);
}
else
{
sample_data.ior1 = make_float3(1.0f, 1.0f, 1.0f);
sample_data.ior2.x = BSDF_USE_MATERIAL_IOR;
}
sample_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
// if requested, fill auxiliary buffers
if (params.enable_auxiliary_output && ray_state.intersection == 0)
{
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
aux_data.albedo_diffuse = result_buffer_0;
aux_data.albedo_glossy = result_buffer_1;
aux_data.normal = result_buffer_2;
aux_data.roughness = result_buffer_3;
aux_data.handle_count = df_eval_slots;
#endif
aux_data.flags = params.bsdf_data_flags;
// outer loop in case there are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
aux_data.handle_offset = offset;
#endif
// evaluate the materials auxiliary
as_bsdf_auxiliary(func_idx)(&aux_data, &state, &mdl_resources.data, arg_block);
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
{
// to keep it simpler, the individual albedo and normals are averaged
// however, the parts can also be used separately, e.g. for LPEs
#if DF_HANDLE_SLOTS == DF_HSM_NONE
ray_state.aux->albedo += aux_data.albedo_diffuse + aux_data.albedo_glossy;
ray_state.aux->normal += aux_data.normal;
#else
ray_state.aux->albedo += aux_data.albedo_diffuse[lobe] +
aux_data.albedo_glossy[lobe];
ray_state.aux->normal += aux_data.normal[lobe];
#endif
ray_state.aux->num++;
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
// compute direct lighting for point light
Transition_type transition_glossy, transition_diffuse;
if (params.light_intensity > 0.0f)
{
float3 to_light = params.light_pos - ray_state.pos;
bool culled_light = cull_point_light(params, params.light_pos, to_light, ray_state.inside_cutout ? -hit.normal : hit.normal);
// cast a shadow ray when inside a cutout
if (ray_state.inside_cutout && !culled_light)
{
const float3 pos = ray_state.pos - hit.normal * 0.001f;
culled_light = trace_shadow(pos, to_light, seed, ray_state, params);
}
if(!culled_light)
{
const float inv_squared_dist = 1.0f / squared_length(to_light);
const float3 f = params.light_color * params.light_intensity *
inv_squared_dist * (float) (0.25 / M_PI);
eval_data.k2 = to_light * sqrtf(inv_squared_dist);
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.bsdf_diffuse = result_buffer_0;
eval_data.bsdf_glossy = result_buffer_1;
eval_data.handle_count = df_eval_slots;
#endif
eval_data.flags = params.bsdf_data_flags;
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials BSDF
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, arg_block);
// we know if we reflect or transmit
if (dot(to_light, ray_state.inside_cutout ? -hit.normal : hit.normal) > 0.0f) {
transition_glossy = TRANSITION_SCATTER_GR;
transition_diffuse = TRANSITION_SCATTER_DR;
} else {
transition_glossy = TRANSITION_SCATTER_GT;
transition_diffuse = TRANSITION_SCATTER_DT;
}
// sample weight
const float3 w = ray_state.weight * f;
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
{
// get the `global tag` of the lobe
unsigned material_lobe_gtag = bsdf_mtag_to_gtag_map[offset + lobe];
// add diffuse contribution
accumulate_next_event_contribution(
transition_diffuse, material_lobe_gtag,
TRANSITION_LIGHT, params.point_light_gtag, // light group
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_diffuse * w,
#else
eval_data.bsdf_diffuse[lobe] * w,
#endif
ray_state, params);
// add glossy contribution
accumulate_next_event_contribution(
transition_glossy, material_lobe_gtag,
TRANSITION_LIGHT, params.point_light_gtag, // light group
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_glossy * w,
#else
eval_data.bsdf_glossy[lobe] * w,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
}
// importance sample environment light
if (params.mdl_test_type != MDL_TEST_SAMPLE && params.mdl_test_type != MDL_TEST_NO_ENV)
{
const float xi0 = rnd(seed);
const float xi1 = rnd(seed);
const float xi2 = rnd(seed);
float3 light_dir;
float pdf;
const float3 f = environment_sample(light_dir, pdf, make_float3(xi0, xi1, xi2), params);
bool culled_env_light = pdf < 0.0f || cull_env_light(params, light_dir, ray_state.inside_cutout ? -hit.normal : hit.normal);
// cast a shadow ray when inside a cutout
if(ray_state.inside_cutout && !culled_env_light)
{
const float3 pos = ray_state.pos - hit.normal * 0.001f;
culled_env_light = trace_shadow(pos, light_dir, seed, ray_state, params);
}
if (!culled_env_light)
{
eval_data.k2 = light_dir;
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.bsdf_diffuse = result_buffer_0;
eval_data.bsdf_glossy = result_buffer_1;
eval_data.handle_count = df_eval_slots;
#endif
eval_data.flags = params.bsdf_data_flags;
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials BSDF
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, arg_block);
const float mis_weight =
(params.mdl_test_type == MDL_TEST_EVAL) ? 1.0f : pdf / (pdf + eval_data.pdf);
// we know if we reflect or transmit
if (dot(light_dir, ray_state.inside_cutout ? -hit.normal : hit.normal) > 0.0f) {
transition_glossy = TRANSITION_SCATTER_GR;
transition_diffuse = TRANSITION_SCATTER_DR;
} else {
transition_glossy = TRANSITION_SCATTER_GT;
transition_diffuse = TRANSITION_SCATTER_DT;
}
// sample weight
const float3 w = ray_state.weight * f * mis_weight;
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
{
// get the `global tag` of the lobe
unsigned material_lobe_gtag = bsdf_mtag_to_gtag_map[offset + lobe];
// add diffuse contribution
accumulate_next_event_contribution(
transition_diffuse, material_lobe_gtag,
TRANSITION_LIGHT, params.env_gtag, // light group 'env'
#if DF_HANDLE_SLOTS == DF_HSM_NONE
(eval_data.bsdf - eval_data.bsdf_glossy) * w,
#else
eval_data.bsdf_diffuse[lobe] * w,
#endif
ray_state, params);
// add glossy contribution
accumulate_next_event_contribution(
transition_glossy, material_lobe_gtag,
TRANSITION_LIGHT, params.env_gtag, // light group 'env'
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_glossy * w,
#else
eval_data.bsdf_glossy[lobe] * w,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
}
// importance sample BSDF
{
sample_data.xi.x = rnd(seed);
sample_data.xi.y = rnd(seed);
sample_data.xi.z = rnd(seed);
sample_data.xi.w = rnd(seed);
sample_data.flags = params.bsdf_data_flags;
// sample the materials BSDF
as_bsdf_sample(func_idx)(&sample_data, &state, &mdl_resources.data, arg_block);
if (sample_data.event_type == BSDF_EVENT_ABSORB)
return false;
ray_state.dir = sample_data.k2;
ray_state.weight *= sample_data.bsdf_over_pdf;
const bool transmission = (sample_data.event_type & BSDF_EVENT_TRANSMISSION) != 0;
if (transmission)
ray_state.inside = !ray_state.inside;
const bool is_specular = (sample_data.event_type & BSDF_EVENT_SPECULAR) != 0;
Transition_type next;
if (is_specular)
next = transmission ? TRANSITION_SCATTER_ST : TRANSITION_SCATTER_SR;
else if ((sample_data.event_type & BSDF_EVENT_DIFFUSE) != 0)
next = transmission ? TRANSITION_SCATTER_DT : TRANSITION_SCATTER_DR;
else
next = transmission ? TRANSITION_SCATTER_GT : TRANSITION_SCATTER_GR;
// move ray to the next sampled state
ray_state.lpe_current_state = lpe_transition(
ray_state.lpe_current_state, next,
#if DF_HANDLE_SLOTS == DF_HSM_NONE
// ill-defined case: the LPE machine expects tags but the renderer ignores them
// -> the resulting image of LPEs with tags is undefined in this case
params.default_gtag,
#else
bsdf_mtag_to_gtag_map[sample_data.handle], // sampled lobe
#endif
params);
// depending on the geometry, the ray might be displaced before continuing
continue_ray(ray_state, hit, sample_data.event_type, params);
if (ray_state.inside || ray_state.inside_cutout)
{
// avoid self-intersections
ray_state.pos -= hit.normal * 0.001f;
return true; // continue bouncing in sphere
}
else if (params.mdl_test_type != MDL_TEST_NO_ENV &&
params.mdl_test_type != MDL_TEST_EVAL)
{
// leaving sphere, add contribution from environment hit
float pdf;
const float3 f = environment_eval(pdf, sample_data.k2, params);
float bsdf_pdf;
if (params.mdl_test_type == MDL_TEST_MIS_PDF)
{
const float3 k2 = sample_data.k2;
pdf_data.k2 = k2;
pdf_data.flags = params.bsdf_data_flags;
// get pdf corresponding to the materials BSDF
as_bsdf_pdf(func_idx)(&pdf_data, &state, &mdl_resources.data, arg_block);
bsdf_pdf = pdf_data.pdf;
}
else
bsdf_pdf = sample_data.pdf;
if (is_specular || bsdf_pdf > 0.0f)
{
const float mis_weight = is_specular ||
(params.mdl_test_type == MDL_TEST_SAMPLE) ? 1.0f :
bsdf_pdf / (pdf + bsdf_pdf);
float3 specular_contrib = ray_state.weight * f * mis_weight;
accumulate_contribution(
TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
specular_contrib,
ray_state, params);
}
}
}
}
return false;
}
struct render_result
{
float3 beauty;
auxiliary_data aux;
};
__device__ inline render_result render_scene(
uint32_t &seed,
const Kernel_params &params,
const unsigned x,
const unsigned y)
{
const float inv_res_x = 1.0f / (float)params.resolution.x;
const float inv_res_y = 1.0f / (float)params.resolution.y;
const float dx = params.disable_aa ? 0.5f : rnd(seed);
const float dy = params.disable_aa ? 0.5f : rnd(seed);
const float2 screen_pos = make_float2(
((float)x + dx) * inv_res_x,
((float)y + dy) * inv_res_y);
const float r = (2.0f * screen_pos.x - 1.0f);
const float r_rx = (2.0f * (screen_pos.x + inv_res_x) - 1.0f);
const float u = (2.0f * screen_pos.y - 1.0f);
const float u_ry = (2.0f * (screen_pos.y + inv_res_y) - 1.0f);
const float aspect = (float)params.resolution.y / (float)params.resolution.x;
render_result res;
clear(res.aux);
Ray_state ray_state;
ray_state.contribution = make_float3(0.0f, 0.0f, 0.0f);
ray_state.weight = make_float3(1.0f, 1.0f, 1.0f);
ray_state.pos = ray_state.pos_rx = ray_state.pos_ry = params.cam_pos;
ray_state.dir = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u);
ray_state.dir_rx = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r_rx + params.cam_up * aspect * u);
ray_state.dir_ry = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u_ry);
ray_state.inside = false;
ray_state.inside_cutout = false;
ray_state.lpe_current_state = 1; // already at the camera so state 0 to 1 is free as long as
// there is only one camera
ray_state.aux = &res.aux;
const unsigned int max_inters = params.max_path_length - 1;
for (ray_state.intersection = 0; ray_state.intersection < max_inters; ++ray_state.intersection)
{
if (!trace_scene(seed, ray_state, params))
break;
}
res.beauty =
isfinite(ray_state.contribution.x) &&
isfinite(ray_state.contribution.y) &&
isfinite(ray_state.contribution.z) ? ray_state.contribution : make_float3(0.0f, 0.0f, 0.0f);
normalize(res.aux);
return res;
}
// quantize + gamma
__device__ inline unsigned int float3_to_rgba8(float3 val)
{
const unsigned int r = (unsigned int) (255.0f * powf(__saturatef(val.x), 1.0f / 2.2f));
const unsigned int g = (unsigned int) (255.0f * powf(__saturatef(val.y), 1.0f / 2.2f));
const unsigned int b = (unsigned int) (255.0f * powf(__saturatef(val.z), 1.0f / 2.2f));
return 0xff000000 | (b << 16) | (g << 8) | r;
}
// exposure + simple Reinhard tonemapper + gamma
__device__ inline unsigned int display(float3 val, const float tonemap_scale)
{
val *= tonemap_scale;
const float burn_out = 0.1f;
val.x *= (1.0f + val.x * burn_out) / (1.0f + val.x);
val.y *= (1.0f + val.y * burn_out) / (1.0f + val.y);
val.z *= (1.0f + val.z * burn_out) / (1.0f + val.z);
return float3_to_rgba8(val);
}
// CUDA kernel rendering simple geometry with IBL
extern "C" __global__ void render_scene_kernel(
const Kernel_params kernel_params)
{
const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= kernel_params.resolution.x || y >= kernel_params.resolution.y)
return;
const unsigned int idx = y * kernel_params.resolution.x + x;
uint32_t seed = tea<4>(idx, kernel_params.iteration_start);
render_result res;
float3 beauty = make_float3(0.0f, 0.0f, 0.0f);
auxiliary_data aux;
clear(aux);
for (unsigned int s = 0; s < kernel_params.iteration_num; ++s)
{
res = render_scene(
seed,
kernel_params,
x, y);
beauty += res.beauty;
aux += res.aux;
}
beauty *= 1.0f / (float)kernel_params.iteration_num;
normalize(aux);
// accumulate
if (kernel_params.iteration_start == 0) {
kernel_params.accum_buffer[idx] = beauty;
if (kernel_params.enable_auxiliary_output) {
kernel_params.albedo_buffer[idx] = aux.albedo;
kernel_params.normal_buffer[idx] = aux.normal;
}
} else {
float iteration_weight = (float) kernel_params.iteration_num /
(float) (kernel_params.iteration_start + kernel_params.iteration_num);
float3 buffer_val = kernel_params.accum_buffer[idx] +
(beauty - kernel_params.accum_buffer[idx]) * iteration_weight;
kernel_params.accum_buffer[idx] =
(isinf(buffer_val.x) || isnan(buffer_val.y) || isinf(buffer_val.z) ||
isnan(buffer_val.x) || isinf(buffer_val.y) || isnan(buffer_val.z))
? make_float3(0.0f, 0.0f, 1.0e+30f)
: buffer_val;
if (kernel_params.enable_auxiliary_output) {
// albedo
kernel_params.albedo_buffer[idx] = kernel_params.albedo_buffer[idx] +
(aux.albedo - kernel_params.albedo_buffer[idx]) * iteration_weight;
// normal, check for zero length first
float3 weighted_normal = kernel_params.normal_buffer[idx] +
(aux.normal - kernel_params.normal_buffer[idx]) * iteration_weight;
if (dot(weighted_normal, weighted_normal) > 0.0f)
weighted_normal = normalize(weighted_normal);
kernel_params.normal_buffer[idx] = weighted_normal;
}
}
// update display buffer
if (kernel_params.display_buffer)
{
switch (kernel_params.enable_auxiliary_output ? kernel_params.display_buffer_index : 0)
{
case 1: /* albedo */
kernel_params.display_buffer[idx] = float3_to_rgba8(kernel_params.albedo_buffer[idx]);
break;
case 2: /* normal */
{
float3 display_normal = kernel_params.normal_buffer[idx];
if (dot(display_normal, display_normal) > 0) {
display_normal.x = display_normal.x * 0.5f + 0.5f;
display_normal.y = display_normal.y * 0.5f + 0.5f;
display_normal.z = display_normal.z * 0.5f + 0.5f;
}
kernel_params.display_buffer[idx] = float3_to_rgba8(display_normal);
break;
}
default: /* beauty */
kernel_params.display_buffer[idx] =
display(kernel_params.accum_buffer[idx], kernel_params.exposure_scale);
break;
}
}
}
[Previous] [Up] [Next]