Material Definition Language API nvidia_logo_transpbg.gif Up
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Groups Pages
Example for Compiled Distribution Functions (PTX)
[Previous] [Up] [Next]

This example shows how functions implementing BSDFs can be generated by the "PTX" backend. To illustrate the use of these functions in CUDA, it implements a small physically based renderer that computes direct lighting from an HDR environment map using BSDF evaluation and importance sampling.

New Topics

  • Generated BSDF functions
  • Changing compiled material arguments at runtime

Detailed Description

BSDF functions


The generated functions for BSDFs implement

  • An initialization function to prepare shared data per hit.
  • Evaluation of the BSDF for a given pair of incoming and outgoing directions.
  • Importance sampling of an incoming direction given an outgoing direction.
  • Probability density function (PDF) computation of generating an incoming for a given outgoing direction.

Each function takes arguments for state, resources, and material arguments that are analogous to the material expression counterpart of the earlier examples and the latter three functions further take a specific input and output parameter data. The signatures of the functions look like this:

typedef void (Bsdf_init_function) (Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
typedef void (Bsdf_sample_function) (Bsdf_sample_data *data,
const Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
typedef void (Bsdf_evaluate_function)(Bsdf_evaluate_data *data,
const Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
typedef void (Bsdf_pdf_function) (Bsdf_pdf_data *data,
const Shading_state_material *state,
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);

The functions can be generated by mi::neuraylib::IMdl_backend::translate_material_df() or mi::neuraylib::ILink_unit::add_material_df(). Both functions have a parameter include_geometry_normal that can be specified to make the initialization function replace state->normal by the result of the expression connected to geometry.normal of the material. The final function names are specified by a base name that will suffixed by _init, _sample, _evaluate, and _pdf.

BSDF evaluation and PDF computation take a pair of directions and IORs (index of refraction) as input and produce a PDF and, in the case of Bsdf_evaluate_function, the value of the BSDF. The corresponding structs are

struct Bsdf_evaluate_data {
// Input fields
float3 ior1; // IOR current medium
float3 ior2; // IOR other side
float3 k1; // outgoing direction
float3 k2; // incoming direction
// Output fields
float3 bsdf; // bsdf * dot(normal, k2)
float pdf; // pdf (non-projected hemisphere)
};
struct Bsdf_pdf_data {
// Input fields
float3 ior1; // IOR current medium
float3 ior2; // IOR other side
float3 k1; // outgoing direction
float3 k2; // incoming direction
// Output fields
float pdf; // pdf (non-projected hemisphere)
};

The handling of the (color) IOR is generally up to the renderer, in particular this includes the decision if the evaluation should happen from within an object with refracting material or from the outer side. It is possible though, to just pass in one IOR (e.g. ior1) and make the functions fill the other with the IOR of the material by marking the x component of it with MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR (e.g. ior2.x=MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR).

For importance sampling, one direction and a set of pseudo-random numbers needs to be provided. The output values include the importance sampled direction, a PDF, a weight (the value of the BSDF divided by the PDF), and the type of the BSDF that was eventually sampled. Note that this may include an absorption event, in which case no direction has been generated.

BSDF_EVENT_ABSORB = 0,
BSDF_EVENT_DIFFUSE = 1,
BSDF_EVENT_GLOSSY = 1 << 1,
BSDF_EVENT_SPECULAR = 1 << 2,
BSDF_EVENT_REFLECTION = 1 << 3,
BSDF_EVENT_TRANSMISSION = 1 << 4,
BSDF_EVENT_DIFFUSE_REFLECTION = BSDF_EVENT_DIFFUSE | BSDF_EVENT_REFLECTION,
BSDF_EVENT_DIFFUSE_TRANSMISSION = BSDF_EVENT_DIFFUSE | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_GLOSSY_REFLECTION = BSDF_EVENT_GLOSSY | BSDF_EVENT_REFLECTION,
BSDF_EVENT_GLOSSY_TRANSMISSION = BSDF_EVENT_GLOSSY | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_SPECULAR_REFLECTION = BSDF_EVENT_SPECULAR | BSDF_EVENT_REFLECTION,
BSDF_EVENT_SPECULAR_TRANSMISSION = BSDF_EVENT_SPECULAR | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_FORCE_32_BIT = 0xffffffffU
};
struct Bsdf_sample_data {
// Input fields
float3 ior1; // IOR current medium
float3 ior2; // IOR other side
float3 k1; // outgoing direction
float3 xi; // pseudo-random sample number
// Output fields
float3 k2; // incoming direction
float pdf; // pdf (non-projected hemisphere)
float3 bsdf_over_pdf; // bsdf * dot(normal, k2) / pdf
Bsdf_event_type event_type; // the type of event for the generated sample
};

It often is the case that for a given shading point both evaluation and importance sampling need to be performed (possibly multiple times). To avoid re-computation of material expressions in each BSDF function call it is essential that the results are cached between multiple calls. This is accomplished by the initialization function Bsdf_init_function which stores computed results in an array passed by the mi::neuraylib::Shading_state_material::text_results field. The size of that array needs to be communicated to the backend using the "num_texture_results" option via mi::neuraylib::IMdl_backend::set_option(). If the storage is insufficient (e.g. for a material with a high number of material expressions), non-cached expressions are automatically recomputed when they are needed. The initialization is generally not optional, i.e. even if the text_results array size is set to zero it may still perform some initialization, in particular it will update state->normal if requested.

EDF functions


Analogous to the generation of BSDF functions, emission distribution functions (EDFs) can be generated using the corresponding signatures and data structures. For more information see the example code and run the example application using materials that contain EDFs:

df_cuda ::nvidia::sdk_examples::tutorials::example_edf
df_cuda ::nvidia::sdk_examples::tutorials::example_measured_edf

Global distributions are not supported yet and result in no emission. Hence, the generated EDFs can currently be evaluated only in tangent space.

Changing arguments of class-compiled materials at runtime


As explained in Instance-compilation and class-compilation, the resulting mi::neuraylib::ITarget_code object contains mi::neuraylib::ITarget_value_layout and mi::neuraylib::ITarget_argument_block objects for each material, when class-compiled materials are used for generating target code. Together with the corresponding mi::neuraylib::ICompiled_material, you can already get some information about the arguments:

The example uses this information to build a material editor GUI with the "Dear ImGui" framework (https://github.com/ocornut/imgui). The GUI controls are linked to the data of the target argument block using the offsets of the arguments. When ImGui reports any changes by the user, the target argument block is updated on the device for the current material.

When presenting material arguments to the user, additional information from parameter annotations may improve the user experience. To find the annotations for an argument of a compiled material, you have to look up the annotation block for a parameter of the corresponding mi::neuraylib::IMaterial_definition with the same name as the argument.

Note
You will only find a parameter in a material definition with an exactly matching name for a compiled material argument, when non-struct constants were used as arguments during class-compilation. For other arguments you will get paths like "x.b" as parameter names.

In this example, the anno::hard_range annotation is used to determine the minimum and maximum values for value sliders, the anno::display_name annotation is used as a more user-friendly name for the arguments, and the anno::in_group annotation is used to group the arguments into categories.

Example Source

To compile the source code, you require CUDA, GLFW, and GLEW. For detailed instructions, please refer to the Getting Started section.

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cpp

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/df_cuda/example_df_cuda.cpp
//
// Simple renderer using compiled BSDFs with a material parameter editor GUI.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
#include <list>
#include <map>
#include <memory>
#define _USE_MATH_DEFINES
#include <math.h>
// shared example helpers
#include "example_df_cuda.h"
#include "lpe.h"
// Enable this to dump the generated PTX code to stdout.
// #define DUMP_PTX
#define OPENGL_INTEROP
#include "example_cuda_shared.h"
#include <imgui.h>
#include <imgui_impl_glfw.h>
#include <imgui_impl_opengl3.h>
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#define GL_DISPLAY_CUDA
#include "utils/gl_display.h"
#define terminate() \
do { \
glfwTerminate(); \
exit_failure(); \
} while (0)
#define WINDOW_TITLE "MDL SDK DF CUDA Example"
// Vector helper functions //
inline float length(const float3 &d)
{
return sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
}
inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f / length(d);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
inline float3 operator/(const float3& d, float s)
{
const float inv_s = 1.0f / s;
return make_float3(d.x * inv_s, d.y * inv_s, d.z * inv_s);
}
// OpenGL code //
// Initialize OpenGL and create a window with an associated OpenGL context.
static GLFWwindow *init_opengl(std::string& version_string, int res_x, int res_y)
{
// Initialize GLFW
check_success(glfwInit());
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
version_string = "#version 330 core"; // see top comments in 'imgui_impl_opengl3.cpp'
// Create an OpenGL window and a context
GLFWwindow *window = glfwCreateWindow(
res_x, res_y, WINDOW_TITLE, nullptr, nullptr);
if (!window) {
std::cerr << "Error creating OpenGL window!" << std::endl;
terminate();
}
// Attach context to window
glfwMakeContextCurrent(window);
// Initialize GLEW to get OpenGL extensions
GLenum res = glewInit();
if (res != GLEW_OK) {
std::cerr << "GLEW error: " << glewGetErrorString(res) << std::endl;
terminate();
}
// Disable VSync
glfwSwapInterval(0);
check_success(glGetError() == GL_NO_ERROR);
return window;
}
// Application logic //
// Context structure for window callback functions.
struct Window_context
{
bool mouse_event, key_event;
bool save_image;
int zoom;
int mouse_button; // button from callback event plus one (0 = no event)
int mouse_button_action; // action from mouse button callback event
int mouse_wheel_delta;
bool moving;
double move_start_x, move_start_y;
double move_dx, move_dy;
int material_index_delta;
bool save_result;
bool exposure_event;
float exposure;
};
static std::string to_string(Display_buffer_options option)
{
switch (option)
{
case DISPLAY_BUFFER_LPE: return "Selected LPE";
case DISPLAY_BUFFER_ALBEDO: return "Albedo";
case DISPLAY_BUFFER_NORMAL: return "Normal";
default: return "";
}
}
// GLFW scroll callback
static void handle_scroll(GLFWwindow *window, double xoffset, double yoffset)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (yoffset > 0.0) {
ctx->mouse_wheel_delta = 1; ctx->mouse_event = true;
} else if (yoffset < 0.0) {
ctx->mouse_wheel_delta = -1; ctx->mouse_event = true;
}
ImGui_ImplGlfw_ScrollCallback(window, xoffset, yoffset);
}
// GLFW keyboard callback
static void handle_key(GLFWwindow *window, int key, int scancode, int action, int mods)
{
// Handle key press events
if (action == GLFW_PRESS) {
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
switch (key) {
// Escape closes the window
case GLFW_KEY_ESCAPE:
glfwSetWindowShouldClose(window, GLFW_TRUE);
break;
case GLFW_KEY_DOWN:
case GLFW_KEY_RIGHT:
case GLFW_KEY_PAGE_DOWN:
ctx->material_index_delta = 1;
ctx->key_event = true;
break;
case GLFW_KEY_UP:
case GLFW_KEY_LEFT:
case GLFW_KEY_PAGE_UP:
ctx->material_index_delta = -1;
ctx->key_event = true;
break;
case GLFW_KEY_ENTER:
ctx->save_result = true;
break;
case GLFW_KEY_KP_SUBTRACT:
ctx->exposure--;
ctx->exposure_event = true;
break;
case GLFW_KEY_KP_ADD:
ctx->exposure++;
ctx->exposure_event = true;
break;
default:
break;
}
}
ImGui_ImplGlfw_KeyCallback(window, key, scancode, action, mods);
}
// GLFW mouse button callback
static void handle_mouse_button(GLFWwindow *window, int button, int action, int mods)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
ctx->mouse_button = button + 1;
ctx->mouse_button_action = action;
ImGui_ImplGlfw_MouseButtonCallback(window, button, action, mods);
}
// GLFW mouse position callback
static void handle_mouse_pos(GLFWwindow *window, double xpos, double ypos)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->moving)
{
ctx->move_dx += xpos - ctx->move_start_x;
ctx->move_dy += ypos - ctx->move_start_y;
ctx->move_start_x = xpos;
ctx->move_start_y = ypos;
ctx->mouse_event = true;
}
}
// Resize CUDA buffers for a given resolution
static void resize_buffers(CUdeviceptr *buffer_cuda, int width, int height)
{
// Allocate CUDA buffer
if (*buffer_cuda)
check_cuda_success(cuMemFree(*buffer_cuda));
if (width == 0 || height == 0)
*buffer_cuda = 0;
else
check_cuda_success(cuMemAlloc(buffer_cuda, width * height * sizeof(float3)));
}
// Helper for create_environment()
static float build_alias_map(
const float *data,
const unsigned int size,
Env_accel *accel)
{
// create qs (normalized)
float sum = 0.0f;
for (unsigned int i = 0; i < size; ++i)
sum += data[i];
for (unsigned int i = 0; i < size; ++i)
accel[i].q = (static_cast<float>(size) * data[i] / sum);
// create partition table
unsigned int *partition_table = static_cast<unsigned int *>(
malloc(size * sizeof(unsigned int)));
unsigned int s = 0u, large = size;
for (unsigned int i = 0; i < size; ++i)
partition_table[(accel[i].q < 1.0f) ? (s++) : (--large)] = accel[i].alias = i;
// create alias map
for (s = 0; s < large && large < size; ++s)
{
const unsigned int j = partition_table[s], k = partition_table[large];
accel[j].alias = k;
accel[k].q += accel[j].q - 1.0f;
large = (accel[k].q < 1.0f) ? (large + 1u) : large;
}
free(partition_table);
return sum;
}
// Create environment map texture and acceleration data for importance sampling
static void create_environment(
cudaTextureObject_t *env_tex,
cudaArray_t *env_tex_data,
CUdeviceptr *env_accel,
uint2 *res,
const char *envmap_name)
{
// Load environment texture
transaction->create<mi::neuraylib::IImage>("Image"));
check_success(image->reset_file(envmap_name) == 0);
const mi::Uint32 rx = canvas->get_resolution_x();
const mi::Uint32 ry = canvas->get_resolution_y();
res->x = rx;
res->y = ry;
// Check, whether we need to convert the image
char const *image_type = image->get_type();
if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0)
canvas = image_api->convert(canvas.get(), "Color");
// Copy the image data to a CUDA array
const cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
check_cuda_success(cudaMallocArray(env_tex_data, &channel_desc, rx, ry));
const float *pixels = static_cast<const float *>(tile->get_data());
check_cuda_success(cudaMemcpy2DToArray(
*env_tex_data, 0, 0, pixels,
rx * sizeof(float4), rx * sizeof(float4), ry, cudaMemcpyHostToDevice));
// Create a CUDA texture
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = *env_tex_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeWrap;
tex_desc.addressMode[1] = cudaAddressModeClamp; // don't sample beyond poles of env sphere
tex_desc.addressMode[2] = cudaAddressModeWrap;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
check_cuda_success(cudaCreateTextureObject(env_tex, &res_desc, &tex_desc, nullptr));
// Create importance sampling data
Env_accel *env_accel_host = static_cast<Env_accel *>(malloc(rx * ry * sizeof(Env_accel)));
float *importance_data = static_cast<float *>(malloc(rx * ry * sizeof(float)));
float cos_theta0 = 1.0f;
const float step_phi = float(2.0 * M_PI) / float(rx);
const float step_theta = float(M_PI) / float(ry);
for (unsigned int y = 0; y < ry; ++y)
{
const float theta1 = float(y + 1) * step_theta;
const float cos_theta1 = std::cos(theta1);
const float area = (cos_theta0 - cos_theta1) * step_phi;
cos_theta0 = cos_theta1;
for (unsigned int x = 0; x < rx; ++x) {
const unsigned int idx = y * rx + x;
const unsigned int idx4 = idx * 4;
importance_data[idx] =
area * std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2]));
}
}
const float inv_env_integral = 1.0f / build_alias_map(importance_data, rx * ry, env_accel_host);
free(importance_data);
for (unsigned int i = 0; i < rx * ry; ++i) {
const unsigned int idx4 = i * 4;
env_accel_host[i].pdf =
std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2])) * inv_env_integral;
}
*env_accel = gpu_mem_dup(env_accel_host, rx * ry * sizeof(Env_accel));
free(env_accel_host);
}
static void upload_lpe_state_machine(
Kernel_params& kernel_params,
LPE_state_machine& lpe_state_machine)
{
uint32_t num_trans = lpe_state_machine.get_transition_count();
uint32_t num_states = lpe_state_machine.get_state_count();
kernel_params.lpe_num_transitions = num_trans;
kernel_params.lpe_num_states = num_states;
// free old data
if (kernel_params.lpe_state_table)
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
if (kernel_params.lpe_final_mask)
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
// state table
CUdeviceptr state_table = 0;
check_cuda_success(cuMemAlloc(&state_table, num_states * num_trans * sizeof(uint32_t)));
check_cuda_success(cuMemcpyHtoD(state_table, lpe_state_machine.get_state_table().data(),
num_states * num_trans * sizeof(uint32_t)));
kernel_params.lpe_state_table = reinterpret_cast<uint32_t*>(state_table);
// final state masks
CUdeviceptr final_mask = 0;
check_cuda_success(cuMemAlloc(&final_mask, num_states * sizeof(uint32_t)));
check_cuda_success(cuMemcpyHtoD(final_mask, lpe_state_machine.get_final_state_masks().data(),
num_states * sizeof(uint32_t)));
kernel_params.lpe_final_mask = reinterpret_cast<uint32_t*>(final_mask);
// tag ID for light sources as they don't store tags in this examples
kernel_params.default_gtag = lpe_state_machine.handle_to_global_tag("");
kernel_params.point_light_gtag = lpe_state_machine.handle_to_global_tag("point_light");
kernel_params.env_gtag = lpe_state_machine.handle_to_global_tag("env");
}
// Save current result image to disk
static void save_result(
const CUdeviceptr cuda_buffer,
const unsigned int width,
const unsigned int height,
const std::string &filename,
{
image_api->create_canvas("Rgb_fp", width, height));
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(data, cuda_buffer, width * height * sizeof(float3)));
// assuming EXR and HDR are the only linear color space formats we are using
// other formats are stored in sRGB (approximated by gamma 2.2)
if (!mi::examples::strings::ends_with(filename, ".exr") &&
!mi::examples::strings::ends_with(filename, ".hdr"))
image_api->adjust_gamma(canvas.get(), 2.2f);
mdl_impexp_api->export_canvas(filename.c_str(), canvas.get());
}
// Application options
struct Options {
int cuda_device;
float gui_scale;
bool opengl;
bool use_class_compilation;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter;
#endif
bool no_aa;
bool enable_derivatives;
bool fold_ternary_on_df;
bool enable_auxiliary_output;
unsigned int res_x, res_y;
unsigned int iterations;
unsigned int samples_per_iteration;
unsigned int mdl_test_type;
unsigned int max_path_length;
float fov;
float exposure;
float3 cam_pos;
float3 light_pos;
float3 light_intensity;
std::string hdrfile;
float hdr_rot;
std::string outputfile;
std::vector<std::string> material_names;
// Default constructor, sets default values.
Options()
: cuda_device(0)
, gui_scale(1.0f)
, opengl(true)
, use_class_compilation(true)
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
, use_df_interpreter(false)
#endif
, no_aa(false)
, enable_derivatives(false)
, fold_ternary_on_df(false)
, enable_auxiliary_output(true)
, res_x(1024)
, res_y(1024)
, iterations(4096)
, samples_per_iteration(8)
, mdl_test_type(MDL_TEST_MIS)
, max_path_length(4)
, fov(96.0f)
, exposure(0.0f)
, cam_pos(make_float3(0, 0, 3))
, light_pos(make_float3(10, 0, 5))
, light_intensity(make_float3(0, 0, 0))
, hdrfile("nvidia/sdk_examples/resources/environment.hdr")
, hdr_rot(0.0f)
, outputfile("output.exr")
, material_names()
{}
};
// Possible enum values if any.
struct Enum_value {
std::string name;
int value;
Enum_value(const std::string &name, int value)
: name(name), value(value)
{
}
};
// Info for an enum type.
struct Enum_type_info {
std::vector<Enum_value> values;
// Adds a enum value and its integer value to the enum type info.
void add(const std::string &name, int value) {
values.push_back(Enum_value(name, value));
}
};
// Material parameter information structure.
class Param_info
{
public:
enum Param_kind
{
PK_UNKNOWN,
PK_FLOAT,
PK_FLOAT2,
PK_FLOAT3,
PK_COLOR,
PK_ARRAY,
PK_BOOL,
PK_INT,
PK_ENUM,
PK_STRING,
PK_TEXTURE,
PK_LIGHT_PROFILE,
PK_BSDF_MEASUREMENT
};
Param_info(
mi::Size index,
char const *name,
char const *display_name,
char const *group_name,
Param_kind kind,
Param_kind array_elem_kind,
mi::Size array_size,
mi::Size array_pitch,
char *data_ptr,
const Enum_type_info *enum_info = nullptr)
: m_index(index)
, m_name(name)
, m_display_name(display_name)
, m_group_name(group_name)
, m_kind(kind)
, m_array_elem_kind(array_elem_kind)
, m_array_size(array_size)
, m_array_pitch(array_pitch)
, m_data_ptr(data_ptr)
, m_range_min(-100), m_range_max(100)
, m_enum_info(enum_info)
{
}
// Get data as T&.
template<typename T>
T &data() { return *reinterpret_cast<T *>(m_data_ptr); }
// Get data as const T&.
template<typename T>
const T &data() const { return *reinterpret_cast<const T *>(m_data_ptr); }
const char * &display_name() { return m_display_name; }
const char *display_name() const { return m_display_name; }
const char * &group_name() { return m_group_name; }
const char *group_name() const { return m_group_name; }
Param_kind kind() const { return m_kind; }
Param_kind array_elem_kind() const { return m_array_elem_kind; }
mi::Size array_size() const { return m_array_size; }
mi::Size array_pitch() const { return m_array_pitch; }
float &range_min() { return m_range_min; }
float range_min() const { return m_range_min; }
float &range_max() { return m_range_max; }
float range_max() const { return m_range_max; }
const Enum_type_info *enum_info() const { return m_enum_info; }
private:
mi::Size m_index;
char const *m_name;
char const *m_display_name;
char const *m_group_name;
Param_kind m_kind;
Param_kind m_array_elem_kind;
mi::Size m_array_size;
mi::Size m_array_pitch; // the distance between two array elements
char *m_data_ptr;
float m_range_min, m_range_max;
const Enum_type_info *m_enum_info;
};
// Material information structure.
class Material_info
{
public:
Material_info(char const *name)
: m_name(name)
{}
// Add the parameter information as last entry of the corresponding group, or to the
// end of the list, if no group name is available.
void add_sorted_by_group(const Param_info &info) {
bool group_found = false;
if (info.group_name() != nullptr) {
for (std::list<Param_info>::iterator it = params().begin(); it != params().end(); ++it)
{
const bool same_group =
it->group_name() != nullptr && strcmp(it->group_name(), info.group_name()) == 0;
if (group_found && !same_group) {
m_params.insert(it, info);
return;
}
if (same_group)
group_found = true;
}
}
m_params.push_back(info);
}
// Add a new enum type to the list of used enum types.
void add_enum_type(const std::string name, std::shared_ptr<Enum_type_info> enum_info) {
enum_types[name] = enum_info;
}
// Lookup enum type info for a given enum type absolute MDL name.
const Enum_type_info *get_enum_type(const std::string name) {
Enum_type_map::const_iterator it = enum_types.find(name);
if (it != enum_types.end())
return it->second.get();
return nullptr;
}
// Get the name of the material.
char const *name() const { return m_name; }
// Get the parameters of this material.
std::list<Param_info> &params() { return m_params; }
private:
// name of the material
char const *m_name;
// parameters of the material
std::list<Param_info> m_params;
typedef std::map<std::string, std::shared_ptr<Enum_type_info> > Enum_type_map;
// used enum types of the material
Enum_type_map enum_types;
};
// Helper class to handle Resource tables of the target code.
class Resource_table
{
typedef std::map<std::string, unsigned> Resource_id_map;
public:
enum Kind {
RESOURCE_TEXTURE,
RESOURCE_LIGHT_PROFILE,
RESOURCE_BSDF_MEASUREMENT
};
// Constructor.
Resource_table(
Kind kind)
: m_max_len(0u)
{
read_resources(target_code, transaction, kind);
}
// Get the length of the longest URL in the resource table.
size_t get_max_length() const { return m_max_len; }
// Get all urls.
std::vector<std::string> const &get_urls() const { return m_urls; }
private:
void read_resources(
Kind kind)
{
m_urls.push_back("<unset>");
switch (kind) {
case RESOURCE_TEXTURE:
for (mi::Size i = 1, n = target_code->get_texture_count(); i < n; ++i) {
const char *s = target_code->get_texture(i);
transaction->access<mi::neuraylib::ITexture>(s));
char const *url = nullptr;
if (char const *img = tex->get_image()) {
transaction->access<mi::neuraylib::IImage>(img));
url = image->get_filename();
}
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_LIGHT_PROFILE:
for (mi::Size i = 1, n = target_code->get_light_profile_count(); i < n; ++i) {
const char *s = target_code->get_light_profile(i);
char const *url = lp->get_filename();
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_BSDF_MEASUREMENT:
for (mi::Size i = 1, n = target_code->get_bsdf_measurement_count(); i < n; ++i) {
const char *s = target_code->get_bsdf_measurement(i);
char const *url = bm->get_filename();
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
}
}
private:
Resource_id_map m_resource_map;
std::vector<std::string> m_urls;
size_t m_max_len;
};
// Helper class to handle the string table of a target code.
class String_constant_table
{
typedef std::map<std::string, unsigned> String_map;
public:
// Constructor.
String_constant_table(mi::base::Handle<mi::neuraylib::ITarget_code const> target_code)
{
get_all_strings(target_code);
}
// Get the ID for a given string, return 0 if the string does not exist in the table.
unsigned get_id_for_string(const char *name) {
String_map::const_iterator it(m_string_constants_map.find(name));
if (it != m_string_constants_map.end())
return it->second;
// the user adds a sting that is NOT in the code and we have not seen so far, add it
// and assign a new id
unsigned n_id = unsigned(m_string_constants_map.size() + 1);
m_string_constants_map[name] = n_id;
m_strings.reserve((n_id + 63) & ~63);
m_strings.push_back(name);
size_t l = strlen(name);
if (l > m_max_len)
m_max_len = l;
return n_id;
}
// Get the length of the longest string in the string constant table.
size_t get_max_length() const { return m_max_len; }
// Get the string for a given ID, or nullptr if this ID does not exists.
const char *get_string(unsigned id) {
if (id == 0 || id - 1 >= m_strings.size())
return nullptr;
return m_strings[id - 1].c_str();
}
private:
// Get all string constants used inside a target code and their maximum length.
void get_all_strings(
{
m_max_len = 0;
// ignore the 0, it is the "Not-a-known-string" entry
m_strings.reserve(target_code->get_string_constant_count());
for (mi::Size i = 1, n = target_code->get_string_constant_count(); i < n; ++i) {
const char *s = target_code->get_string_constant(i);
size_t l = strlen(s);
if (l > m_max_len)
m_max_len = l;
m_string_constants_map[s] = (unsigned)i;
m_strings.push_back(s);
}
}
private:
String_map m_string_constants_map;
std::vector<std::string> m_strings;
size_t m_max_len;
};
// Update the camera kernel parameters.
static void update_camera(
Kernel_params &kernel_params,
double phi,
double theta,
float base_dist,
int zoom)
{
kernel_params.cam_dir.x = float(-sin(phi) * sin(theta));
kernel_params.cam_dir.y = float(-cos(theta));
kernel_params.cam_dir.z = float(-cos(phi) * sin(theta));
kernel_params.cam_right.x = float(cos(phi));
kernel_params.cam_right.y = 0.0f;
kernel_params.cam_right.z = float(-sin(phi));
kernel_params.cam_up.x = float(-sin(phi) * cos(theta));
kernel_params.cam_up.y = float(sin(theta));
kernel_params.cam_up.z = float(-cos(phi) * cos(theta));
const float dist = float(base_dist * pow(0.95, double(zoom)));
kernel_params.cam_pos.x = -kernel_params.cam_dir.x * dist;
kernel_params.cam_pos.y = -kernel_params.cam_dir.y * dist;
kernel_params.cam_pos.z = -kernel_params.cam_dir.z * dist;
}
// Add a combobox for the given resource parameter to the GUI
static bool handle_resource(
Param_info &param,
Resource_table const &res_table)
{
bool changed = false;
std::vector<std::string> const &urls = res_table.get_urls();
int id = param.data<int>();
std::string cur_url = urls[id];
if (ImGui::BeginCombo(param.display_name(), cur_url.c_str())) {
for (size_t i = 0, n = urls.size(); i < n; ++i) {
const std::string &name = urls[i];
bool is_selected = (cur_url == name);
if (ImGui::Selectable(name.c_str(), is_selected)) {
param.data<int>() = int(i);
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
return changed;
}
// Progressively render scene
static void render_scene(
const Options &options,
Material_compiler::Material_definition_list const &material_defs,
Material_compiler::Compiled_material_list const &compiled_materials,
std::vector<size_t> const &arg_block_indices,
std::vector<Df_cuda_material> const &material_bundle,
LPE_state_machine &lpe_state_machine)
{
Window_context window_context;
memset(&window_context, 0, sizeof(Window_context));
mi::examples::mdl::GL_display *gl_display = nullptr;
GLFWwindow *window = nullptr;
int width = -1;
int height = -1;
if (options.opengl) {
// Init OpenGL window
std::string version_string;
window = init_opengl(version_string, int(options.res_x), int(options.res_y));
glfwSetWindowUserPointer(window, &window_context);
glfwSetKeyCallback(window, handle_key);
glfwSetScrollCallback(window, handle_scroll);
glfwSetCursorPosCallback(window, handle_mouse_pos);
glfwSetMouseButtonCallback(window, handle_mouse_button);
glfwSetCharCallback(window, ImGui_ImplGlfw_CharCallback);
IMGUI_CHECKVERSION();
ImGui::CreateContext();
ImGui_ImplGlfw_InitForOpenGL(window, false);
ImGui_ImplOpenGL3_Init(version_string.c_str());
ImGui::GetIO().IniFilename = nullptr; // disable creating imgui.ini
ImGui::StyleColorsDark();
ImGui::GetStyle().Alpha = 0.7f;
ImGui::GetStyle().ScaleAllSizes(options.gui_scale);
gl_display = new mi::examples::mdl::GL_display(int(options.res_x), int(options.res_y));
}
// Initialize CUDA
CUcontext cuda_context = init_cuda(options.cuda_device, options.opengl);
CUdeviceptr accum_buffer = 0;
CUdeviceptr aux_albedo_buffer = 0; // buffer for auxiliary output
CUdeviceptr aux_normal_buffer = 0; //
if (!options.opengl) {
width = options.res_x;
height = options.res_y;
check_cuda_success(cuMemAlloc(&accum_buffer, width * height * sizeof(float3)));
check_cuda_success(cuMemAlloc(&aux_albedo_buffer, width * height * sizeof(float3)));
check_cuda_success(cuMemAlloc(&aux_normal_buffer, width * height * sizeof(float3)));
}
// Setup initial CUDA kernel parameters
Kernel_params kernel_params;
memset(&kernel_params, 0, sizeof(Kernel_params));
kernel_params.cam_focal = 1.0f / tanf(options.fov / 2 * float(2 * M_PI / 360));
kernel_params.light_pos = options.light_pos;
kernel_params.light_intensity = fmaxf(
options.light_intensity.x, fmaxf(options.light_intensity.y, options.light_intensity.z));
kernel_params.light_color = kernel_params.light_intensity > 0.0f
? options.light_intensity / kernel_params.light_intensity
: make_float3(1.0f, 0.9f, 0.5f);
kernel_params.env_intensity = 1.0f;
kernel_params.iteration_start = 0;
kernel_params.iteration_num = options.samples_per_iteration;
kernel_params.mdl_test_type = options.mdl_test_type;
kernel_params.max_path_length = options.max_path_length;
kernel_params.exposure_scale = powf(2.0f, options.exposure);
kernel_params.disable_aa = options.no_aa;
kernel_params.use_derivatives = options.enable_derivatives;
kernel_params.enable_auxiliary_output = options.enable_auxiliary_output;
kernel_params.display_buffer_index = 0;
kernel_params.lpe_ouput_expression = 0;
kernel_params.lpe_state_table = nullptr;
kernel_params.lpe_final_mask = nullptr;
kernel_params.current_material = 0;
kernel_params.geometry = material_bundle[kernel_params.current_material].contains_hair_bsdf ?
GT_HAIR : GT_SPHERE;
// Setup camera
float base_dist = length(options.cam_pos);
double theta, phi;
{
const float3 inv_dir = normalize(options.cam_pos);
phi = atan2(inv_dir.x, inv_dir.z);
theta = acos(inv_dir.y);
}
update_camera(kernel_params, phi, theta, base_dist, window_context.zoom);
// Build the full CUDA kernel with all the generated code
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > target_codes;
target_codes.push_back(target_code);
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_df_cuda_derivatives.ptx" : "example_df_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(mi::examples::io::get_executable_folder() + "/" + ptx_name).c_str(),
"render_scene_kernel",
&cuda_function);
// copy materials of the scene to the device
CUdeviceptr material_buffer = 0;
check_cuda_success(cuMemAlloc(&material_buffer,
material_bundle.size() * sizeof(Df_cuda_material)));
check_cuda_success(cuMemcpyHtoD(material_buffer, material_bundle.data(),
material_bundle.size() * sizeof(Df_cuda_material)));
kernel_params.material_buffer = reinterpret_cast<Df_cuda_material*>(material_buffer);
// Setup environment map and acceleration
CUdeviceptr env_accel;
cudaArray_t env_tex_data;
create_environment(
&kernel_params.env_tex, &env_tex_data, &env_accel, &kernel_params.env_size, transaction,
image_api, options.hdrfile.c_str());
kernel_params.env_accel = reinterpret_cast<Env_accel *>(env_accel);
kernel_params.env_rotation = options.hdr_rot / 180.0f * float(M_PI);
// Setup GPU runtime of the LPE state machine
upload_lpe_state_machine(kernel_params, lpe_state_machine);
// Setup file name for nogl mode
std::string next_filename_base;
std::string filename_base, filename_ext;
size_t dot_pos = options.outputfile.rfind('.');
if (dot_pos == std::string::npos) {
filename_base = options.outputfile;
} else {
filename_base = options.outputfile.substr(0, dot_pos);
filename_ext = options.outputfile.substr(dot_pos);
}
if (options.material_names.size() > 1)
next_filename_base = filename_base + "-0";
else
next_filename_base = filename_base;
// Scope for material context resources
{
// Prepare the needed data of all target codes for the GPU
Material_gpu_context material_gpu_context(options.enable_derivatives);
if (!material_gpu_context.prepare_target_code_data(
transaction.get(), image_api.get(), target_code.get(), arg_block_indices))
terminate();
kernel_params.tc_data = reinterpret_cast<Target_code_data *>(
material_gpu_context.get_device_target_code_data_list());
kernel_params.arg_block_list = reinterpret_cast<char const **>(
material_gpu_context.get_device_target_argument_block_list());
String_constant_table constant_table(target_code);
Resource_table texture_table(target_code, transaction, Resource_table::RESOURCE_TEXTURE);
Resource_table lp_table(target_code, transaction, Resource_table::RESOURCE_LIGHT_PROFILE);
Resource_table bm_table(
target_code, transaction, Resource_table::RESOURCE_BSDF_MEASUREMENT);
// Collect information about the arguments of the compiled materials
std::vector<Material_info> mat_infos;
for (size_t i = 0, num_mats = compiled_materials.size(); i < num_mats; ++i) {
// Get the compiled material and the parameter annotations
mi::neuraylib::ICompiled_material const *cur_mat = compiled_materials[i].get();
mi::neuraylib::IMaterial_definition const *cur_def = material_defs[i].get();
// Get the target argument block and its layout
size_t arg_block_index = material_gpu_context.get_bsdf_argument_block_index(i);
material_gpu_context.get_argument_block_layout(arg_block_index));
material_gpu_context.get_argument_block(arg_block_index));
char *arg_block_data = arg_block != nullptr ? arg_block->get_data() : nullptr;
Material_info mat_info(cur_def->get_mdl_name());
for (mi::Size j = 0, num_params = cur_mat->get_parameter_count(); j < num_params; ++j) {
const char *name = cur_mat->get_parameter_name(j);
if (name == nullptr) continue;
// Determine the type of the argument
mi::neuraylib::IValue::Kind kind = arg->get_kind();
Param_info::Param_kind param_kind = Param_info::PK_UNKNOWN;
Param_info::Param_kind param_array_elem_kind = Param_info::PK_UNKNOWN;
mi::Size param_array_size = 0;
mi::Size param_array_pitch = 0;
const Enum_type_info *enum_type = nullptr;
switch (kind) {
param_kind = Param_info::PK_FLOAT;
break;
param_kind = Param_info::PK_COLOR;
break;
param_kind = Param_info::PK_BOOL;
break;
param_kind = Param_info::PK_INT;
break;
{
arg.get_interface<mi::neuraylib::IValue_vector const>());
val->get_type());
val_type->get_element_type());
if (elem_type->get_kind() == mi::neuraylib::IType::TK_FLOAT) {
switch (val_type->get_size()) {
case 2: param_kind = Param_info::PK_FLOAT2; break;
case 3: param_kind = Param_info::PK_FLOAT3; break;
default: assert(false || "Vector Size invalid or unhandled.");
}
}
}
break;
{
arg.get_interface<mi::neuraylib::IValue_array const>());
val->get_type());
val_type->get_element_type());
// we currently only support arrays of some values
switch (elem_type->get_kind()) {
param_array_elem_kind = Param_info::PK_FLOAT;
break;
param_array_elem_kind = Param_info::PK_COLOR;
break;
param_array_elem_kind = Param_info::PK_BOOL;
break;
param_array_elem_kind = Param_info::PK_INT;
break;
{
elem_type.get_interface<
val_type->get_element_type());
if (velem_type->get_kind() == mi::neuraylib::IType::TK_FLOAT) {
switch (val_type->get_size()) {
case 2:
param_array_elem_kind = Param_info::PK_FLOAT2;
break;
case 3:
param_array_elem_kind = Param_info::PK_FLOAT3;
break;
default:
assert(false || "Vector Size invalid or unhandled.");
}
}
}
break;
default:
assert(false || "Array element type invalid or unhandled.");
}
if (param_array_elem_kind != Param_info::PK_UNKNOWN) {
param_kind = Param_info::PK_ARRAY;
param_array_size = val_type->get_size();
// determine pitch of array if there are at least two elements
if (param_array_size > 1) {
layout->get_nested_state(j));
layout->get_nested_state(1, array_state));
mi::Size param_size;
mi::Size start_offset = layout->get_layout(
kind, param_size, array_state);
mi::Size next_offset = layout->get_layout(
kind, param_size, next_elem_state);
param_array_pitch = next_offset - start_offset;
}
}
}
break;
{
arg.get_interface<mi::neuraylib::IValue_enum const>());
val->get_type());
// prepare info for this enum type if not seen so far
const Enum_type_info *info = mat_info.get_enum_type(val_type->get_symbol());
if (info == nullptr) {
std::shared_ptr<Enum_type_info> p(new Enum_type_info());
for (mi::Size i = 0, n = val_type->get_size(); i < n; ++i) {
p->add(val_type->get_value_name(i), val_type->get_value_code(i));
}
mat_info.add_enum_type(val_type->get_symbol(), p);
info = p.get();
}
enum_type = info;
param_kind = Param_info::PK_ENUM;
}
break;
param_kind = Param_info::PK_STRING;
break;
param_kind = Param_info::PK_TEXTURE;
break;
param_kind = Param_info::PK_LIGHT_PROFILE;
break;
param_kind = Param_info::PK_BSDF_MEASUREMENT;
break;
default:
// Unsupported? -> skip
continue;
}
// Get the offset of the argument within the target argument block
mi::neuraylib::Target_value_layout_state state(layout->get_nested_state(j));
mi::Size param_size;
mi::Size offset = layout->get_layout(kind2, param_size, state);
check_success(kind == kind2);
Param_info param_info(
j,
name,
name,
/*group_name=*/ nullptr,
param_kind,
param_array_elem_kind,
param_array_size,
param_array_pitch,
arg_block_data + offset,
enum_type);
// Check for annotation info
anno_list->get_annotation_block(name));
if (anno_block) {
mi::neuraylib::Annotation_wrapper annos(anno_block.get());
mi::Size anno_index =
annos.get_annotation_index("::anno::soft_range(float,float)");
if (anno_index == mi::Size(-1)) {
anno_index = annos.get_annotation_index("::anno::hard_range(float,float)");
}
if (anno_index != mi::Size(-1)) {
annos.get_annotation_param_value(anno_index, 0, param_info.range_min());
annos.get_annotation_param_value(anno_index, 1, param_info.range_max());
}
anno_index = annos.get_annotation_index("::anno::display_name(string)");
if (anno_index != mi::Size(-1)) {
annos.get_annotation_param_value(anno_index, 0, param_info.display_name());
}
anno_index = annos.get_annotation_index("::anno::in_group(string)");
if (anno_index != mi::Size(-1)) {
annos.get_annotation_param_value(anno_index, 0, param_info.group_name());
}
}
mat_info.add_sorted_by_group(param_info);
}
mat_infos.push_back(mat_info);
}
std::chrono::duration<double> state_update_time( 0.0 );
std::chrono::duration<double> render_time( 0.0 );
std::chrono::duration<double> display_time( 0.0 );
char stats_text[128];
int last_update_frames = -1;
auto last_update_time = std::chrono::steady_clock::now();
const std::chrono::duration<double> update_min_interval( 0.5 );
// Main render loop
while (true)
{
std::chrono::time_point<std::chrono::steady_clock> t0;
if (!options.opengl)
{
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
kernel_params.albedo_buffer = reinterpret_cast<float3 *>(aux_albedo_buffer);
kernel_params.normal_buffer = reinterpret_cast<float3 *>(aux_normal_buffer);
// Check if desired number of samples is reached
if (kernel_params.iteration_start >= options.iterations) {
std::cout << "rendering done" << std::endl;
save_result(
accum_buffer, width, height,
next_filename_base + filename_ext,
image_api, mdl_impexp_api);
save_result(
aux_albedo_buffer, width, height,
next_filename_base + "_albedo" + filename_ext,
image_api, mdl_impexp_api);
save_result(
aux_normal_buffer, width, height,
next_filename_base + "_normal" + filename_ext,
image_api, mdl_impexp_api);
std::cout << std::endl;
// All materials have been rendered? -> done
if (kernel_params.current_material + 1 >= material_bundle.size())
break;
if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
kernel_params.geometry = GT_SPHERE;
else
kernel_params.geometry = GT_HAIR;
// Start new image with next material
kernel_params.iteration_start = 0;
++kernel_params.current_material;
next_filename_base =
filename_base + "-" + to_string(kernel_params.current_material);
}
std::cout
<< "rendering iterations " << kernel_params.iteration_start << " to "
<< kernel_params.iteration_start + kernel_params.iteration_num << std::endl;
}
else
{
t0 = std::chrono::steady_clock::now();
// Check for termination
if (glfwWindowShouldClose(window))
break;
// Poll for events and process them
glfwPollEvents();
// Check if buffers need to be resized
int nwidth, nheight;
glfwGetFramebufferSize(window, &nwidth, &nheight);
if (nwidth != width || nheight != height)
{
width = nwidth;
height = nheight;
gl_display->resize(width, height);
resize_buffers(
&accum_buffer, width, height);
kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
resize_buffers(&aux_albedo_buffer, width, height);
kernel_params.albedo_buffer = reinterpret_cast<float3 *>(aux_albedo_buffer);
resize_buffers(&aux_normal_buffer, width, height);
kernel_params.normal_buffer = reinterpret_cast<float3 *>(aux_normal_buffer);
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.iteration_start = 0;
}
// Don't render anything, if minimized
if (width == 0 || height == 0) {
// Wait until something happens
glfwWaitEvents();
continue;
}
ImGui_ImplOpenGL3_NewFrame();
ImGui_ImplGlfw_NewFrame();
ImGui::NewFrame();
// Create material parameter editor window
ImGui::SetNextWindowPos(ImVec2(10, 100), ImGuiCond_FirstUseEver);
ImGui::SetNextWindowSize(
ImVec2(360 * options.gui_scale, 600 * options.gui_scale),
ImGuiCond_FirstUseEver);
ImGui::Begin("Settings");
ImGui::SetWindowFontScale(options.gui_scale);
ImGui::PushItemWidth(-200 * options.gui_scale);
if (options.use_class_compilation)
ImGui::Text("CTRL + Click to manually enter numbers");
else
ImGui::Text("Parameter editing requires class compilation.");
if (kernel_params.enable_auxiliary_output)
{
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Display options");
ImGui::Separator();
std::string current_lpe_name = lpe_state_machine.get_expression_name(
kernel_params.lpe_ouput_expression);
if (ImGui::BeginCombo("LPE", current_lpe_name.c_str()))
{
for (uint32_t i = 0; i < lpe_state_machine.get_expression_count(); ++i)
{
const char* name = lpe_state_machine.get_expression_name(i);
bool is_selected = (i == kernel_params.lpe_ouput_expression);
if (ImGui::Selectable(name, is_selected))
{
kernel_params.lpe_ouput_expression = i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
std::string current_display_buffer =
to_string((Display_buffer_options) kernel_params.display_buffer_index);
if (ImGui::BeginCombo("buffer", current_display_buffer.c_str()))
{
for (unsigned i = 0; i < (unsigned) DISPLAY_BUFFER_COUNT; ++i)
{
const std::string &name = to_string((Display_buffer_options) i);
bool is_selected = (current_display_buffer == name);
if (ImGui::Selectable(name.c_str(), is_selected))
{
kernel_params.display_buffer_index = i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Light parameters");
ImGui::Separator();
if (ImGui::ColorEdit3("Point Light Color", &kernel_params.light_color.x))
kernel_params.iteration_start = 0;
if (ImGui::SliderFloat("Point Light Intensity",
&kernel_params.light_intensity, 0.0f, 50000.0f))
kernel_params.iteration_start = 0;
if (ImGui::SliderFloat("Environment Intensity Scale",
&kernel_params.env_intensity, 0.0f, 10.0f))
kernel_params.iteration_start = 0;
float env_rot_degree = kernel_params.env_rotation / float(M_PI) * 180.0f;
if (ImGui::SliderFloat("Environment Rotation",
&env_rot_degree, 0.0f, 360.0f))
{
// wrap in case of negative input
// we don't want fmodf behavior for negative values
env_rot_degree -= floorf(env_rot_degree / 360.0f) * 360.f;
kernel_params.env_rotation = fmodf(env_rot_degree, 360.0f) / 180.0f * float(M_PI);
kernel_params.iteration_start = 0;
}
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Material parameters");
ImGui::Separator();
Material_info &mat_info = mat_infos[
material_bundle[kernel_params.current_material].compiled_material_index];
// Print material name
ImGui::Text("%s", mat_info.name());
bool changed = false;
const char *group_name = nullptr;
int id = 0;
for (std::list<Param_info>::iterator it = mat_info.params().begin(),
end = mat_info.params().end(); it != end; ++it, ++id)
{
Param_info &param = *it;
// Ensure unique ID even for parameters with same display names
ImGui::PushID(id);
// Group name changed? -> Start new group with new header
if ((!param.group_name() != !group_name) ||
(param.group_name() &&
(!group_name || strcmp(group_name, param.group_name()) != 0)))
{
ImGui::Separator();
if (param.group_name() != nullptr)
ImGui::Text("%s", param.group_name());
group_name = param.group_name();
}
// Choose proper edit control depending on the parameter kind
switch (param.kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
param.display_name(),
&param.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
param.display_name(),
&param.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
param.display_name(),
&param.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
param.display_name(),
&param.data<float>());
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
&param.data<bool>());
break;
case Param_info::PK_INT:
changed |= ImGui::SliderInt(
param.display_name(),
&param.data<int>(),
int(param.range_min()),
int(param.range_max()));
break;
case Param_info::PK_ARRAY:
{
ImGui::Text("%s", param.display_name());
ImGui::Indent(16.0f * options.gui_scale);
char *ptr = &param.data<char>();
for (mi::Size i = 0, n = param.array_size(); i < n; ++i) {
std::string idx_str = to_string(i);
switch (param.array_elem_kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
idx_str.c_str(),
reinterpret_cast<float *>(ptr));
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
reinterpret_cast<bool *>(ptr));
break;
case Param_info::PK_INT:
changed |= ImGui::SliderInt(
param.display_name(),
reinterpret_cast<int *>(ptr),
int(param.range_min()),
int(param.range_max()));
break;
default:
assert(false || "Array element type invalid or unhandled.");
}
ptr += param.array_pitch();
}
ImGui::Unindent(16.0f * options.gui_scale);
}
break;
case Param_info::PK_ENUM:
{
int value = param.data<int>();
std::string curr_value;
const Enum_type_info *info = param.enum_info();
for (size_t i = 0, n = info->values.size(); i < n; ++i) {
if (info->values[i].value == value) {
curr_value = info->values[i].name;
break;
}
}
if (ImGui::BeginCombo(param.display_name(), curr_value.c_str())) {
for (size_t i = 0, n = info->values.size(); i < n; ++i) {
const std::string &name = info->values[i].name;
bool is_selected = (curr_value == name);
if (ImGui::Selectable(
info->values[i].name.c_str(), is_selected)) {
param.data<int>() = info->values[i].value;
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
break;
case Param_info::PK_STRING:
{
std::vector<char> buf;
size_t max_len = constant_table.get_max_length();
max_len = max_len > 63 ? max_len + 1 : 64;
buf.resize(max_len);
// fill the current value
unsigned curr_index = param.data<unsigned>();
const char *opt = constant_table.get_string(curr_index);
strcpy(buf.data(), opt != nullptr ? opt : "");
if (ImGui::InputText(
param.display_name(),
buf.data(), buf.size(),
ImGuiInputTextFlags_EnterReturnsTrue))
{
unsigned id = constant_table.get_id_for_string(buf.data());
param.data<unsigned>() = id;
changed = true;
}
}
break;
case Param_info::PK_TEXTURE:
changed |= handle_resource(param, texture_table);
break;
case Param_info::PK_LIGHT_PROFILE:
changed |= handle_resource(param, lp_table);
break;
case Param_info::PK_BSDF_MEASUREMENT:
changed |= handle_resource(param, bm_table);
break;
case Param_info::PK_UNKNOWN:
break;
}
ImGui::PopID();
}
if (options.enable_derivatives) {
ImGui::Separator();
bool b = kernel_params.use_derivatives != 0;
if (ImGui::Checkbox("Use derivatives", &b)) {
kernel_params.iteration_start = 0;
kernel_params.use_derivatives = b;
}
}
ImGui::PopItemWidth();
ImGui::End();
// If any material argument changed, update the target argument block on the device
if (changed) {
material_gpu_context.update_device_argument_block(
material_bundle[kernel_params.current_material].argument_block_index);
kernel_params.iteration_start = 0;
}
// Handle events
Window_context *ctx =
static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->save_result && !ImGui::GetIO().WantCaptureKeyboard) {
save_result(
accum_buffer,
width, height,
options.outputfile,
image_api, mdl_impexp_api);
save_result(
aux_albedo_buffer,
width, height,
filename_base + "_albedo" + filename_ext,
image_api, mdl_impexp_api);
save_result(
aux_normal_buffer,
width, height,
filename_base + "_normal" + filename_ext,
image_api, mdl_impexp_api);
}
if (ctx->exposure_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.exposure_scale = powf(2.0f, ctx->exposure);
}
if (ctx->key_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.iteration_start = 0;
// Update change material
const unsigned num_materials = unsigned(material_bundle.size());
kernel_params.current_material = (kernel_params.current_material +
ctx->material_index_delta + num_materials) % num_materials;
ctx->material_index_delta = 0;
if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
kernel_params.geometry = GT_SPHERE;
else
kernel_params.geometry = GT_HAIR;
}
if (ctx->mouse_button - 1 == GLFW_MOUSE_BUTTON_LEFT) {
// Only accept button press when not hovering GUI window
if (ctx->mouse_button_action == GLFW_PRESS &&
!ImGui::GetIO().WantCaptureMouse) {
ctx->moving = true;
glfwGetCursorPos(window, &ctx->move_start_x, &ctx->move_start_y);
}
else
ctx->moving = false;
}
if (ctx->mouse_wheel_delta && !ImGui::GetIO().WantCaptureMouse) {
ctx->zoom += ctx->mouse_wheel_delta;
}
if (ctx->mouse_event && !ImGui::GetIO().WantCaptureMouse) {
kernel_params.iteration_start = 0;
// Update camera
phi -= ctx->move_dx * 0.001 * M_PI;
theta -= ctx->move_dy * 0.001 * M_PI;
theta = std::max(theta, 0.00 * M_PI);
theta = std::min(theta, 1.00 * M_PI);
ctx->move_dx = ctx->move_dy = 0.0;
update_camera(kernel_params, phi, theta, base_dist, ctx->zoom);
}
// Clear all events
ctx->save_result = false;
ctx->key_event = false;
ctx->mouse_event = false;
ctx->exposure_event = false;
ctx->mouse_wheel_delta = 0;
ctx->mouse_button = 0;
auto t1 = std::chrono::steady_clock::now();
state_update_time += t1 - t0;
t0 = t1;
// Map GL buffer for access with CUDA
kernel_params.display_buffer =
reinterpret_cast<unsigned int *>(gl_display->map(0));
}
// Launch kernel
dim3 threads_per_block(16, 16);
dim3 num_blocks((width + 15) / 16, (height + 15) / 16);
void *params[] = { &kernel_params };
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, params, nullptr));
kernel_params.iteration_start += kernel_params.iteration_num;
// Make sure, any debug::print()s are written to the console
check_cuda_success(cuStreamSynchronize(0));
if (options.opengl)
{
// Unmap GL buffer
gl_display->unmap(0);
auto t1 = std::chrono::steady_clock::now();
render_time += t1 - t0;
t0 = t1;
// Render GL buffer
gl_display->update_display();
t1 = std::chrono::steady_clock::now();
display_time += t1 - t0;
// Render stats window
ImGui::SetNextWindowPos(ImVec2(10, 10));
ImGui::Begin("##notitle", nullptr,
ImGuiWindowFlags_NoDecoration |
ImGuiWindowFlags_AlwaysAutoResize |
ImGuiWindowFlags_NoSavedSettings |
ImGuiWindowFlags_NoFocusOnAppearing |
ImGuiWindowFlags_NoNav);
// Update stats only every 0.5s
++last_update_frames;
if (t1 - last_update_time > update_min_interval || last_update_frames == 0) {
typedef std::chrono::duration<double, std::milli> durationMs;
snprintf(stats_text, sizeof(stats_text),
"%5.1f fps\n\n"
"state update: %8.1f ms\n"
"render: %8.1f ms\n"
"display: %8.1f ms\n",
last_update_frames / std::chrono::duration<double>(
t1 - last_update_time).count(),
(durationMs(state_update_time) / last_update_frames).count(),
(durationMs(render_time) / last_update_frames).count(),
(durationMs(display_time) / last_update_frames).count());
last_update_time = t1;
last_update_frames = 0;
state_update_time = render_time = display_time =
std::chrono::duration<double>::zero();
}
ImGui::TextUnformatted(stats_text);
ImGui::End();
// Show the GUI
ImGui::Render();
ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
// Swap front and back buffers
glfwSwapBuffers(window);
}
}
}
// Cleanup CUDA
check_cuda_success(cudaDestroyTextureObject(kernel_params.env_tex));
check_cuda_success(cudaFreeArray(env_tex_data));
check_cuda_success(cuMemFree(env_accel));
check_cuda_success(cuMemFree(accum_buffer));
check_cuda_success(cuMemFree(aux_albedo_buffer));
check_cuda_success(cuMemFree(aux_normal_buffer));
check_cuda_success(cuMemFree(material_buffer));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
check_cuda_success(cuModuleUnload(cuda_module));
uninit_cuda(cuda_context);
// Cleanup OpenGL
if (options.opengl) {
delete gl_display;
gl_display = nullptr;
ImGui_ImplOpenGL3_Shutdown();
ImGui_ImplGlfw_Shutdown();
ImGui::DestroyContext();
glfwDestroyWindow(window);
glfwTerminate();
}
}
// Returns true, if the string str starts with the given prefix, false otherwise.
bool starts_with(std::string const &str, std::string const &prefix)
{
return str.size() >= prefix.size() && str.compare(0, prefix.size(), prefix) == 0;
}
// Create application material representation for use in our CUDA kernel
Df_cuda_material create_cuda_material(
size_t target_code_index,
size_t compiled_material_index,
std::vector<mi::neuraylib::Target_function_description> const& descs,
bool use_hair_bsdf)
{
Df_cuda_material mat;
// shared by all generated functions of the same material
// used here to alter the materials parameter set
mat.compiled_material_index = static_cast<unsigned int>(compiled_material_index);
// Note: the same argument_block_index is filled into all function descriptions of a
// material, if any function uses it
mat.argument_block_index = static_cast<unsigned int>(descs[0].argument_block_index);
if (!use_hair_bsdf)
{
// identify the BSDF function by target_code_index (i'th link unit)
// and the function_index inside this target_code.
// same for the EDF and the intensity expression.
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[0].function_index);
mat.edf.x = static_cast<unsigned int>(target_code_index);
mat.edf.y = static_cast<unsigned int>(descs[1].function_index);
mat.emission_intensity.x = static_cast<unsigned int>(target_code_index);
mat.emission_intensity.y = static_cast<unsigned int>(descs[2].function_index);
mat.volume_absorption.x = static_cast<unsigned int>(target_code_index);
mat.volume_absorption.y = static_cast<unsigned int>(descs[3].function_index);
mat.thin_walled.x = static_cast<unsigned int>(target_code_index);
mat.thin_walled.y = static_cast<unsigned int>(descs[4].function_index);
}
else
{
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[5].function_index);
mat.contains_hair_bsdf = 1;
}
// init tag maps with zeros (optional)
memset(mat.bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
return mat;
}
void create_cuda_material_handles(
Df_cuda_material& mat,
const mi::neuraylib::ITarget_code* target_code,
LPE_state_machine& lpe_state_machine)
{
// fill tag ID list.
// allows to map from local per material Tag IDs to global per scene Tag IDs
// Note, calling 'LPE_state_machine::handle_to_global_tag(...)' registers the string handles
// present in the MDL in our 'scene'
mat.bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
target_code->get_callable_function_df_handle_count(mat.bsdf.y));
for (mi::Size i = 0; i < mat.bsdf_mtag_to_gtag_map_size; ++i)
mat.bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
target_code->get_callable_function_df_handle(mat.bsdf.y, i));
// same for all other distribution functions
mat.edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
target_code->get_callable_function_df_handle_count(mat.edf.y));
for (mi::Size i = 0; i < mat.edf_mtag_to_gtag_map_size; ++i)
mat.edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
target_code->get_callable_function_df_handle(mat.edf.y, i));
}
// checks if a compiled material contains none-invalid hair BSDF
bool contains_hair_bsdf(const mi::neuraylib::ICompiled_material* compiled_material)
{
compiled_material->get_body());
for (mi::Size i = 0, n = body_args->get_size(); i < n; ++i)
{
const char* name = body_args->get_name(i);
if (strcmp(name, "hair") == 0)
{
body_args->get_expression(i));
if (hair_exp->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
return true;
hair_exp->get_interface<const mi::neuraylib::IExpression_constant>());
hair_exp_const->get_value());
return hair_exp_const_value->get_kind() != mi::neuraylib::IValue::VK_INVALID_DF;
}
}
return true;
}
static void usage(const char *name)
{
std::cout
<< "usage: " << name << " [options] [<material_name1|full_mdle_path1> ...]\n"
<< "-h|--help print this text and exit\n"
<< "-v|--version print the MDL SDK version string and exit\n"
<< "--device <id> run on CUDA device <id> (default: 0)\n"
<< "--nogl don't open interactive display\n"
<< "--nocc don't use class-compilation\n"
<< "--noaux don't generate code for albedo and normal buffers\n"
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
<< "--interpreter use the DF interpreter (if possible)\n"
#endif
<< "--gui_scale <factor> GUI scaling factor (default: 1.0)\n"
<< "--res <res_x> <res_y> resolution (default: 1024x1024)\n"
<< "--hdr <filename> HDR environment map "
"(default: nvidia/sdk_examples/resources/environment.hdr)\n"
<< "--hdr_rot <degrees> rotation of the environment in degrees (default: 0.0)\n"
<< "-o <outputfile> image file to write result to (default: output.exr).\n"
<< " With multiple materials \"-<material index>\" will be\n"
<< " added in front of the extension\n"
<< "--spp <num> samples per pixel, only active for --nogl (default: 4096)\n"
<< "--spi <num> samples per render call (default: 8)\n"
<< "-t <type> 0: eval, 1: sample, 2: mis, 3: mis + pdf, 4: no env\n"
<< " (default: 2)\n"
<< "-e <exposure> exposure for interactive display (default: 0.0)\n"
<< "-f <fov> the camera field of view in degree (default: 96.0)\n"
<< "-p <x> <y> <z> set the camera position (default 0 0 3).\n"
<< " The camera will always look towards (0, 0, 0).\n"
<< "-l <x> <y> <z> <r> <g> <b> add an isotropic point light with given coordinates and "
"intensity (flux)\n"
<< "--mdl_path <path> MDL search path, can occur multiple times.\n"
<< "--max_path_length <num> maximum path length, default 4 (up to one total internal\n"
<< " reflection), clamped to 2..100\n"
<< "--noaa disable pixel oversampling\n"
<< "-d enable use of derivatives\n"
<< "--fold_ternary_on_df fold all ternary operators on *df types (default: false)\n"
<< "\n"
<< "Note: material names can end with an '*' as a wildcard\n"
<< " and alternatively, full MDLE file paths can be passed as material name\n";
exit(EXIT_FAILURE);
}
int MAIN_UTF8(int argc, char* argv[])
{
// Parse commandline options
Options options;
mi::examples::mdl::Configure_options configure_options;
bool print_version_and_exit = false;
for (int i = 1; i < argc; ++i) {
const char *opt = argv[i];
if (opt[0] == '-') {
if (strcmp(opt, "--device") == 0 && i < argc - 1) {
options.cuda_device = atoi(argv[++i]);
} else if (strcmp(opt, "--nogl") == 0) {
options.opengl = false;
} else if (strcmp(opt, "--nocc") == 0) {
options.use_class_compilation = false;
} else if (strcmp(opt, "--noaux") == 0) {
options.enable_auxiliary_output = false;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
} else if (strcmp(opt, "--interpreter") == 0) {
options.use_df_interpreter = true;
#endif
} else if (strcmp(opt, "--gui_scale") == 0 && i < argc - 1) {
options.gui_scale = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "--res") == 0 && i < argc - 2) {
options.res_x = std::max(atoi(argv[++i]), 1);
options.res_y = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--hdr") == 0 && i < argc - 1) {
options.hdrfile = argv[++i];
} else if (strcmp(opt, "--hdr_rot") == 0 && i < argc - 1) {
options.hdr_rot = static_cast<float>(atof(argv[++i]));
// wrap in case of negative input
// we don't want fmodf behavior for negative values
options.hdr_rot -= floorf(options.hdr_rot / 360.0f) * 360.f;
options.hdr_rot = fmodf(options.hdr_rot, 360.0f);
} else if (strcmp(opt, "-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
} else if (strcmp(opt, "--spp") == 0 && i < argc - 1) {
options.iterations = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--spi") == 0 && i < argc - 1) {
options.samples_per_iteration = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "-t") == 0 && i < argc - 1) {
const int type = atoi(argv[++i]);
if (type < 0 || type >= MDL_TEST_COUNT) {
std::cout << "Invalid type for \"-t\" option!" << std::endl;
usage(argv[0]);
}
options.mdl_test_type = type;
} else if (strcmp(opt, "-e") == 0 && i < argc - 1) {
options.exposure = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-f") == 0 && i < argc - 1) {
options.fov = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-p") == 0 && i < argc - 3) {
options.cam_pos.x = static_cast<float>(atof(argv[++i]));
options.cam_pos.y = static_cast<float>(atof(argv[++i]));
options.cam_pos.z = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-l") == 0 && i < argc - 6) {
options.light_pos.x = static_cast<float>(atof(argv[++i]));
options.light_pos.y = static_cast<float>(atof(argv[++i]));
options.light_pos.z = static_cast<float>(atof(argv[++i]));
options.light_intensity.x = static_cast<float>(atof(argv[++i]));
options.light_intensity.y = static_cast<float>(atof(argv[++i]));
options.light_intensity.z = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "--mdl_path") == 0 && i < argc - 1) {
configure_options.additional_mdl_paths.push_back(argv[++i]);
} else if (strcmp(opt, "--max_path_length") == 0 && i < argc - 1) {
options.max_path_length = std::min(std::max(atoi(argv[++i]), 2), 100);
} else if (strcmp(opt, "--noaa") == 0) {
options.no_aa = true;
} else if (strcmp(opt, "-d") == 0) {
options.enable_derivatives = true;
} else if (strcmp(opt, "--fold_ternary_on_df") == 0) {
options.fold_ternary_on_df = true;
} else if (strcmp(opt, "-v") == 0 || strcmp(opt, "--version") == 0) {
print_version_and_exit = true;
} else {
if (strcmp(opt, "-h") != 0 && strcmp(opt, "--help") != 0)
std::cout << "Unknown option: \"" << opt << "\"" << std::endl;
usage(argv[0]);
}
}
else
options.material_names.push_back(std::string(opt));
}
// Use default material, if none was provided via command line
if (options.material_names.empty())
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_df");
// Access the MDL SDK
mi::base::Handle<mi::neuraylib::INeuray> neuray(mi::examples::mdl::load_and_get_ineuray());
if (!neuray.is_valid_interface())
exit_failure("Failed to load the SDK.");
// Handle the --version flag
if (print_version_and_exit) {
// print library version information.
fprintf(stdout, "%s\n", version->get_string());
// free the handles and unload the MDL SDK
version = nullptr;
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
// Configure the MDL SDK
if (!mi::examples::mdl::configure(neuray.get(), configure_options))
exit_failure("Failed to initialize the SDK.");
// Start the MDL SDK
mi::Sint32 ret = neuray->start();
if (ret != 0)
exit_failure("Failed to initialize the SDK. Result code: %d", ret);
// LPE state machine for rendering into multiple buffers
LPE_state_machine lpe_state_machine;
lpe_state_machine.handle_to_global_tag("point_light"); // register handles before building
lpe_state_machine.handle_to_global_tag("env"); // the state machine
// register other handles in the scene, e.g.: for object instances
lpe_state_machine.handle_to_global_tag("sphere"); // for illustration, not used currently
// Add some common and custom LPEs
lpe_state_machine.add_expression("Beauty", LPE::create_common(LPE::Common::Beauty));
lpe_state_machine.add_expression("Diffuse", LPE::create_common(LPE::Common::Diffuse));
lpe_state_machine.add_expression("Glossy", LPE::create_common(LPE::Common::Glossy));
lpe_state_machine.add_expression("Specular", LPE::create_common(LPE::Common::Specular));
lpe_state_machine.add_expression("SSS", LPE::create_common(LPE::Common::SSS));
lpe_state_machine.add_expression("Transmission", LPE::create_common(LPE::Common::Transmission));
lpe_state_machine.add_expression("Beauty-Env", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::light("env") })); // only light with the name 'env'
lpe_state_machine.add_expression("Beauty-PointLight", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::light("point_light") })); // only light with the name 'point_light'
lpe_state_machine.add_expression("Beauty-Emission", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::emission() })); // only emission
lpe_state_machine.add_expression("Beauty-Base", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("base")),
LPE::light()})); // no emission
lpe_state_machine.add_expression("Beauty-Coat", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("coat")),
LPE::light()})); // no emission
lpe_state_machine.add_expression("Beauty-^Coat", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("coat", false)),
LPE::any_light()})); // emission or light source
{
// Create a transaction
mi::base::Handle<mi::neuraylib::IScope> scope(database->get_global_scope());
mi::base::Handle<mi::neuraylib::ITransaction> transaction(scope->create_transaction());
// Access needed API components
{
// Initialize the material compiler with 16 result buffer slots ("texture results")
Material_compiler mc(
mdl_impexp_api.get(),
mdl_backend_api.get(),
mdl_factory.get(),
transaction.get(),
16,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
options.use_df_interpreter,
#endif
options.enable_derivatives,
options.fold_ternary_on_df,
options.enable_auxiliary_output,
/*df_handle_mode=*/ "pointer");
// List of materials in the scene
std::vector<Df_cuda_material> material_bundle;
// Select the functions to translate
std::vector<mi::neuraylib::Target_function_description> descs;
descs.push_back(
mi::neuraylib::Target_function_description("surface.scattering"));
descs.push_back(
mi::neuraylib::Target_function_description("surface.emission.emission"));
descs.push_back(
mi::neuraylib::Target_function_description("surface.emission.intensity"));
descs.push_back(
mi::neuraylib::Target_function_description("volume.absorption_coefficient"));
descs.push_back(
mi::neuraylib::Target_function_description("thin_walled"));
descs.push_back(
mi::neuraylib::Target_function_description("hair"));
// Generate code for all materials
std::vector<std::string> used_material_names;
for (size_t i = 0; i < options.material_names.size(); ++i) {
std::string& opt_material_name = options.material_names[i];
// split module and material name
std::string module_qualified_name, material_simple_name;
if (!mi::examples::mdl::parse_cmd_argument_material_name(
opt_material_name, module_qualified_name, material_simple_name, true))
exit_failure("Provided material name '%s' is invalid.",
opt_material_name.c_str());
// Is this a material name pattern? (not applicable to mdle)
if (!mi::examples::strings::ends_with(module_qualified_name, ".mdle") &&
opt_material_name.size() > 1 &&
opt_material_name.back() == '*') {
// prepare the pattern for matching
std::string pattern = opt_material_name.substr(0, opt_material_name.size() - 1);
if (!starts_with(pattern, "::"))
pattern = "::" + pattern;
// load the module
std::string module_db_name = mc.load_module(module_qualified_name);
// iterate over all materials in that module
transaction->access<const mi::neuraylib::IModule>(module_db_name.c_str()));
for (mi::Size j = 0, n = loaded_module->get_material_count(); j < n; ++j) {
// get the j`th material
const char* material_db_name = loaded_module->get_material(j);
material_db_name));
// make sure the material name starts with the pattern
std::string material_qualified_name = mat_def->get_mdl_name();
if (!mi::examples::strings::starts_with(material_qualified_name, pattern))
continue;
std::cout << "Adding material \"" << material_qualified_name << std::endl;
// Add functions of the material to the link unit
check_success(mc.add_material(
module_qualified_name, mat_def->get_mdl_simple_name(),
descs.data(), descs.size(),
options.use_class_compilation));
mc.get_compiled_materials().back());
// Create application material representation
material_bundle.push_back(create_cuda_material(
0, material_bundle.size(), descs,
contains_hair_bsdf(compiled_material.get())));
used_material_names.push_back(material_qualified_name);
}
} else {
std::string material_qualified_name = module_qualified_name + "::" + material_simple_name;
std::cout << "Adding material \"" << material_qualified_name << std::endl;
// Add functions of the material to the link unit
check_success(mc.add_material(
module_qualified_name, material_simple_name,
descs.data(), descs.size(),
options.use_class_compilation));
mc.get_compiled_materials().back());
// Create application material representation
material_bundle.push_back(create_cuda_material(
0, material_bundle.size(), descs,
contains_hair_bsdf(compiled_material.get())));
used_material_names.push_back(material_qualified_name);
}
}
// Update the material names with the actually used names
options.material_names = used_material_names;
// Generate the CUDA PTX code for the link unit
mc.generate_cuda_ptx());
// convert handles to tag IDs
for (auto& mat : material_bundle)
create_cuda_material_handles(mat, target_code.get(), lpe_state_machine);
// Acquire image API needed to prepare the textures
// when all scene elements that have handles are loaded and all handles as well as
// light path expressions are registered, the state machine can be constructed.
lpe_state_machine.build();
// Render
render_scene(
options,
transaction,
image_api,
mdl_impexp_api,
target_code,
mc.get_material_defs(),
mc.get_compiled_materials(),
mc.get_argument_block_indices(),
material_bundle,
lpe_state_machine);
}
transaction->commit();
}
// Shut down the MDL SDK
if (neuray->shutdown() != 0)
exit_failure("Failed to shutdown the SDK.");
// Unload the MDL SDK
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
// Convert command line arguments to UTF8 on Windows
COMMANDLINE_TO_UTF8

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.h

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
#ifndef EXAMPLE_DF_CUDA_H
#define EXAMPLE_DF_CUDA_H
#include <cstdint>
#include <vector_types.h>
#include <texture_types.h>
struct Target_code_data;
enum Mdl_test_type {
MDL_TEST_EVAL = 0, // only use BSDF evaluation
MDL_TEST_SAMPLE = 1, // only use BSDF sampling
MDL_TEST_MIS = 2, // multiple importance sampling
MDL_TEST_MIS_PDF = 3, // multiple importance sampling, but use BSDF explicit pdf computation
MDL_TEST_NO_ENV = 4, // no environment sampling
MDL_TEST_COUNT
};
const unsigned MAX_DF_HANDLES = 8;
struct Env_accel {
unsigned int alias;
float q;
float pdf;
};
namespace
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
inline uint2 make_invalid()
{
uint2 index_pair;
index_pair.x = ~0;
index_pair.y = ~0;
return index_pair;
}
}
struct Df_cuda_material
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
Df_cuda_material()
: compiled_material_index(0)
, argument_block_index(~0)
, bsdf(make_invalid())
, edf(make_invalid())
, emission_intensity(make_invalid())
, volume_absorption(make_invalid())
, thin_walled(make_invalid())
, contains_hair_bsdf(0)
{
}
// used on host side only
unsigned int compiled_material_index;
// the argument block index of this material (~0 if not used)
unsigned int argument_block_index;
// pair of target_code_index and function_index to identify the bsdf
uint2 bsdf;
// pair of target_code_index and function_index to identify the edf
uint2 edf;
// pair of target_code_index and function_index for intensity
uint2 emission_intensity;
// pair of target_code_index and function_index for volume absorption
uint2 volume_absorption;
// pair of target_code_index and function_index for thin_walled
uint2 thin_walled;
// maps 'material tags' to 'global tags' for the surface scattering distribution function
unsigned int bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int bsdf_mtag_to_gtag_map_size;
// maps 'material tags' to 'global tags' for the emission distribution function
unsigned int edf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int edf_mtag_to_gtag_map_size;
unsigned int contains_hair_bsdf;
};
enum Geometry_type
{
GT_SPHERE = 0, // Intersect a sphere with unit radius located at the (0,0,0)
GT_HAIR = 1, // Intersect an infinite cylinder at (0,0,0) oriented in y-direction
};
struct Kernel_params {
// display
uint2 resolution;
float exposure_scale;
unsigned int *display_buffer;
float3 *accum_buffer;
float3 *albedo_buffer;
float3 *normal_buffer;
bool enable_auxiliary_output;
unsigned display_buffer_index;
// parameters
unsigned int iteration_start;
unsigned int iteration_num;
unsigned int mdl_test_type;
unsigned int max_path_length;
unsigned int use_derivatives;
unsigned int disable_aa;
// camera
float3 cam_pos;
float3 cam_dir;
float3 cam_right;
float3 cam_up;
float cam_focal;
// geometry
unsigned int geometry;
// environment
uint2 env_size;
cudaTextureObject_t env_tex;
Env_accel *env_accel;
float env_intensity; // scaling factor
uint32_t env_gtag; // global light group tag for handle 'env'
float env_rotation; // rotation of the environment
// point light
float3 light_pos;
float3 light_color;
float light_intensity;
uint32_t point_light_gtag; // global light group tag for handle 'point_light'
// material data
Target_code_data *tc_data;
char const **arg_block_list;
unsigned int current_material;
Df_cuda_material *material_buffer;
// LPE state machine
uint32_t lpe_num_states; // number of states in the state machine
uint32_t lpe_num_transitions; // number of possible transitions between 2 states
uint32_t *lpe_state_table; // actual machine; size: #states x #transitions
uint32_t *lpe_final_mask; // encodes final states; size: #states
uint32_t default_gtag; // tag ID for the empty string
uint32_t lpe_ouput_expression; // the LPE evaluated for output
// only one here, but additional one analogously
};
enum Display_buffer_options
{
DISPLAY_BUFFER_LPE = 0,
DISPLAY_BUFFER_ALBEDO,
DISPLAY_BUFFER_NORMAL,
DISPLAY_BUFFER_COUNT
};
#endif // EXAMPLE_DF_CUDA_H

Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// examples/mdl_sdk/shared/texture_support_cuda.h
//
// This file contains the implementations and the vtables of the texture access functions.
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#include <math.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
typedef mi::neuraylib::Texture_handler_base Texture_handler_base;
// Custom structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
explicit Texture()
: filtered_object(0)
, unfiltered_object(0)
, size(make_uint3(0, 0, 0))
, inv_size(make_float3(0.0f, 0.0f, 0.0f))
{}
explicit Texture(
cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object; // uses filter mode cudaFilterModeLinear
cudaTextureObject_t unfiltered_object; // uses filter mode cudaFilterModePoint
uint3 size; // size of the texture, needed for texel access
float3 inv_size; // the inverse values of the size of the texture
};
// Custom structure representing an MDL BSDF measurement.
struct Mbsdf
{
unsigned has_data[2]; // true if there is a measurement for this part
cudaTextureObject_t eval_data[2]; // uses filter mode cudaFilterModeLinear
float max_albedo[2]; // max albedo used to limit the multiplier
float* sample_data[2]; // CDFs for sampling a BSDF measurement
float* albedo_data[2]; // max albedo for each theta (isotropic)
uint2 angular_resolution[2]; // size of the dataset, needed for texel access
float2 inv_angular_resolution[2]; // the inverse values of the size of the dataset
unsigned num_channels[2]; // number of color channels (1 or 3)
};
// Structure representing a Light Profile
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution; // angular resolution of the grid
float2 theta_phi_start; // start of the grid
float2 theta_phi_delta; // angular step size
float2 theta_phi_inv_delta; // inverse step size
float candela_multiplier; // factor to rescale the normalized data
float total_power;
cudaTextureObject_t eval_data; // normalized data sampled on grid
float* cdf_data; // CDFs for sampling a light profile
};
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler : Texture_handler_base {
// additional data for the texture access functions can be provided here
size_t num_textures; // the number of textures used by the material
// (without the invalid texture)
Texture const *textures; // the textures used by the material
// (without the invalid texture)
size_t num_mbsdfs; // the number of mbsdfs used by the material
// (without the invalid mbsdf)
Mbsdf const *mbsdfs; // the mbsdfs used by the material
// (without the invalid mbsdf)
size_t num_lightprofiles; // number of elements in the lightprofiles field
// (without the invalid light profile)
Lightprofile const *lightprofiles; // a device pointer to a list of mbsdfs objects, if used
// (without the invalid light profile)
};
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler_deriv : mi::neuraylib::Texture_handler_deriv_base {
// additional data for the texture access functions can be provided here
size_t num_textures; // the number of textures used by the material
// (without the invalid texture)
Texture const *textures; // the textures used by the material
// (without the invalid texture)
size_t num_mbsdfs; // the number of mbsdfs used by the material
// (without the invalid texture)
Mbsdf const *mbsdfs; // the mbsdfs used by the material
// (without the invalid texture)
size_t num_lightprofiles; // number of elements in the lightprofiles field
// (without the invalid light profile)
Lightprofile const *lightprofiles; // a device pointer to a list of mbsdfs objects, if used
// (without the invalid light profile)
};
#if defined(__CUDACC__)
// Stores a float4 in a float[4] array.
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
// Stores a float in all elements of a float[4] array.
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
// Stores the given float values in a float[4] array.
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
// Stores a float3 in a float[3] array.
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
// Stores a float4 in a float[3] array, ignoring v.w.
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
// Stores a float in all elements of a float[3] array.
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
// Stores the given float values in a float[3] array.
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
// Stores the luminance if a given float[3] in a float.
__device__ inline void store_result1(float* res, float3 const& v)
{
// store luminance
*res = 0.212671 * v.x + 0.715160 * v.y + 0.072169 * v.z;
}
// Stores the luminance if a given float[3] in a float.
__device__ inline void store_result1(float* res, float v0, float v1, float v2)
{
// store luminance
*res = 0.212671 * v0 + 0.715160 * v1 + 0.072169 * v2;
}
// Stores a given float in a float
__device__ inline void store_result1(float* res, float s)
{
*res = s;
}
// ------------------------------------------------------------------------------------------------
// Textures
// ------------------------------------------------------------------------------------------------
// Applies wrapping and cropping to the given coordinate.
// Note: This macro returns if wrap mode is clip and the coordinate is out of range.
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
/* Do nothing, use texture sampler default behavior */ \
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
// Modify texture coordinates to get better texture filtering,
// see http://www.iquilezles.org/www/articles/texture/texture.htm
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
// Implementation of tex::texel_float4() for a texture_2d texture.
// Note: uvtile textures are not supported
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const /*uv_tile*/[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
// Implementation of tex::lookup_float4() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
// Implementation of tex::lookup_float3() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
// Implementation of tex::texel_float4() for a texture_3d texture.
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
// Implementation of tex::lookup_float4() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
// Implementation of tex::lookup_float3() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
// Implementation of resolution_2d function needed by generated code.
// Note: uvtile textures are not supported
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const /*uv_tile*/[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
// invalid texture returns zero
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
// Implementation of resolution_3d function needed by generated code.
// Note: 3d textures are not supported
extern "C" __device__ void tex_resolution_3d(
int result[3],
Texture_handler_base const *self_base,
unsigned texture_idx)
{
// invalid texture returns zero
result[0] = 0;
result[1] = 0;
result[2] = 0;
}
// Implementation of texture_isvalid().
extern "C" __device__ bool tex_texture_isvalid(
Texture_handler_base const *self_base,
unsigned texture_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return texture_idx != 0 && texture_idx - 1 < self->num_textures;
}
// ------------------------------------------------------------------------------------------------
// Light Profiles
// ------------------------------------------------------------------------------------------------
// Implementation of light_profile_power() for a light profile.
extern "C" __device__ float df_light_profile_power(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.total_power;
}
// Implementation of light_profile_maximum() for a light profile.
extern "C" __device__ float df_light_profile_maximum(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.candela_multiplier;
}
// Implementation of light_profile_isvalid() for a light profile.
extern "C" __device__ bool df_light_profile_isvalid(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return light_profile_idx != 0 && light_profile_idx - 1 < self->num_lightprofiles;
}
// binary search through CDF
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
// Implementation of df::light_profile_evaluate() for a light profile.
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
// map theta to 0..1 range
const float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x / float(lp.angular_resolution.x - 1);
// converting input phi from -pi..pi to 0..2pi
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
// floorf wraps phi range into 0..2pi
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
// (phi < 0.0f) is no problem, this is handle by the (black) border
// since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
const float v = phi * lp.theta_phi_inv_delta.y / float(lp.angular_resolution.y - 1);
// wrap_mode: border black would be an alternative (but it produces artifacts at low res)
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
// Implementation of df::light_profile_sample() for a light profile.
extern "C" __device__ void df_light_profile_sample(
float result[3], // output: theta, phi, pdf
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const xi[3]) // uniform random values
{
result[0] = -1.0f; // negative theta means no emission
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
uint2 res = lp.angular_resolution;
// sample theta_out
//-------------------------------------------
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data; // CDF theta
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0); // binary search
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta; // rescale for re-usage
// sample phi_out
//-------------------------------------------
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1) // CDF theta block
+ (idx_theta * (res.y - 1)); // selected CDF for phi
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1); // binary search
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi; // rescale for re-usage
// compute theta and phi
//-------------------------------------------
// sample uniformly within the patch (grid cell)
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
// n = \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
// = 1 / (\cos{\theta_0} - \cos{\theta_1})
//
// \xi = n * \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
// => \cos{\theta} = (1 - \xi) \cos{\theta_0} + \xi \cos{\theta_1}
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
// align phi
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI); // wrap
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1]; // to [-pi, pi]
// compute pdf
//-------------------------------------------
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
// Implementation of df::light_profile_pdf() for a light profile.
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f; // invalid light profile returns zero
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
// CDF data
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
// map theta to 0..1 range
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
// converting input phi from -pi..pi to 0..2pi
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
// floorf wraps phi range into 0..2pi
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
// (phi < 0.0f) is no problem, this is handle by the (black) border
// since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
// wrap_mode: border black would be an alternative (but it produces artifacts at low res)
if (idx_theta < 0 || idx_theta > (res.x - 2) || idx_phi < 0 || idx_phi >(res.x - 2))
return 0.0f;
// get probability for theta
//-------------------------------------------
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
// get probability for phi
//-------------------------------------------
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1) // CDF theta block
+ (idx_theta * (res.y - 1)); // selected CDF for phi
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
// compute probability to select a position in the sphere patch
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cos(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cos(start.x + float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
// ------------------------------------------------------------------------------------------------
// BSDF Measurements
// ------------------------------------------------------------------------------------------------
// Implementation of bsdf_measurement_isvalid() for an MBSDF.
extern "C" __device__ bool df_bsdf_measurement_isvalid(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return bsdf_measurement_index != 0 && bsdf_measurement_index - 1 < self->num_mbsdfs;
}
// Implementation of df::bsdf_measurement_resolution() function needed by generated code,
// which retrieves the angular and chromatic resolution of the given MBSDF.
// The returned triple consists of: number of equi-spaced steps of theta_i and theta_o,
// number of equi-spaced steps of phi, and number of color channels (1 or 3).
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
{
// invalid MBSDF returns zero
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[bsdf_measurement_index - 1];
const unsigned part_index = static_cast<unsigned>(part);
// check for the part
if (bm.has_data[part_index] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
// pass out the information
result[0] = bm.angular_resolution[part_index].x;
result[1] = bm.angular_resolution[part_index].y;
result[2] = bm.num_channels[part_index];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
// assuming each phi is between -pi and pi
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= M_ONE_OVER_PI;
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
// 3D volume on the GPU (phi_delta x theta_out x theta_in)
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
// Implementation of df::bsdf_measurement_evaluate() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
{
// invalid MBSDF returns zero
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
const unsigned part_index = static_cast<unsigned>(part);
// check for the parta
if (bm.has_data[part_index] == 0)
{
store_result3(result, 0.0f);
return;
}
// handle channels
if (bm.num_channels[part_index] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
// Implementation of df::bsdf_measurement_sample() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3], // output: theta, phi, pdf
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi_out[2],
float const xi[3], // uniform random values
Mbsdf_part part)
{
result[0] = -1.0f; // negative theta means absorption
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
return; // invalid MBSDFs returns zero
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return; // check for the part
// CDF data
uint2 res = bm.angular_resolution[part_index];
const float* sample_data = bm.sample_data[part_index];
// compute the theta_in index (flipping input and output, BSDFs are symmetric)
unsigned idx_theta_in = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
idx_theta_in = min(idx_theta_in, res.x - 1);
// sample theta_out
//-------------------------------------------
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_in * res.x;
unsigned idx_theta_out = sample_cdf(cdf_theta, res.x, xi0); // binary search
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta; // rescale for re-usage
// sample phi_out
//-------------------------------------------
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) + // CDF theta block
(idx_theta_in * res.x + idx_theta_out) * res.y; // selected CDF phi
// select which half-circle to choose with probability 0.5
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi_out = sample_cdf(cdf_phi, res.y, xi1); // binary search
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi; // rescale for re-usage
// compute theta and phi out
//-------------------------------------------
const float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi_out) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1]; // phi \in [0, 2pi]
// align phi
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1]; // to [-pi, pi]
// compute pdf
//-------------------------------------------
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
// Implementation of df::bsdf_measurement_pdf() for an MBSDF.
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
return 0.0f; // invalid MBSDF returns zero
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
unsigned part_index = static_cast<unsigned>(part);
// check for the part
if (bm.has_data[part_index] == 0)
return 0.0f;
// CDF data and resolution
const float* sample_data = bm.sample_data[part_index];
uint2 res = bm.angular_resolution[part_index];
// compute indices in the CDF data
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out); // phi_delta, theta_out, theta_in
unsigned idx_theta_in = unsigned(theta_phi_in[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_theta_out = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_phi_out = unsigned(uvw.x * float(res.y));
idx_theta_in = min(idx_theta_in, res.x - 1);
idx_theta_out = min(idx_theta_out, res.x - 1);
idx_phi_out = min(idx_phi_out, res.y - 1);
// get probability to select theta_out
const float* cdf_theta = sample_data + idx_theta_in * res.x;
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
}
// get probability to select phi_out
const float* cdf_phi = sample_data +
(res.x * res.x) + // CDF theta block
(idx_theta_in * res.x + idx_theta_out) * res.y; // selected CDF phi
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
}
// compute probability to select a position in the sphere patch
float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2], // output: max (in case of color) albedo
// for the selected direction ([0]) and
// global ([1])
Texture_handler const *self,
unsigned bsdf_measurement_index,
float const theta_phi[2],
Mbsdf_part part)
{
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_index - 1];
const unsigned part_index = static_cast<unsigned>(part);
// check for the part
if (bm.has_data[part_index] == 0)
return;
const uint2 res = bm.angular_resolution[part_index];
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta = min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_index][idx_theta];
result[1] = bm.max_albedo[part_index];
}
// Implementation of df::bsdf_measurement_albedos() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4], // output: [0] albedo refl. for theta_phi
// [1] max albedo refl. global
// [2] albedo trans. for theta_phi
// [3] max albedo trans. global
Texture_handler_base const *self_base,
unsigned bsdf_measurement_index,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_index == 0 || bsdf_measurement_index - 1 >= self->num_mbsdfs)
return; // invalid MBSDF returns zero
df_bsdf_measurement_albedo(
&result[0],
self,
bsdf_measurement_index,
theta_phi,
df_bsdf_measurement_albedo(
&result[2],
self,
bsdf_measurement_index,
theta_phi,
}
// ------------------------------------------------------------------------------------------------
// Scene data (dummy functions)
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_DUMMY_SCENEDATA
// Implementation of scene_data_isvalid().
extern "C" __device__ bool scene_data_isvalid(
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id)
{
return false;
}
// Implementation of scene_data_lookup_float4().
extern "C" __device__ void scene_data_lookup_float4(
float result[4],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[4],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
// Implementation of scene_data_lookup_float3().
extern "C" __device__ void scene_data_lookup_float3(
float result[3],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_color().
extern "C" __device__ void scene_data_lookup_color(
float result[3],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_float2().
extern "C" __device__ void scene_data_lookup_float2(
float result[2],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value[2],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
}
// Implementation of scene_data_lookup_float().
extern "C" __device__ float scene_data_lookup_float(
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
float const default_value,
bool uniform_lookup)
{
// just return default value
return default_value;
}
// Implementation of scene_data_lookup_int4().
extern "C" __device__ void scene_data_lookup_int4(
int result[4],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int const default_value[4],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
// Implementation of scene_data_lookup_int3().
extern "C" __device__ void scene_data_lookup_int3(
int result[3],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int const default_value[3],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
// Implementation of scene_data_lookup_int2().
extern "C" __device__ void scene_data_lookup_int2(
int result[2],
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int const default_value[2],
bool uniform_lookup)
{
// just return default value
result[0] = default_value[0];
result[1] = default_value[1];
}
// Implementation of scene_data_lookup_int().
extern "C" __device__ int scene_data_lookup_int(
Texture_handler_base const *self_base,
Shading_state_material *state,
unsigned scene_data_id,
int default_value,
bool uniform_lookup)
{
// just return default value
return default_value;
}
// Implementation of scene_data_lookup_float4() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float4(
tct_deriv_arr_float_4 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_4 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float3() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float3(
tct_deriv_arr_float_3 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_3 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_color() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_color(
tct_deriv_arr_float_3 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_3 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float2() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float2(
tct_deriv_arr_float_2 *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_arr_float_2 const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
// Implementation of scene_data_lookup_float() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float(
tct_deriv_float *result,
Texture_handler_base const *self_base,
Shading_state_material_with_derivs *state,
unsigned scene_data_id,
tct_deriv_float const *default_value,
bool uniform_lookup)
{
// just return default value
*result = *default_value;
}
#endif // TEX_SUPPORT_NO_DUMMY_SCENEDATA
// ------------------------------------------------------------------------------------------------
// Vtables
// ------------------------------------------------------------------------------------------------
#ifndef TEX_SUPPORT_NO_VTABLES
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode.
__device__ mi::neuraylib::Texture_handler_vtable tex_vtable = {
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
};
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode with derivatives.
__device__ mi::neuraylib::Texture_handler_deriv_vtable tex_deriv_vtable = {
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_deriv_float,
scene_data_lookup_deriv_float2,
scene_data_lookup_deriv_float3,
scene_data_lookup_deriv_float4,
scene_data_lookup_deriv_color,
};
#endif // TEX_SUPPORT_NO_VTABLES
#endif // __CUDACC__
#endif // TEXTURE_SUPPORT_CUDA_H

Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
// Code shared by CUDA MDL SDK examples
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include <math.h>
#include "example_shared.h"
#include "compiled_material_traverser_base.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
// Structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object; // uses filter mode cudaFilterModeLinear
cudaTextureObject_t unfiltered_object; // uses filter mode cudaFilterModePoint
uint3 size; // size of the texture, needed for texel access
float3 inv_size; // the inverse values of the size of the texture
};
// Structure representing an MDL bsdf measurement.
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2]; // true if there is a measurement for this part
cudaTextureObject_t eval_data[2]; // uses filter mode cudaFilterModeLinear
float max_albedo[2]; // max albedo used to limit the multiplier
float* sample_data[2]; // CDFs for sampling a BSDF measurement
float* albedo_data[2]; // max albedo for each theta (isotropic)
uint2 angular_resolution[2]; // size of the dataset, needed for texel access
float2 inv_angular_resolution[2]; // the inverse values of the size of the dataset
unsigned num_channels[2]; // number of color channels (1 or 3)
};
// Structure representing a Light Profile
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution; // angular resolution of the grid
float2 theta_phi_start; // start of the grid
float2 theta_phi_delta; // angular step size
float2 theta_phi_inv_delta; // inverse step size
float candela_multiplier; // factor to rescale the normalized data
float total_power;
cudaTextureObject_t eval_data; // normalized data sampled on grid
float* cdf_data; // CDFs for sampling a light profile
};
// Structure representing the resources used by the generated code of a target code.
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures; // number of elements in the textures field
CUdeviceptr textures; // a device pointer to a list of Texture objects, if used
size_t num_mbsdfs; // number of elements in the mbsdfs field
CUdeviceptr mbsdfs; // a device pointer to a list of mbsdfs objects, if used
size_t num_lightprofiles; // number of elements in the lightprofiles field
CUdeviceptr lightprofiles; // a device pointer to a list of mbsdfs objects, if used
CUdeviceptr ro_data_segment; // a device pointer to the read-only data segment, if used
};
//------------------------------------------------------------------------------
//
// Helper functions
//
//------------------------------------------------------------------------------
// Return a textual representation of the given value.
template <typename T>
std::string to_string(T val)
{
std::ostringstream stream;
stream << val;
return stream.str();
}
// Collects the handles in a compiled material
class Handle_collector : public Compiled_material_traverser_base
{
public:
// add all handle appearing in the provided material to the collectors handle list.
explicit Handle_collector(
: Compiled_material_traverser_base()
{
traverse(material, transaction);
}
// get the collected handles.
const std::vector<std::string>& get_handles() const { return m_handles; }
private:
// Called when the traversal reaches a new element.
void visit_begin(const mi::neuraylib::ICompiled_material* material,
const Compiled_material_traverser_base::Traversal_element& element,
void* context) override
{
// look for direct calls
if (!element.expression ||
element.expression->get_kind() != mi::neuraylib::IExpression::EK_DIRECT_CALL)
return;
// check if it is a distribution function
auto transaction = static_cast<mi::neuraylib::ITransaction*>(context);
element.expression->get_interface<const mi::neuraylib::IExpression_direct_call
>());
expr_dcall->get_arguments());
expr_dcall->get_definition()));
get_semantic();
if (semantic < mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_FIRST
|| semantic > mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_LAST)
return;
// check if the last argument is a handle
expr_dcall->get_arguments());
mi::Size arg_count = arguments->get_size();
const char* name = arguments->get_name(arg_count - 1);
if (strcmp(name, "handle") != 0)
return;
// get the handle value
arguments->get_expression(arg_count - 1));
if (expr->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
return; // is an error if 'handle' is a reserved parameter name
expr->get_interface<const mi::neuraylib::IExpression_constant>());
const mi::base::Handle<const mi::neuraylib::IValue> value(expr_const->get_value());
if (value->get_kind() != mi::neuraylib::IValue::VK_STRING)
return;
value->get_interface<const mi::neuraylib::IValue_string>());
std::string handle_value = handle->get_value() ? std::string(handle->get_value()) : "";
if (std::find(m_handles.begin(), m_handles.end(), handle_value) == m_handles.end())
m_handles.push_back(handle_value);
}
std::vector<std::string> m_handles;
};
//------------------------------------------------------------------------------
//
// CUDA helper functions
//
//------------------------------------------------------------------------------
// Helper macro. Checks whether the expression is cudaSuccess and if not prints a message and
// resets the device and exits.
#ifdef ENABLE_DEPRECATED_UTILIY_FUNCTIONS
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
#else
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
cudaDeviceReset(); \
exit_failure( "Error in file %s, line %u: \"%s\".\n", __FILE__, __LINE__, #expr); \
} \
} while (false)
#endif
// Initialize CUDA.
CUcontext init_cuda(
int ordinal
#ifdef OPENGL_INTEROP
, const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
// Use first device used by OpenGL context
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
{
// Use given device
check_cuda_success(cuDeviceGet(&cu_device, ordinal));
}
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
// For this example, increase printf CUDA buffer size to support a larger number
// of MDL debug::print() calls per CUDA kernel launch
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
// Uninitialize CUDA.
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
/*compile error*/
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
// No copy possible.
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
T &r = *it;
deleter(r);
}
}
C &operator*() { return m_cont; }
C const &operator*() const { return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
// No copy possible.
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
// Allocate memory on GPU and copy the given data to the allocated memory.
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T>
CUdeviceptr gpu_mem_dup(std::vector<T> const &data)
{
return gpu_mem_dup(&data[0], data.size() * sizeof(T));
}
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
//------------------------------------------------------------------------------
//
// Material_gpu_context class
//
//------------------------------------------------------------------------------
// Helper class responsible for making textures and read-only data available to the GPU
// by generating and managing a list of Target_code_data objects.
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
// Use first entry as "not-used" block
m_target_argument_block_list->push_back(0);
}
// Prepare the needed data of the given target code.
bool prepare_target_code_data(
mi::neuraylib::ITarget_code const *target_code,
std::vector<size_t> const &arg_block_indices);
// Get a device pointer to the target code data list.
CUdeviceptr get_device_target_code_data_list();
// Get a device pointer to the target argument block list.
CUdeviceptr get_device_target_argument_block_list();
// Get a device pointer to the i'th target argument block.
CUdeviceptr get_device_target_argument_block(size_t i)
{
// First entry is the "not-used" block, so start at index 1.
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
// Get the number of target argument blocks.
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
// Get the argument block of the i'th BSDF.
// If the BSDF has no target argument block, size_t(~0) is returned.
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
// Get a writable copy of the i'th target argument block.
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
// Get the layout of the i'th target argument block.
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void update_device_argument_block(size_t i);
private:
// Copy the image data of a canvas to a CUDA array.
void copy_canvas_to_cuda_array(cudaArray_t device_array, mi::neuraylib::ICanvas const *canvas);
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool prepare_texture(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size texture_index,
std::vector<Texture> &textures);
// Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
// functions on the GPU.
bool prepare_mbsdf(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size mbsdf_index,
std::vector<Mbsdf> &mbsdfs);
// Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
// functions on the GPU.
bool prepare_lightprofile(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size lightprofile_index,
std::vector<Lightprofile> &lightprofiles);
// If true, mipmaps will be generated for all 2D textures.
bool m_enable_derivatives;
// The device pointer of the target code data list.
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
// List of all target code data objects owned by this context.
Resource_container<Target_code_data> m_target_code_data_list;
// The device pointer of the target argument block list.
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
// List of all target argument blocks owned by this context.
Resource_container<CUdeviceptr> m_target_argument_block_list;
// List of all local, writable copies of the target argument blocks.
std::vector<mi::base::Handle<mi::neuraylib::ITarget_argument_block> > m_own_arg_blocks;
// List of argument block indices per material BSDF.
std::vector<size_t> m_bsdf_arg_block_indices;
// List of all target argument block layouts.
std::vector<mi::base::Handle<mi::neuraylib::ITarget_value_layout const> > m_arg_block_layouts;
// List of all Texture objects owned by this context.
Resource_container<Texture> m_all_textures;
// List of all MBSDFs objects owned by this context.
Resource_container<Mbsdf> m_all_mbsdfs;
// List of all Light profiles objects owned by this context.
Resource_container<Lightprofile> m_all_lightprofiles;
// List of all CUDA arrays owned by this context.
Resource_container<cudaArray_t> m_all_texture_arrays;
// List of all CUDA mipmapped arrays owned by this context.
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
// Get a device pointer to the target code data list.
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
// Get a device pointer to the target argument block list.
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
// Copy the image data of a canvas to a CUDA array.
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
mi::neuraylib::ICanvas const *canvas)
{
mi::Float32 const *data = static_cast<mi::Float32 const *>(tile->get_data());
check_cuda_success(cudaMemcpy2DToArray(
device_array, 0, 0, data,
canvas->get_resolution_x() * sizeof(float) * 4,
canvas->get_resolution_x() * sizeof(float) * 4,
canvas->get_resolution_y(),
cudaMemcpyHostToDevice));
}
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool Material_gpu_context::prepare_texture(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size texture_index,
std::vector<Texture> &textures)
{
// Get access to the texture data by the texture database name from the target code.
transaction->access<mi::neuraylib::ITexture>(code_ptx->get_texture(texture_index)));
transaction->access<mi::neuraylib::IImage>(texture->get_image()));
mi::Uint32 tex_width = canvas->get_resolution_x();
mi::Uint32 tex_height = canvas->get_resolution_y();
mi::Uint32 tex_layers = canvas->get_layers_size();
char const *image_type = image->get_type();
if (image->is_uvtile()) {
std::cerr << "The example does not support uvtile textures!" << std::endl;
return false;
}
if (canvas->get_tiles_size_x() != 1 || canvas->get_tiles_size_y() != 1) {
std::cerr << "The example does not support tiled images!" << std::endl;
return false;
}
// For simplicity, the texture access functions are only implemented for float4 and gamma
// is pre-applied here (all images are converted to linear space).
// Convert to linear color space if necessary
if (texture->get_effective_gamma() != 1.0f) {
// Copy/convert to float4 canvas and adjust gamma from "effective gamma" to 1.
image_api->convert(canvas.get(), "Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma());
image_api->adjust_gamma(gamma_canvas.get(), 1.0f);
canvas = gamma_canvas;
} else if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0) {
// Convert to expected format
canvas = image_api->convert(canvas.get(), "Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
// Copy image data to GPU array depending on texture shape
code_ptx->get_texture_shape(texture_index);
// Cubemap and 3D texture objects require 3D CUDA arrays
tex_layers != 6) {
std::cerr << "Invalid number of layers (" << tex_layers
<< "), cubemaps must have 6 layers!" << std::endl;
return false;
}
// Allocate a 3D array on the GPU
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
cudaArrayCubemap : 0));
// Prepare the memcpy parameter structure
cudaMemcpy3DParms copy_params;
memset(&copy_params, 0, sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
// Copy the image data of all layers (the layers are not consecutive in memory)
for (mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
canvas->get_tile(0, 0, layer));
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(&copy_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
// mipmapped textures use CUDA mipmapped arrays
mi::Uint32 num_levels = image->get_levels();
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
// create all mipmap levels and copy them to the CUDA arrays in the mipmapped array
mi::base::Handle<mi::IArray> mipmaps(image_api->create_mipmaps(canvas.get(), 1.0f));
for (mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
mi::base::Handle<mi::IPointer> mipmap_ptr(mipmaps->get_element<mi::IPointer>(level - 1));
level_canvas = mipmap_ptr->get_pointer<mi::neuraylib::ICanvas>();
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
// 2D texture objects use CUDA arrays
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
// For cube maps we need clamped address mode to avoid artifacts in the corners
cudaTextureAddressMode addr_mode =
? cudaAddressModeClamp
: cudaAddressModeWrap;
// Create filtered texture object
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f; // default value in OpenGL
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
// Create unfiltered texture object if necessary (cube textures have no texel functions)
cudaTextureObject_t tex_obj_unfilt = 0;
// Use a black border for access outside of the texture
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
// Store texture infos in result vector
textures.push_back(Texture(
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.back());
return true;
}
namespace
{
bool prepare_mbsdfs_part(mi::neuraylib::Mbsdf_part part, Mbsdf& mbsdf_cuda_representation,
const mi::neuraylib::IBsdf_measurement* bsdf_measurement)
{
switch (part)
{
dataset = bsdf_measurement->get_reflection<mi::neuraylib::Bsdf_isotropic_data>();
break;
dataset = bsdf_measurement->get_transmission<mi::neuraylib::Bsdf_isotropic_data>();
break;
}
// no data, fine
if (!dataset)
return true;
// get dimensions
uint2 res;
res.x = dataset->get_resolution_theta();
res.y = dataset->get_resolution_phi();
unsigned num_channels = dataset->get_type() == mi::neuraylib::BSDF_SCALAR ? 1 : 3;
mbsdf_cuda_representation.Add(part, res, num_channels);
// get data
// {1,3} * (index_theta_in * (res_phi * res_theta) + index_theta_out * res_phi + index_phi)
const mi::Float32* src_data = buffer->get_data();
// ----------------------------------------------------------------------------------------
// prepare importance sampling data:
// - for theta_in we will be able to perform a two stage CDF, first to select theta_out,
// and second to select phi_out
// - maximum component is used to "probability" in case of colored measurements
// CDF of the probability to select a certain theta_out for a given theta_in
const unsigned int cdf_theta_size = res.x * res.x;
// for each of theta_in x theta_out combination, a CDF of the probabilities to select a
// a certain theta_out is stored
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x]; // albedo for sampling reflection and transmission
float* sample_data_theta = sample_data; // begin of the first (theta) CDF
float* sample_data_phi = sample_data + cdf_theta_size; // begin of the second (phi) CDFs
const float s_theta = (float) (M_PI * 0.5) / float(res.x); // step size
const float s_phi = (float) (M_PI) / float(res.y); // step size
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
// BSDFs are symmetric: f(w_in, w_out) = f(w_out, w_in)
// take the average of both measurements
// area of two the surface elements (the ones we are averaging)
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
// offset for both the thetas into the measurement data (select row in the volume)
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
// build CDF for phi
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value = fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+ fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else /* num_channels == 1 */
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
// normalize CDF for phi
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
// build CDF for theta
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
// normalize CDF for theta
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
// copy entire CDF data buffer to GPU
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
// ----------------------------------------------------------------------------------------
// prepare evaluation data:
// - simply store the measured data in a volume texture
// - in case of color data, we store each sample in a vector4 to get texture support
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
// make lookup data symmetric
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
// Copy data to GPU array
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>() // float3 is not supported
: cudaCreateChannelDesc<float>());
// Allocate a 3D array on the GPU (phi_delta x theta_out x theta_in)
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
// prepare and copy
cudaMemcpy3DParms copy_params;
memset(&copy_params, 0, sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data), // base pointer
res.y * lookup_channels * sizeof(float), // row pitch
res.y, // width of slice
res.x); // height of slice
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(&copy_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size mbsdf_index,
std::vector<Mbsdf> &mbsdfs)
{
// Get access to the texture data by the texture database name from the target code.
code_ptx->get_bsdf_measurement(mbsdf_index)));
Mbsdf mbsdf_cuda;
// handle reflection and transmission
if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_REFLECTION, mbsdf_cuda, mbsdf.get()))
return false;
if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_TRANSMISSION, mbsdf_cuda, mbsdf.get()))
return false;
mbsdfs.push_back(mbsdf_cuda);
m_all_mbsdfs->push_back(mbsdfs.back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
mi::neuraylib::ITarget_code const *code_ptx,
mi::Size lightprofile_index,
std::vector<Lightprofile> &lightprofiles)
{
// Get access to the texture data by the texture database name from the target code.
code_ptx->get_light_profile(lightprofile_index)));
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
// phi-mayor: [res.x x res.y]
const float* data = lprof_nr->get_data();
// --------------------------------------------------------------------------------------------
// compute total power
// compute inverse CDF data for sampling
// sampling will work on cells rather than grid nodes (used for evaluation)
// first (res.x-1) for the cdf for sampling theta
// rest (rex.x-1) * (res.y-1) for the individual cdfs for sampling phi (after theta)
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
// area of the patch (grid cell)
// \mu = int_{theta0}^{theta1} sin{theta} \delta theta
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
// build CDF for phi
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
// the probability to select a patch corresponds to the value times area
// the value of a cell is the average of the corners
// omit the *1/4 as we normalize in the end
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
// normalize CDF for phi
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
// build CDF for theta
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
// normalize CDF for theta
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
// copy entire CDF data buffer to GPU
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
// --------------------------------------------------------------------------------------------
// prepare evaluation data
// - use a 2d texture that allows bilinear interpolation
// Copy data to GPU array
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
// 2D texture objects use CUDA arrays
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpy2DToArray(
device_lightprofile_data, 0, 0, data,
res.x * sizeof(float), res.x * sizeof(float), res.y, cudaMemcpyHostToDevice));
// Create filtered texture object
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
lightprofiles.push_back(lprof);
m_all_lightprofiles->push_back(lightprofiles.back());
return true;
}
// Prepare the needed target code data of the given target code.
bool Material_gpu_context::prepare_target_code_data(
mi::neuraylib::ITarget_code const *target_code,
std::vector<size_t> const &arg_block_indices)
{
// Target code data list may not have been retrieved already
check_success(m_device_target_code_data_list.get() == 0);
// Handle the read-only data segments if necessary.
// They are only created, if the "enable_ro_segment" backend option was set to "on".
CUdeviceptr device_ro_data = 0;
if (target_code->get_ro_data_segment_count() > 0) {
device_ro_data = gpu_mem_dup(
target_code->get_ro_data_segment_data(0),
target_code->get_ro_data_segment_size(0));
}
// Copy textures to GPU if the code has more than just the invalid texture
CUdeviceptr device_textures = 0;
mi::Size num_textures = target_code->get_texture_count();
if (num_textures > 1) {
std::vector<Texture> textures;
// Loop over all textures skipping the first texture,
// which is always the invalid texture
for (mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
// Copy texture list to GPU
device_textures = gpu_mem_dup(textures);
}
// Copy MBSDFs to GPU if the code has more than just the invalid mbsdf
CUdeviceptr device_mbsdfs = 0;
mi::Size num_mbsdfs = target_code->get_bsdf_measurement_count();
if (num_mbsdfs > 1) {
std::vector<Mbsdf> mbsdfs;
// Loop over all mbsdfs skipping the first mbsdf,
// which is always the invalid mbsdf
for (mi::Size i = 1; i < num_mbsdfs; ++i) {
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
// Copy mbsdf list to GPU
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
// Copy light profiles to GPU if the code has more than just the invalid light profile
CUdeviceptr device_lightprofiles = 0;
mi::Size num_lightprofiles = target_code->get_light_profile_count();
if (num_lightprofiles > 1) {
std::vector<Lightprofile> lightprofiles;
// Loop over all profiles skipping the first profile,
// which is always the invalid profile
for (mi::Size i = 1; i < num_lightprofiles; ++i) {
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
// Copy light profile list to GPU
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
for (mi::Size i = 0, num = target_code->get_argument_block_count(); i < num; ++i) {
target_code->get_argument_block(i));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_own_arg_blocks.push_back(mi::base::make_handle(arg_block->clone()));
m_arg_block_layouts.push_back(
}
for (size_t arg_block_index : arg_block_indices) {
m_bsdf_arg_block_indices.push_back(arg_block_index);
}
return true;
}
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
//------------------------------------------------------------------------------
//
// MDL material compilation code
//
//------------------------------------------------------------------------------
class Material_compiler {
public:
// Constructor.
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
const std::string& df_handle_mode);
// Loads an MDL module and returns the module DB.
std::string load_module(const std::string& mdl_module_name);
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
// If class_compilation is true, the material will use class compilation.
bool add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* fname,
bool class_compilation=false);
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
// If class_compilation is true, the material will use class compilation.
bool add_material_df(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* base_fname,
bool class_compilation=false);
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_descriptions. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool add_material(
const std::string& qualified_module_name,
const std::string& material_simple_name,
mi::neuraylib::Target_function_description* function_descriptions,
mi::Size description_count,
bool class_compilation);
// Generates CUDA PTX target code for the current link unit.
typedef std::vector<mi::base::Handle<mi::neuraylib::IMaterial_definition const> >
Material_definition_list;
// Get the list of used material definitions.
// There will be one entry per add_* call.
Material_definition_list const &get_material_defs()
{
return m_material_defs;
}
typedef std::vector<mi::base::Handle<mi::neuraylib::ICompiled_material const> >
Compiled_material_list;
// Get the list of compiled materials.
// There will be one entry per add_* call.
Compiled_material_list const &get_compiled_materials()
{
return m_compiled_materials;
}
std::vector<size_t> const &get_argument_block_indices() const {
return m_arg_block_indexes;
}
const std::vector<std::string>& get_handles() const {
return m_handles;
}
private:
// Creates an instance of the given material.
mi::neuraylib::IMaterial_instance* create_material_instance(
const std::string& qualified_module_name,
const std::string& material_simple_name);
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material* compile_material_instance(
bool class_compilation);
private:
Material_definition_list m_material_defs;
Compiled_material_list m_compiled_materials;
std::vector<size_t> m_arg_block_indexes;
std::vector<std::string> m_handles;
};
// Constructor.
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
const std::string& df_handle_mode)
: m_mdl_impexp_api(mdl_impexp_api, mi::base::DUP_INTERFACE)
, m_be_cuda_ptx(mdl_backend_api->get_backend(mi::neuraylib::IMdl_backend_api::MB_CUDA_PTX))
, m_mdl_factory(mdl_factory, mi::base::DUP_INTERFACE)
, m_transaction(transaction, mi::base::DUP_INTERFACE)
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
// Option "enable_ro_segment": Default is disabled.
// If you have a lot of big arrays, enabling this might speed up compilation.
// check_success(m_be_cuda_ptx->set_option("enable_ro_segment", "on") == 0);
if (enable_derivatives) {
// Option "texture_runtime_with_derivs": Default is disabled.
// We enable it to get coordinates with derivatives for texture lookup functions.
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
// Option "tex_lookup_call_mode": Default mode is vtable mode.
// You can switch to the slower vtable mode by commenting out the next line.
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
// Option "num_texture_results": Default is 0.
// Set the size of a renderer provided array for texture results in the MDL SDK state in number
// of float4 elements processed by the init() function.
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
if (enable_auxiliary) {
// Option "enable_auxiliary": Default is disabled.
// We enable it to create an additional 'auxiliary' function that can be called on each
// distribution function to fill an albedo and normal buffer e.g. for denoising.
check_success(m_be_cuda_ptx->set_option("enable_auxiliary", "on") == 0);
}
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
// Option "enable_df_interpreter": Default is disabled.
// Using the interpreter allows to reuse the same code for multiple materials
// reducing code divergence, if your scene shows many materials at the same time.
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
// Option "df_handle_slot_mode": Default is "none".
// When using light path expressions, individual parts of the distribution functions can be
// selected using "handles". The contribution of each of those parts has to be evaluated during
// rendering. This option controls how many parts are evaluated with each call into the
// generated "evaluate" and "auxiliary" functions and how the data is passed.
// The CUDA backend supports pointers, which means an externally managed buffer of arbitrary
// size is used to transport the contributions of each part.
check_success(m_be_cuda_ptx->set_option("df_handle_slot_mode", df_handle_mode.c_str()) == 0);
// Option "scene_data_names": Default is "".
// Uncomment the line below to enable calling the scene data runtime functions
// for any scene data names or specify a comma-separated list of names for which
// you may provide scene data. The example runtime functions always return the
// default values, which is the same as not supporting any scene data.
// m_be_cuda_ptx->set_option("scene_data_names", "*");
// force experimental to true for now
m_context->set_option("experimental", true);
m_context->set_option("fold_ternary_on_df", fold_ternary_on_df);
// After we set the options, we can create the link unit
m_link_unit = mi::base::make_handle(m_be_cuda_ptx->create_link_unit(transaction, m_context.get()));
}
std::string Material_compiler::load_module(const std::string& mdl_module_name)
{
// load module
m_mdl_impexp_api->load_module(m_transaction.get(), mdl_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get()))
exit_failure("Failed to load module: %s", mdl_module_name.c_str());
// get and return the DB name
m_mdl_factory->get_db_module_name(mdl_module_name.c_str()));
return db_module_name->get_c_str();
}
// Creates an instance of the given material.
mi::neuraylib::IMaterial_instance* Material_compiler::create_material_instance(
const std::string& qualified_module_name,
const std::string& material_simple_name)
{
// Load mdl module.
m_mdl_impexp_api->load_module(
m_transaction.get(), qualified_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get())) {
// module has errors
return nullptr;
}
// get db name
m_mdl_factory->get_db_module_name(qualified_module_name.c_str()));
std::string material_db_name =
std::string(module_db_name->get_c_str()) + "::" + material_simple_name;
// Create a material instance from the material definition
// with the default arguments.
m_transaction->access<mi::neuraylib::IMaterial_definition>(
material_db_name.c_str()));
if (!material_definition) {
// material with given name does not exists
print_message(
(
"Material '" +
material_simple_name +
"' does not exists in '" +
qualified_module_name + "'").c_str());
return nullptr;
}
m_material_defs.push_back(material_definition);
mi::Sint32 result;
material_definition->create_material_instance(0, &result));
check_success(result == 0);
material_instance->retain();
return material_instance.get();
}
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material *Material_compiler::compile_material_instance(
bool class_compilation)
{
mi::Uint32 flags = class_compilation
material_instance->create_compiled_material(flags, m_context.get()));
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
compiled_material->retain();
return compiled_material.get();
}
// Generates CUDA PTX target code for the current link unit.
mi::base::Handle<const mi::neuraylib::ITarget_code> Material_compiler::generate_cuda_ptx()
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
std::cout << "Dumping CUDA PTX code:\n\n"
<< code_cuda_ptx->get_code() << std::endl;
#endif
return code_cuda_ptx;
}
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* fname,
bool class_compilation)
{
mi::neuraylib::Target_function_description desc;
desc.path = path;
desc.base_fname = fname;
add_material(qualified_module_name, material_simple_name, &desc, 1, class_compilation);
return desc.return_code == 0;
}
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_df(
const std::string& qualified_module_name,
const std::string& material_simple_name,
const char* path,
const char* base_fname,
bool class_compilation)
{
mi::neuraylib::Target_function_description desc;
desc.path = path;
desc.base_fname = base_fname;
add_material(qualified_module_name, material_simple_name, &desc, 1, class_compilation);
return desc.return_code == 0;
}
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_description. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool Material_compiler::add_material(
const std::string& qualified_module_name,
const std::string& material_simple_name,
mi::neuraylib::Target_function_description* function_descriptions,
mi::Size description_count,
bool class_compilation)
{
if (description_count == 0)
return false;
// Load the given module and create a material instance
create_material_instance(qualified_module_name, material_simple_name));
if (!material_instance)
return false;
// Compile the material instance in instance compilation mode
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
// Note: the same argument_block_index is filled into all function descriptions of a
// material, if any function uses it
m_arg_block_indexes.push_back(function_descriptions[0].argument_block_index);
return print_messages(m_context.get());
}
//------------------------------------------------------------------------------
//
// Material execution code
//
//------------------------------------------------------------------------------
// Helper function to create PTX source code for a non-empty 32-bit value array.
void print_array_u32(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
// PTX does not allow empty arrays, so use a dummy entry
str += "1] = { 0 };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
// Helper function to create PTX source code for a non-empty function pointer array.
void print_array_func(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
// PTX does not allow empty arrays, so use a dummy entry
str += "1] = { dummy_func };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
// Generate PTX array containing the references to all generated functions.
std::string generate_func_array_ptx(
const std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > &target_codes)
{
// Create PTX header and mdl_expr_functions_count constant
std::string src =
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
// Workaround needed to let CUDA linker resolve the function pointers in the arrays.
// Also used for "empty" function arrays.
src += ".func dummy_func() { ret; }\n";
std::string tc_offsets;
std::string function_names;
std::string tc_indices;
std::string ab_indices;
unsigned f_count = 0;
// Iterate over all target codes
for (size_t tc_index = 0, num = target_codes.size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
// in case of multiple target codes, we need to address the functions by a pair of
// target_code_index and function_index.
// the elements in the resulting function array can then be index by offset + func_index.
if(!tc_offsets.empty())
tc_offsets += ", ";
tc_offsets += to_string(f_count);
// Collect all names and prototypes of callable functions within the current target code
for (size_t func_index = 0, func_count = target_code->get_callable_function_count();
func_index < func_count; ++func_index)
{
// add to function list
if (!tc_indices.empty())
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
// target code index in case of multiple link units
tc_indices += to_string(tc_index);
// name of the function
function_names += target_code->get_callable_function(func_index);
// Get argument block index and translate to 1 based list index (-> 0 = not-used)
mi::Size ab_index = target_code->get_callable_function_argument_block_index(func_index);
ab_indices += to_string(ab_index == mi::Size(~0) ? 0 : (ab_index + 1));
f_count++;
// Add prototype declaration
src += target_code->get_callable_function_prototype(
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
// infos per target code (link unit)
src += std::string(".visible .const .align 4 .u32 mdl_target_code_count = ")
+ to_string(target_codes.size()) + ";\n";
print_array_u32(
src, std::string("mdl_target_code_offsets"), unsigned(target_codes.size()), tc_offsets);
// infos per function
src += std::string(".visible .const .align 4 .u32 mdl_functions_count = ")
+ to_string(f_count) + ";\n";
print_array_func(src, std::string("mdl_functions"), f_count, function_names);
print_array_u32(src, std::string("mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src, std::string("mdl_target_code_indices"), f_count, tc_indices);
return src;
}
// Build a linked CUDA kernel containing our kernel and all the generated code, making it
// available to the kernel via an added "mdl_expr_functions" array.
CUmodule build_linked_kernel(
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > const &target_codes,
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
// Generate PTX array containing the references to all generated functions.
// The linker will resolve them to addresses.
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
std::cout << "Dumping CUDA PTX code for the \"mdl_expr_functions\" array:\n\n"
<< ptx_func_array_src << std::endl;
#endif
// Link all generated code, our generated PTX array and our kernel together
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
// Setup the linker
// Pass a buffer for info messages
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
// Pass the size of the info buffer
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
// Pass a buffer for error messages
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
// Pass the size of the error buffer
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
// Add all code generated by the MDL PTX backend
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
// Add the "mdl_expr_functions" array PTX module
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.c_str()),
ptx_func_array_src.size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
// Add our kernel
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
// Link everything to a cubin
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
std::cerr << "PTX linker error:\n" << error_log << std::endl;
check_cuda_success(link_result);
}
std::cout << "CUDA link completed." << std::endl;
if (info_log[0])
std::cout << "Linker output:\n" << info_log << std::endl;
// Load the result and get the entrypoint of our kernel
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(&regs, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
std::cout << "Kernel uses " << regs << " registers and " << lmem << " lmem and has a size of "
<< linked_cubin_size << " bytes." << std::endl;
// Cleanup
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif // EXAMPLE_CUDA_SHARED_H

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cu

/******************************************************************************
* Copyright 2020 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <curand_kernel.h>
#define _USE_MATH_DEFINES
#include <math.h>
#include "example_df_cuda.h"
#include "texture_support_cuda.h"
// To reuse this sample code for the MDL SDK and MDL Core the corresponding namespaces are used.
// when this CUDA code is used in the context of an SDK sample.
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
// when this CUDA code is used in the context of an Core sample.
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
// for LPE support there different options for the renderer, for CUDA a renderer provided buffer
// can be used to retrieve the contributions of the individual handles (named lobes)
// Note, a real renderer would go for one specific option only
#define DF_HSM_POINTER -2
#define DF_HSM_NONE -1
#define DF_HSM_FIXED_1 1
#define DF_HSM_FIXED_2 2
#define DF_HSM_FIXED_4 4
#define DF_HSM_FIXED_8 8
// this is the one that is used,
// Note, this has to match with code backend option "df_handle_slot_mode"
#define DF_HANDLE_SLOTS DF_HSM_POINTER
// If enabled, math::DX(state::texture_coordinates(0).xy) = float2(1, 0) and
// math::DY(state::texture_coordinates(0).xy) = float2(0, 1) will be used.
// #define USE_FAKE_DERIVATIVES
#ifdef ENABLE_DERIVATIVES
typedef Material_expr_function_with_derivs Mat_expr_func;
typedef Bsdf_init_function_with_derivs Bsdf_init_func;
typedef Bsdf_sample_function_with_derivs Bsdf_sample_func;
typedef Bsdf_evaluate_function_with_derivs Bsdf_evaluate_func;
typedef Bsdf_pdf_function_with_derivs Bsdf_pdf_func;
typedef Bsdf_auxiliary_function_with_derivs Bsdf_auxiliary_func;
typedef Edf_init_function_with_derivs Edf_init_func;
typedef Edf_sample_function_with_derivs Edf_sample_func;
typedef Edf_evaluate_function_with_derivs Edf_evaluate_func;
typedef Edf_pdf_function_with_derivs Edf_pdf_func;
typedef Edf_auxiliary_function_with_derivs Edf_auxiliary_func;
typedef Shading_state_material_with_derivs Mdl_state;
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Material_expr_function Mat_expr_func;
typedef Bsdf_init_function Bsdf_init_func;
typedef Bsdf_sample_function Bsdf_sample_func;
typedef Bsdf_evaluate_function Bsdf_evaluate_func;
typedef Bsdf_pdf_function Bsdf_pdf_func;
typedef Bsdf_auxiliary_function Bsdf_auxiliary_func;
typedef Edf_init_function Edf_init_func;
typedef Edf_sample_function Edf_sample_func;
typedef Edf_evaluate_function Edf_evaluate_func;
typedef Edf_pdf_function Edf_pdf_func;
typedef Edf_auxiliary_function Edf_auxiliary_func;
typedef Shading_state_material Mdl_state;
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
// Custom structure representing the resources used by the generated code of a target code object.
struct Target_code_data
{
size_t num_textures; // number of elements in the textures field
Texture *textures; // a list of Texture objects, if used
size_t num_mbsdfs; // number of elements in the mbsdfs field
Mbsdf *mbsdfs; // a list of mbsdfs objects, if used
size_t num_lightprofiles; // number of elements in the lightprofiles field
Lightprofile *lightprofiles; // a list of lightprofiles objects, if used
char const *ro_data_segment; // the read-only data segment, if used
};
// all function types
union Mdl_function_ptr
{
Mat_expr_func *expression;
Bsdf_init_func *bsdf_init;
Bsdf_sample_func *bsdf_sample;
Bsdf_evaluate_func *bsdf_evaluate;
Bsdf_pdf_func *bsdf_pdf;
Bsdf_auxiliary_func *bsdf_auxiliary;
Edf_init_func *edf_init;
Edf_sample_func *edf_sample;
Edf_evaluate_func *edf_evaluate;
Edf_pdf_func *edf_pdf;
Edf_auxiliary_func *edf_auxiliary;
};
// function index offset depending on the target code
extern __constant__ unsigned int mdl_target_code_offsets[];
// number of generated functions
extern __constant__ unsigned int mdl_functions_count;
// the following arrays are indexed by an mdl_function_index
extern __constant__ Mdl_function_ptr mdl_functions[];
extern __constant__ unsigned int mdl_arg_block_indices[];
// Identity matrix.
// The last row is always implied to be (0, 0, 0, 1).
__constant__ const float4 identity[3] = {
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
// the material provides pairs for each generated function to evaluate
// the functions and arg blocks array are indexed by:
// mdl_target_code_offsets[target_code_index] + function_index
typedef uint3 Mdl_function_index;
__device__ inline Mdl_function_index get_mdl_function_index(const uint2& index_pair)
{
return make_uint3(
index_pair.x, // target_code_index
index_pair.y, // function_index inside target code
mdl_target_code_offsets[index_pair.x] + index_pair.y); // global function index
}
// resource handler for accessing textures and other data
// depends on the target code (link unit)
struct Mdl_resource_handler
{
__device__ Mdl_resource_handler()
{
m_tex_handler.vtable = &TEX_VTABLE; // only required in 'vtable' mode, otherwise NULL
data.shared_data = NULL;
data.texture_handler = reinterpret_cast<Texture_handler_base *>(&m_tex_handler);
}
// reuse the handler with a different target code index
__device__ inline void set_target_code_index(
const Kernel_params& params, const Mdl_function_index& index)
{
m_tex_handler.num_textures = params.tc_data[index.x].num_textures;
m_tex_handler.textures = params.tc_data[index.x].textures;
m_tex_handler.num_mbsdfs = params.tc_data[index.x].num_mbsdfs;
m_tex_handler.mbsdfs = params.tc_data[index.x].mbsdfs;
m_tex_handler.num_lightprofiles = params.tc_data[index.x].num_lightprofiles;
m_tex_handler.lightprofiles = params.tc_data[index.x].lightprofiles;
}
// a pointer to this data is passed to all generated functions
private:
Tex_handler m_tex_handler;
};
// checks if the indexed function can be evaluated or not
__device__ inline bool is_valid(const Mdl_function_index& index)
{
return index.y != 0xFFFFFFFFu;
}
// get a pointer to the material parameters which is passed to all generated functions
__device__ inline const char* get_arg_block(
const Kernel_params& params,
const Mdl_function_index& index)
{
return params.arg_block_list[mdl_arg_block_indices[index.z]];
}
// restores the normal since the BSDF init will change it
// sets the read-only data segment pointer for large arrays
__device__ inline void prepare_state(
const Kernel_params& params,
const Mdl_function_index& index,
Mdl_state& state,
const tct_float3& normal)
{
state.ro_data_segment = params.tc_data[index.x].ro_data_segment;
state.normal = normal;
}
// Expression functions
__device__ inline Mat_expr_func* as_expression(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].expression;
}
// BSDF functions
__device__ inline Bsdf_init_func* as_bsdf_init(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].bsdf_init;
}
__device__ inline Bsdf_sample_func* as_bsdf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].bsdf_sample;
}
__device__ inline Bsdf_evaluate_func* as_bsdf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].bsdf_evaluate;
}
__device__ inline Bsdf_pdf_func* as_bsdf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].bsdf_pdf;
}
__device__ inline Bsdf_auxiliary_func* as_bsdf_auxiliary(const Mdl_function_index& index)
{
return mdl_functions[index.z + 4].bsdf_auxiliary;
}
// EDF functions
__device__ inline Edf_init_func* as_edf_init(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].edf_init;
}
__device__ inline Edf_sample_func* as_edf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].edf_sample;
}
__device__ inline Edf_evaluate_func* as_edf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].edf_evaluate;
}
__device__ inline Edf_pdf_func* as_edf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].edf_pdf;
}
__device__ inline Edf_auxiliary_func* as_edf_auxiliary(const Mdl_function_index& index)
{
return mdl_functions[index.z + 4].edf_auxiliary;
}
// 3d vector math utilities
__device__ inline float3 operator+(const float3& a, const float3& b)
{
return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
__device__ inline float3 operator-(const float3& a, const float3& b)
{
return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__device__ inline float3 operator*(const float3& a, const float3& b)
{
return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
__device__ inline float3 operator*(const float3& a, const float s)
{
return make_float3(a.x * s, a.y * s, a.z * s);
}
__device__ inline float3 operator/(const float3& a, const float s)
{
return make_float3(a.x / s, a.y / s, a.z / s);
}
__device__ inline void operator+=(float3& a, const float3& b)
{
a.x += b.x; a.y += b.y; a.z += b.z;
}
__device__ inline void operator-=(float3& a, const float3& b)
{
a.x -= b.x; a.y -= b.y; a.z -= b.z;
}
__device__ inline void operator*=(float3& a, const float3& b)
{
a.x *= b.x; a.y *= b.y; a.z *= b.z;
}
__device__ inline void operator*=(float3& a, const float& s)
{
a.x *= s; a.y *= s; a.z *= s;
}
__device__ inline float squared_length(const float3 &d)
{
return d.x * d.x + d.y * d.y + d.z * d.z;
}
__device__ inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f / sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
__device__ inline float dot(const float3 &u, const float3 &v)
{
return u.x * v.x + u.y * v.y + u.z * v.z;
}
__device__ inline float3 cross(const float3 &u, const float3 &v)
{
return make_float3(
u.y * v.z - u.z * v.y,
u.z * v.x - u.x * v.z,
u.x * v.y - u.y * v.x);
}
typedef curandStatePhilox4_32_10_t Rand_state;
// direction to environment map texture coordinates
__device__ inline float2 environment_coords(const float3 &dir, const Kernel_params& params)
{
const float u = atan2f(dir.z, dir.x) * (float)(0.5 / M_PI) + 0.5f;
const float v = acosf(fmax(fminf(-dir.y, 1.0f), -1.0f)) * (float)(1.0 / M_PI);
return make_float2(fmodf(u + params.env_rotation * 0.5f * M_1_PI, 1.0f), v);
}
// importance sample the environment
__device__ inline float3 environment_sample(
float3 &dir,
float &pdf,
const float3 &xi,
const Kernel_params &params)
{
// importance sample an envmap pixel using an alias map
const unsigned int size = params.env_size.x * params.env_size.y;
const unsigned int idx = min((unsigned int)(xi.x * (float)size), size - 1);
unsigned int env_idx;
float xi_y = xi.y;
if (xi_y < params.env_accel[idx].q) {
env_idx = idx ;
xi_y /= params.env_accel[idx].q;
} else {
env_idx = params.env_accel[idx].alias;
xi_y = (xi_y - params.env_accel[idx].q) / (1.0f - params.env_accel[idx].q);
}
const unsigned int py = env_idx / params.env_size.x;
const unsigned int px = env_idx % params.env_size.x;
pdf = params.env_accel[env_idx].pdf;
// uniformly sample spherical area of pixel
const float u = (float)(px + xi_y) / (float)params.env_size.x;
const float phi = u * (float)(2.0 * M_PI) - (float)M_PI - params.env_rotation;
float sin_phi, cos_phi;
sincosf(phi > float(-M_PI) ? phi : (phi + (float)(2.0 * M_PI)), &sin_phi, &cos_phi);
const float step_theta = (float)M_PI / (float)params.env_size.y;
const float theta0 = (float)(py) * step_theta;
const float cos_theta = cosf(theta0) * (1.0f - xi.z) + cosf(theta0 + step_theta) * xi.z;
const float theta = acosf(cos_theta);
const float sin_theta = sinf(theta);
dir = make_float3(cos_phi * sin_theta, -cos_theta, sin_phi * sin_theta);
// lookup filtered beauty
const float v = theta * (float)(1.0 / M_PI);
const float4 t = tex2D<float4>(params.env_tex, u, v);
return make_float3(t.x, t.y, t.z) * params.env_intensity / pdf;
}
// evaluate the environment
__device__ inline float3 environment_eval(
float &pdf,
const float3 &dir,
const Kernel_params &params)
{
const float2 uv = environment_coords(dir, params);
const unsigned int x =
min((unsigned int)(uv.x * (float)params.env_size.x), params.env_size.x - 1);
const unsigned int y =
min((unsigned int)(uv.y * (float)params.env_size.y), params.env_size.y - 1);
pdf = params.env_accel[y * params.env_size.x + x].pdf;
const float4 t = tex2D<float4>(params.env_tex, uv.x, uv.y) ;
return make_float3(t.x, t.y, t.z) * params.env_intensity;
}
//-------------------------------------------------------------------------------------------------
struct auxiliary_data
{
float3 albedo;
float3 normal;
int num; // multiple elements can contribute to the aux buffer with equal weight
__device__ inline auxiliary_data& operator+=(const auxiliary_data& b)
{
albedo += b.albedo;
normal += b.normal;
num += b.num;
return *this;
}
};
__device__ inline static void clear(auxiliary_data& data)
{
data.albedo = make_float3(0.0f, 0.0f, 0.0f);
data.normal = make_float3(0.0f, 0.0f, 0.0f);
data.num = 0;
}
__device__ inline void normalize(auxiliary_data& data)
{
data.albedo = data.albedo / fmaxf(1.0f, float(data.num));
if (dot(data.normal, data.normal) > 0.0f)
data.normal = normalize(data.normal);
data.num = min(1, data.num);
}
//-------------------------------------------------------------------------------------------------
struct Ray_state
{
float3 contribution;
float3 weight;
float3 pos, pos_rx, pos_ry;
float3 dir, dir_rx, dir_ry;
bool inside;
int intersection;
uint32_t lpe_current_state;
auxiliary_data* aux;
};
struct Ray_hit_info
{
float distance;
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 position;
#else
float3 position;
#endif
float3 normal;
float3 tangent_u;
float3 tangent_v;
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 texture_coords[1];
#else
tct_float3 texture_coords[1];
#endif
};
#define GT_SPHERE_RADIUS 1.0f
__device__ inline bool intersect_sphere(
const Ray_state &ray_state,
const Kernel_params &params,
Ray_hit_info& out_hit)
{
const float r = GT_SPHERE_RADIUS;
const float b = 2.0f * dot(ray_state.dir, ray_state.pos);
const float c = dot(ray_state.pos, ray_state.pos) - r * r;
float tmp = b * b - 4.0f * c;
if (tmp < 0.0f)
return false;
tmp = sqrtf(tmp);
const float t0 = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t1 = c / t0;
const float m = fminf(t0, t1);
out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
if (out_hit.distance < 0.0f)
return false;
// compute geometry state
#ifdef ENABLE_DERIVATIVES
out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
const float3 &posval = out_hit.position.val;
#else
out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
const float3 &posval = out_hit.position;
#endif
out_hit.normal = normalize(posval);
const float phi = atan2f(out_hit.normal.x, out_hit.normal.z);
const float theta = acosf(out_hit.normal.y);
const float3 uvw = make_float3(
(phi * (float) (0.5 / M_PI) + 0.5f) * 2.0f,
1.0f - theta * (float) (1.0 / M_PI),
0.0f);
// compute surface derivatives
float sp, cp;
sincosf(phi, &sp, &cp);
const float st = sinf(theta);
out_hit.tangent_u = make_float3(cp * st, 0.0f, -sp * st) * (float) M_PI * r;
out_hit.tangent_v = make_float3(sp * out_hit.normal.y, -st, cp * out_hit.normal.y) * (float) (-M_PI) * r;
#ifdef ENABLE_DERIVATIVES
out_hit.texture_coords[0].val = uvw;
out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
#else
out_hit.texture_coords[0] = uvw;
#endif
return true;
}
#define GT_HAIR_RADIUS 0.35f
#define GT_HAIR_LENGTH 3.0f
__device__ inline bool intersect_hair(
const Ray_state &ray_state,
const Kernel_params &params,
Ray_hit_info& out_hit)
{
const float r = GT_HAIR_RADIUS;
const float a = ray_state.dir.x * ray_state.dir.x + ray_state.dir.z * ray_state.dir.z;
const float b = 2.0f * (ray_state.dir.x * ray_state.pos.x + ray_state.dir.z * ray_state.pos.z);
const float c = ray_state.pos.x * ray_state.pos.x + ray_state.pos.z * ray_state.pos.z - r * r;
float tmp = b * b - 4.0f * a * c;
if (tmp < 0.0f)
return false;
tmp = sqrtf(tmp);
const float q = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t0 = q / a;
const float t1 = c / q;
const float m = fminf(t0, t1);
out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
if (out_hit.distance < 0.0f)
return false;
// compute geometry state
#ifdef ENABLE_DERIVATIVES
out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
const float3 &posval = out_hit.position.val;
#else
out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
const float3 &posval = out_hit.position;
#endif
out_hit.normal = normalize(make_float3(posval.x, 0.0f, posval.z));
if (fabsf(posval.y) > GT_HAIR_LENGTH * 0.5f)
return false;
const float phi = atan2f(posval.z, posval.x);
const float3 uvw = make_float3(
(posval.y + GT_HAIR_LENGTH * 0.5f) / GT_HAIR_LENGTH, // position along the hair
phi * (float) (0.5f / M_PI) + 0.5f, // position around the hair in the range [0, 1]
2.0f * GT_HAIR_RADIUS); // thickness of the hair
// compute surface derivatives
out_hit.tangent_u = make_float3(0.0, 1.0, 0.0);
out_hit.tangent_v = cross(out_hit.normal, out_hit.tangent_u);
#ifdef ENABLE_DERIVATIVES
out_hit.texture_coords[0].val = uvw;
out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
#else
out_hit.texture_coords[0] = uvw;
#endif
return true;
}
__device__ inline bool intersect_geometry(
Ray_state &ray_state,
const Kernel_params &params,
Ray_hit_info& out_hit)
{
switch (Geometry_type(params.geometry))
{
case GT_SPHERE:
if (!intersect_sphere(ray_state, params, out_hit))
return false;
break;
case GT_HAIR:
if (!intersect_hair(ray_state, params, out_hit))
return false;
break;
default:
return false;
}
#ifndef ENABLE_DERIVATIVES
ray_state.pos = out_hit.position;
#else
ray_state.pos = out_hit.position.val;
if (params.use_derivatives && ray_state.intersection == 0)
{
#ifdef USE_FAKE_DERIVATIVES
out_hit.position.dx = make_float3(1.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 1.0f, 0.0f);
out_hit.texture_coords[0].dx = make_float3(1.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 1.0f, 0.0f);
#else
// compute ray differential for one-pixel offset rays
// ("Physically Based Rendering", 3rd edition, chapter 10.1.1)
const float d = dot(out_hit.normal, ray_state.pos);
const float tx = (d - dot(out_hit.normal, ray_state.pos_rx)) / dot(out_hit.normal, ray_state.dir_rx);
const float ty = (d - dot(out_hit.normal, ray_state.pos_ry)) / dot(out_hit.normal, ray_state.dir_ry);
ray_state.pos_rx += ray_state.dir_rx * tx;
ray_state.pos_ry += ray_state.dir_ry * ty;
out_hit.position.dx = ray_state.pos_rx - ray_state.pos;
out_hit.position.dy = ray_state.pos_ry - ray_state.pos;
float4 A;
float2 B_x, B_y;
if (fabsf(out_hit.normal.x) > fabsf(out_hit.normal.y) && fabsf(out_hit.normal.x) > fabsf(out_hit.normal.z))
{
B_x = make_float2(
ray_state.pos_rx.y - ray_state.pos.y,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.y - ray_state.pos.y,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
out_hit.tangent_u.y, out_hit.tangent_u.z, out_hit.tangent_v.y, out_hit.tangent_v.z);
}
else if (fabsf(out_hit.normal.y) > fabsf(out_hit.normal.z))
{
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
out_hit.tangent_u.x, out_hit.tangent_u.z, out_hit.tangent_v.x, out_hit.tangent_v.z);
}
else
{
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.y - ray_state.pos.y);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.y - ray_state.pos.y);
A = make_float4(
out_hit.tangent_u.x, out_hit.tangent_u.y, out_hit.tangent_v.x, out_hit.tangent_v.y);
}
const float det = A.x * A.w - A.y * A.z;
if (fabsf(det) > 1e-10f)
{
const float inv_det = 1.0f / det;
out_hit.texture_coords[0].dx.x = inv_det * (A.w * B_x.x - A.z * B_x.y);
out_hit.texture_coords[0].dx.y = inv_det * (A.x * B_x.y - A.y * B_x.x);
out_hit.texture_coords[0].dy.x = inv_det * (A.w * B_y.x - A.z * B_y.y);
out_hit.texture_coords[0].dy.y = inv_det * (A.x * B_y.y - A.y * B_y.x);
}
#endif
}
#endif
out_hit.tangent_u = normalize(out_hit.tangent_u);
out_hit.tangent_v = normalize(out_hit.tangent_v);
return true;
}
__device__ bool cull_point_light(
const Kernel_params &params,
const float3 &light_position,
const float3 &light_direction /*to light*/,
const float3 &normal)
{
switch (params.geometry)
{
case GT_SPHERE:
{
// same as default, but allow lights inside the sphere
const float inside = (squared_length(light_position) < GT_SPHERE_RADIUS) ? -1.f : 1.f;
return (dot(light_direction, normal) * inside) <= 0.0f;
}
case GT_HAIR:
// ignore light sources within the volume
return (light_position.x * light_position.x +
light_position.z * light_position.z) < GT_SPHERE_RADIUS;
default:
// ignore light from behind
return dot(light_direction, normal) <= 0.0f;
}
}
__device__ bool cull_env_light(
const Kernel_params &params,
const float3 &light_direction /*to light*/,
const float3 &normal)
{
switch (params.geometry)
{
case GT_HAIR:
// allow light from behind
return false;
case GT_SPHERE:
default:
// ignore light from behind
return dot(light_direction, normal) <= 0.0f;
}
}
__device__ void continue_ray(
Ray_state& ray_state,
const Ray_hit_info &hit_indo,
unsigned int event_type,
const Kernel_params &params)
{
switch (params.geometry)
{
case GT_HAIR:
if (event_type == BSDF_EVENT_GLOSSY_TRANSMISSION)
{
// conservative
ray_state.pos += ray_state.dir * 2.0f * GT_HAIR_RADIUS;
ray_state.inside = false;
}
break;
default:
return;
}
}
//-------------------------------------------------------------------------------------------------
// events that are define a transition between states, along with tag IDs
enum Transition_type
{
TRANSITION_CAMERA = 0,
TRANSITION_LIGHT,
TRANSITION_EMISSION,
TRANSITION_SCATTER_DR,
TRANSITION_SCATTER_DT,
TRANSITION_SCATTER_GR,
TRANSITION_SCATTER_GT,
TRANSITION_SCATTER_SR,
TRANSITION_SCATTER_ST,
TRANSITION_COUNT,
};
// go to the next state, given the current state and a transition token.
__device__ inline uint32_t lpe_transition(
uint32_t current_state,
Transition_type event,
uint32_t global_tag_id,
const Kernel_params &params)
{
if(current_state == static_cast<uint32_t>(-1))
return static_cast<uint32_t>(-1);
return params.lpe_state_table[
current_state * params.lpe_num_transitions +
static_cast<uint32_t>(TRANSITION_COUNT) * global_tag_id +
static_cast<uint32_t>(event)];
}
// add direct contribution, e.g., for emission, direct light hits
__device__ inline void accumulate_contribution(
Transition_type light_event,
uint32_t light_global_tag_id,
const float3& contrib,
Ray_state &ray_state,
const Kernel_params &params)
{
// check if there is a valid transition to that light source
uint32_t next_state = lpe_transition(
ray_state.lpe_current_state, light_event, light_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// add contribution the when the reached state is a final state for the selected LPE
// here we only have one LPE buffer, but more can be added easily by checking different LPEs
if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
ray_state.contribution += contrib;
}
// add contribution for next event estimations
__device__ inline void accumulate_next_event_contribution(
Transition_type scatter_event, uint32_t material_global_tag_id,
Transition_type light_event, uint32_t light_global_tag_id,
const float3& contrib,
Ray_state &ray_state,
const Kernel_params &params)
{
// transition following the scatter event
uint32_t next_state = lpe_transition(
ray_state.lpe_current_state, scatter_event, material_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// check if there is a valid transition to the light source
next_state = lpe_transition(
next_state, light_event, light_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// add contribution the when the reached state is a final state for the selected LPE
// here we only have one LPE buffer, but more can be added easily by checking different LPEs
if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
ray_state.contribution += contrib;
}
//-------------------------------------------------------------------------------------------------
__device__ inline bool trace_scene(
Rand_state &rand_state,
Ray_state &ray_state,
const Kernel_params &params)
{
// stop at invalid states
if (ray_state.lpe_current_state == static_cast<uint32_t>(-1))
return false;
// intersect with geometry
Ray_hit_info hit;
if (!intersect_geometry(ray_state, params, hit)) {
if (ray_state.intersection == 0 && params.mdl_test_type != MDL_TEST_NO_ENV) {
// primary ray miss, add environment contribution
const float2 uv = environment_coords(ray_state.dir, params);
const float4 texval = tex2D<float4>(params.env_tex, uv.x, uv.y);
// add contribution, if `CL` is a valid path
accumulate_contribution(
TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
make_float3(texval.x, texval.y, texval.z) * params.env_intensity,
ray_state, params);
}
return false;
}
float4 texture_results[16];
// material of the current object
Df_cuda_material material = params.material_buffer[params.current_material];
// access textures and other resource data
Mdl_resource_handler mdl_resources;
// create state
Mdl_state state = {
hit.normal,
hit.normal,
hit.position,
0.0f,
hit.texture_coords,
&hit.tangent_u,
&hit.tangent_v,
texture_results,
NULL,
identity,
identity,
0,
1.0f
};
// for evaluating parts of the BSDF individually, e.g. for implementing LPEs
// the MDL SDK provides several options to pass out the BSDF, EDF, and auxiliary data
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
// application provided memory
// the data structs will get only a pointer to a buffer, along with size and offset
const unsigned df_eval_slots = 4; // number of handles (parts) that can be evaluated
// at once. 4 is an arbitrary choice. However, it
// has to match eval_data.handle_count and
// aux_data.handle_count)
float3 result_buffer_0[df_eval_slots]; // used for bsdf_diffuse, edf, and albedo
float3 result_buffer_1[df_eval_slots]; // used for bsdf_specular and normal
#elif DF_HANDLE_SLOTS == DF_HSM_NONE
// handles are ignored, all parts of the BSDF are returned at once without loops (fastest)
const unsigned df_eval_slots = 1;
#else
// eval_data and auxiliary_data have a fixed size array to pass the data. Only an offset
// is required if there are more handles (parts) than slots.
const unsigned df_eval_slots = DF_HANDLE_SLOTS;
#endif
Mdl_function_index func_idx;
// apply volume attenuation after first bounce
// (assuming uniform absorption coefficient and ignoring scattering coefficient)
if (ray_state.intersection > 0)
{
func_idx = get_mdl_function_index(material.volume_absorption);
if (is_valid(func_idx)) {
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
prepare_state(params, func_idx, state, hit.normal); // init state
float3 abs_coeff;
as_expression(func_idx)(
&abs_coeff, &state, &mdl_resources.data, NULL, arg_block);
ray_state.weight.x *= abs_coeff.x > 0.0f ? expf(-abs_coeff.x * hit.distance) : 1.0f;
ray_state.weight.y *= abs_coeff.y > 0.0f ? expf(-abs_coeff.y * hit.distance) : 1.0f;
ray_state.weight.z *= abs_coeff.z > 0.0f ? expf(-abs_coeff.z * hit.distance) : 1.0f;
}
}
// add emission
func_idx = get_mdl_function_index(material.edf);
if (is_valid(func_idx))
{
// evaluate intensity expression
float3 emission_intensity = make_float3(0.0, 0.0, 0.0);
Mdl_function_index intensity_func_idx = get_mdl_function_index(material.emission_intensity);
if (is_valid(intensity_func_idx))
{
// init for the use of the materials emission intensity
mdl_resources.set_target_code_index(params, intensity_func_idx); // init resource handler
const char* arg_block = get_arg_block(params, intensity_func_idx); // get material parameters
prepare_state(params, intensity_func_idx, state, hit.normal); // init state
as_expression(intensity_func_idx)(
&emission_intensity, &state, &mdl_resources.data, NULL, arg_block);
}
// init for the use of the materials EDF
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
prepare_state(params, func_idx, state, hit.normal); // init state
as_edf_init(func_idx)(&state, &mdl_resources.data, NULL, arg_block);
// evaluate EDF
Edf_evaluate_data<(Df_handle_slot_mode) DF_HANDLE_SLOTS> eval_data;
eval_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.edf = result_buffer_0;
eval_data.handle_count = df_eval_slots;
#endif
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < material.edf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials EDF
as_edf_evaluate(func_idx)(&eval_data, &state, &mdl_resources.data, NULL, arg_block);
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < material.edf_mtag_to_gtag_map_size); ++lobe)
{
// add emission contribution
accumulate_contribution(
TRANSITION_EMISSION, material.edf_mtag_to_gtag_map[offset + lobe],
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.edf * emission_intensity,
#else
eval_data.edf[lobe] * emission_intensity,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
func_idx = get_mdl_function_index(material.bsdf);
if (is_valid(func_idx))
{
// init for the use of the materials BSDF
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
prepare_state(params, func_idx, state, hit.normal); // init state
// initialize BSDF
// Note, that this will change the state.normal (needs to be reset before using EDFs)
as_bsdf_init(func_idx)(&state, &mdl_resources.data, NULL, arg_block);
// reuse memory for function data
union
{
Bsdf_sample_data sample_data;
Bsdf_evaluate_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS> eval_data;
Bsdf_pdf_data pdf_data;
Bsdf_auxiliary_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS> aux_data;
};
// for thin_walled materials there is no 'inside'
bool thin_walled = false;
Mdl_function_index thin_walled_func_idx = get_mdl_function_index(material.thin_walled);
if (is_valid(thin_walled_func_idx))
as_expression(thin_walled_func_idx)(
&thin_walled, &state, &mdl_resources.data, NULL, arg_block);
// initialize shared fields
if (ray_state.inside && !thin_walled)
{
sample_data.ior1.x = BSDF_USE_MATERIAL_IOR;
sample_data.ior2 = make_float3(1.0f, 1.0f, 1.0f);
}
else
{
sample_data.ior1 = make_float3(1.0f, 1.0f, 1.0f);
sample_data.ior2.x = BSDF_USE_MATERIAL_IOR;
}
sample_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
// if requested, fill auxiliary buffers
if (params.enable_auxiliary_output && ray_state.intersection == 0)
{
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
aux_data.albedo = result_buffer_0;
aux_data.normal = result_buffer_1;
aux_data.handle_count = df_eval_slots;
#endif
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < material.bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
aux_data.handle_offset = offset;
#endif
// evaluate the materials auxiliary
as_bsdf_auxiliary(func_idx)(&aux_data, &state, &mdl_resources.data, NULL, arg_block);
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < material.bsdf_mtag_to_gtag_map_size); ++lobe)
{
// to keep it simpler, the individual albedo and normals are averaged
// however, the parts can also be used separately, e.g. for LPEs
#if DF_HANDLE_SLOTS == DF_HSM_NONE
ray_state.aux->albedo += aux_data.albedo;
ray_state.aux->normal += aux_data.normal;
#else
ray_state.aux->albedo += aux_data.albedo[lobe];
ray_state.aux->normal += aux_data.normal[lobe];
#endif
ray_state.aux->num++;
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
// compute direct lighting for point light
Transition_type transition_glossy, transition_diffuse;
if (params.light_intensity > 0.0f)
{
float3 to_light = params.light_pos - ray_state.pos;
if(!cull_point_light(params, params.light_pos, to_light, hit.normal))
{
const float inv_squared_dist = 1.0f / squared_length(to_light);
const float3 f = params.light_color * params.light_intensity *
inv_squared_dist * (float) (0.25 / M_PI);
eval_data.k2 = to_light * sqrtf(inv_squared_dist);
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.bsdf_diffuse = result_buffer_0;
eval_data.bsdf_glossy = result_buffer_1;
eval_data.handle_count = df_eval_slots;
#endif
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < material.bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials BSDF
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, NULL, arg_block);
// we know if we reflect or transmit
if (dot(to_light, hit.normal) > 0.0f) {
transition_glossy = TRANSITION_SCATTER_GR;
transition_diffuse = TRANSITION_SCATTER_DR;
} else {
transition_glossy = TRANSITION_SCATTER_GT;
transition_diffuse = TRANSITION_SCATTER_DT;
}
// sample weight
const float3 w = ray_state.weight * f;
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < material.bsdf_mtag_to_gtag_map_size); ++lobe)
{
// get the `global tag` of the lobe
unsigned material_lobe_gtag = material.bsdf_mtag_to_gtag_map[offset + lobe];
// add diffuse contribution
accumulate_next_event_contribution(
transition_diffuse, material_lobe_gtag,
TRANSITION_LIGHT, params.point_light_gtag, // light group
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_diffuse * w,
#else
eval_data.bsdf_diffuse[lobe] * w,
#endif
ray_state, params);
// add glossy contribution
accumulate_next_event_contribution(
transition_glossy, material_lobe_gtag,
TRANSITION_LIGHT, params.point_light_gtag, // light group
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_glossy * w,
#else
eval_data.bsdf_glossy[lobe] * w,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
}
// importance sample environment light
if (params.mdl_test_type != MDL_TEST_SAMPLE && params.mdl_test_type != MDL_TEST_NO_ENV)
{
const float xi0 = curand_uniform(&rand_state);
const float xi1 = curand_uniform(&rand_state);
const float xi2 = curand_uniform(&rand_state);
float3 light_dir;
float pdf;
const float3 f = environment_sample(light_dir, pdf, make_float3(xi0, xi1, xi2), params);
if (!cull_env_light(params, light_dir, hit.normal) && pdf > 0.0f)
{
eval_data.k2 = light_dir;
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.bsdf_diffuse = result_buffer_0;
eval_data.bsdf_glossy = result_buffer_1;
eval_data.handle_count = df_eval_slots;
#endif
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < material.bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials BSDF
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, NULL, arg_block);
const float mis_weight =
(params.mdl_test_type == MDL_TEST_EVAL) ? 1.0f : pdf / (pdf + eval_data.pdf);
// we know if we reflect or transmit
if (dot(light_dir, hit.normal) > 0.0f) {
transition_glossy = TRANSITION_SCATTER_GR;
transition_diffuse = TRANSITION_SCATTER_DR;
} else {
transition_glossy = TRANSITION_SCATTER_GT;
transition_diffuse = TRANSITION_SCATTER_DT;
}
// sample weight
const float3 w = ray_state.weight * f * mis_weight;
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < material.bsdf_mtag_to_gtag_map_size); ++lobe)
{
// get the `global tag` of the lobe
unsigned material_lobe_gtag = material.bsdf_mtag_to_gtag_map[offset + lobe];
// add diffuse contribution
accumulate_next_event_contribution(
transition_diffuse, material_lobe_gtag,
TRANSITION_LIGHT, params.env_gtag, // light group 'env'
#if DF_HANDLE_SLOTS == DF_HSM_NONE
(eval_data.bsdf - eval_data.bsdf_glossy) * w,
#else
eval_data.bsdf_diffuse[lobe] * w,
#endif
ray_state, params);
// add glossy contribution
accumulate_next_event_contribution(
transition_glossy, material_lobe_gtag,
TRANSITION_LIGHT, params.env_gtag, // light group 'env'
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_glossy * w,
#else
eval_data.bsdf_glossy[lobe] * w,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
}
// importance sample BSDF
{
sample_data.xi.x = curand_uniform(&rand_state);
sample_data.xi.y = curand_uniform(&rand_state);
sample_data.xi.z = curand_uniform(&rand_state);
sample_data.xi.w = curand_uniform(&rand_state);
// sample the materials BSDF
as_bsdf_sample(func_idx)(&sample_data, &state, &mdl_resources.data, NULL, arg_block);
if (sample_data.event_type == BSDF_EVENT_ABSORB)
return false;
ray_state.dir = sample_data.k2;
ray_state.weight *= sample_data.bsdf_over_pdf;
const bool transmission = (sample_data.event_type & BSDF_EVENT_TRANSMISSION) != 0;
if (transmission)
ray_state.inside = !ray_state.inside;
const bool is_specular = (sample_data.event_type & BSDF_EVENT_SPECULAR) != 0;
Transition_type next;
if (is_specular)
next = transmission ? TRANSITION_SCATTER_ST : TRANSITION_SCATTER_SR;
else if ((sample_data.event_type & BSDF_EVENT_DIFFUSE) != 0)
next = transmission ? TRANSITION_SCATTER_DT : TRANSITION_SCATTER_DR;
else
next = transmission ? TRANSITION_SCATTER_GT : TRANSITION_SCATTER_GR;
// move ray to the next sampled state
ray_state.lpe_current_state = lpe_transition(
ray_state.lpe_current_state, next,
#if DF_HANDLE_SLOTS == DF_HSM_NONE
// ill-defined case: the LPE machine expects tags but the renderer ignores them
// -> the resulting image of LPEs with tags is undefined in this case
params.default_gtag,
#else
material.bsdf_mtag_to_gtag_map[sample_data.handle], // sampled lobe
#endif
params);
// depending on the geometry, the ray might be displaced before continuing
continue_ray(ray_state, hit, sample_data.event_type, params);
if (ray_state.inside)
{
// avoid self-intersections
ray_state.pos -= hit.normal * 0.001f;
return true; // continue bouncing in sphere
}
else if (params.mdl_test_type != MDL_TEST_NO_ENV &&
params.mdl_test_type != MDL_TEST_EVAL)
{
// leaving sphere, add contribution from environment hit
float pdf;
const float3 f = environment_eval(pdf, sample_data.k2, params);
float bsdf_pdf;
if (params.mdl_test_type == MDL_TEST_MIS_PDF)
{
const float3 k2 = sample_data.k2;
pdf_data.k2 = k2;
// get pdf corresponding to the materials BSDF
as_bsdf_pdf(func_idx)(&pdf_data, &state, &mdl_resources.data, NULL, arg_block);
bsdf_pdf = pdf_data.pdf;
}
else
bsdf_pdf = sample_data.pdf;
if (is_specular || bsdf_pdf > 0.0f)
{
const float mis_weight = is_specular ||
(params.mdl_test_type == MDL_TEST_SAMPLE) ? 1.0f :
bsdf_pdf / (pdf + bsdf_pdf);
float3 specular_contrib = ray_state.weight * f * mis_weight;
accumulate_contribution(
TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
specular_contrib,
ray_state, params);
}
}
}
}
return false;
}
struct render_result
{
float3 beauty;
auxiliary_data aux;
};
__device__ inline render_result render_scene(
Rand_state &rand_state,
const Kernel_params &params,
const unsigned x,
const unsigned y)
{
const float inv_res_x = 1.0f / (float)params.resolution.x;
const float inv_res_y = 1.0f / (float)params.resolution.y;
const float dx = params.disable_aa ? 0.5f : curand_uniform(&rand_state);
const float dy = params.disable_aa ? 0.5f : curand_uniform(&rand_state);
const float2 screen_pos = make_float2(
((float)x + dx) * inv_res_x,
((float)y + dy) * inv_res_y);
const float r = (2.0f * screen_pos.x - 1.0f);
const float r_rx = (2.0f * (screen_pos.x + inv_res_x) - 1.0f);
const float u = (2.0f * screen_pos.y - 1.0f);
const float u_ry = (2.0f * (screen_pos.y + inv_res_y) - 1.0f);
const float aspect = (float)params.resolution.y / (float)params.resolution.x;
render_result res;
clear(res.aux);
Ray_state ray_state;
ray_state.contribution = make_float3(0.0f, 0.0f, 0.0f);
ray_state.weight = make_float3(1.0f, 1.0f, 1.0f);
ray_state.pos = ray_state.pos_rx = ray_state.pos_ry = params.cam_pos;
ray_state.dir = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u);
ray_state.dir_rx = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r_rx + params.cam_up * aspect * u);
ray_state.dir_ry = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u_ry);
ray_state.inside = false;
ray_state.lpe_current_state = 1; // already at the camera so state 0 to 1 is free as long as
// there is only one camera
ray_state.aux = &res.aux;
const unsigned int max_inters = params.max_path_length - 1;
for (ray_state.intersection = 0; ray_state.intersection < max_inters; ++ray_state.intersection)
{
if (!trace_scene(rand_state, ray_state, params))
break;
}
res.beauty =
isfinite(ray_state.contribution.x) &&
isfinite(ray_state.contribution.y) &&
isfinite(ray_state.contribution.z) ? ray_state.contribution : make_float3(0.0f, 0.0f, 0.0f);
normalize(res.aux);
return res;
}
// quantize + gamma
__device__ inline unsigned int float3_to_rgba8(float3 val)
{
const unsigned int r = (unsigned int) (255.0 * powf(saturate(val.x), 1.0f / 2.2f));
const unsigned int g = (unsigned int) (255.0 * powf(saturate(val.y), 1.0f / 2.2f));
const unsigned int b = (unsigned int) (255.0 * powf(saturate(val.z), 1.0f / 2.2f));
return 0xff000000 | (b << 16) | (g << 8) | r;
}
// exposure + simple Reinhard tonemapper + gamma
__device__ inline unsigned int display(float3 val, const float tonemap_scale)
{
val *= tonemap_scale;
const float burn_out = 0.1f;
val.x *= (1.0f + val.x * burn_out) / (1.0f + val.x);
val.y *= (1.0f + val.y * burn_out) / (1.0f + val.y);
val.z *= (1.0f + val.z * burn_out) / (1.0f + val.z);
return float3_to_rgba8(val);
}
// CUDA kernel rendering simple geometry with IBL
extern "C" __global__ void render_scene_kernel(
const Kernel_params kernel_params)
{
const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= kernel_params.resolution.x || y >= kernel_params.resolution.y)
return;
const unsigned int idx = y * kernel_params.resolution.x + x;
Rand_state rand_state;
const unsigned int num_dim = kernel_params.disable_aa ? 6 : 8; // 2 camera, 3 BSDF, 3 environment
curand_init(idx, /*subsequence=*/0, kernel_params.iteration_start * num_dim, &rand_state);
render_result res;
float3 beauty = make_float3(0.0f, 0.0f, 0.0f);
auxiliary_data aux;
clear(aux);
for (unsigned int s = 0; s < kernel_params.iteration_num; ++s)
{
res = render_scene(
rand_state,
kernel_params,
x, y);
beauty += res.beauty;
aux += res.aux;
}
beauty *= 1.0f / (float)kernel_params.iteration_num;
normalize(aux);
// accumulate
if (kernel_params.iteration_start == 0) {
kernel_params.accum_buffer[idx] = beauty;
if (kernel_params.enable_auxiliary_output) {
kernel_params.albedo_buffer[idx] = aux.albedo;
kernel_params.normal_buffer[idx] = aux.normal;
}
} else {
float iteration_weight = (float) kernel_params.iteration_num /
(float) (kernel_params.iteration_start + kernel_params.iteration_num);
float3 buffer_val = kernel_params.accum_buffer[idx] +
(beauty - kernel_params.accum_buffer[idx]) * iteration_weight;
kernel_params.accum_buffer[idx] =
(isinf(buffer_val.x) || isnan(buffer_val.y) || isinf(buffer_val.z) ||
isnan(buffer_val.x) || isinf(buffer_val.y) || isnan(buffer_val.z))
? make_float3(0.0f, 0.0f, 1.0e+30f)
: buffer_val;
if (kernel_params.enable_auxiliary_output) {
// albedo
kernel_params.albedo_buffer[idx] = kernel_params.albedo_buffer[idx] +
(aux.albedo - kernel_params.albedo_buffer[idx]) * iteration_weight;
// normal, check for zero length first
float3 weighted_normal = kernel_params.normal_buffer[idx] +
(aux.normal - kernel_params.normal_buffer[idx]) * iteration_weight;
if (dot(weighted_normal, weighted_normal) > 0.0f)
weighted_normal = normalize(weighted_normal);
kernel_params.normal_buffer[idx] = weighted_normal;
}
}
// update display buffer
if (kernel_params.display_buffer)
{
switch (kernel_params.enable_auxiliary_output ? kernel_params.display_buffer_index : 0)
{
case 1: /* albedo */
kernel_params.display_buffer[idx] = float3_to_rgba8(kernel_params.albedo_buffer[idx]);
break;
case 2: /* normal */
{
float3 display_normal = kernel_params.normal_buffer[idx];
if (dot(display_normal, display_normal) > 0) {
display_normal.x = display_normal.x * 0.5f + 0.5f;
display_normal.y = display_normal.y * 0.5f + 0.5f;
display_normal.z = display_normal.z * 0.5f + 0.5f;
}
kernel_params.display_buffer[idx] = float3_to_rgba8(display_normal);
break;
}
default: /* beauty */
kernel_params.display_buffer[idx] =
display(kernel_params.accum_buffer[idx], kernel_params.exposure_scale);
break;
}
}
}
[Previous] [Up] [Next]