This example shows how to enable and use automatic derivatives for texture filtering.

New Topics

Automatic derivatives

Detailed Description

Automatic derivatives

When rendering images with high-frequency content, like a checker board texture, you usually have to shoot an increased number of rays to avoid aliasing effects. By providing information to the texture lookup functions about the area covered by a ray, the integral over this area can be calculated (or approximated) to get good texture sampling with just a single ray.

In the MDL SDK, the renderer can provide the derivatives of texture coordinates with respect to x and y in screen space, when calling generated code for MDL expressions or distribution functions. The generated code then calculates the derivatives of expressions provided to 2D texture lookup functions as texture coordinate parameters using "Automatic Differentiation" (see Dan Piponi, "Automatic Differentiation, C++ Templates, and Photogrammetry", Journal of Graphics, GPU, and Game Tools (JGT), Vol 9, No 4, pp. 41-55 (2004)). The renderer-provided texture runtime can finally use texture filtering techniques like elliptically weighted average (EWA) to determine the texture result.

Automatic derivatives are supported by the CUDA, LLVM IR and native backends of the MDL SDK. To enable this feature, you need to:

set the backend option "texture_runtime_with_derivs" to "on" via the IMdl_backend::set_option() method
replace Shading_state_material by Shading_state_material_with_derivs and provide the derivatives of the texture coordinates
provide a texture runtime supporting "tex_lookup_deriv_float4_2d" and "tex_lookup_deriv_float3_2d" (for the native backend, there is a simple isotropic implementation using mipmaps, if you use the default "on" value for the "use_builtin_resource_handler" option)

In the "example_df_cuda" example, the backend option is set in the constructor of the Material_compiler class in example_cuda_shared.h, when the "-d" option is provided via command line:

check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);

In this mode, the example will use a variant of example_df_cuda.cu compiled with a "ENABLE_DERIVATIVES" define, which will enable calculating the texture coordinate derivatives on the sphere. It does so by intersecting two additional rays, offset by one-pixel in x and y direction, with the plane given by the primary intersection point and the corresponding normal. Using the surface derivatives with respect to U and V and the new intersection points, it determines the derivatives of the texture coordinates with respect to screen-space x and y. See Matt Pharr et al., "Physically Based Rendering", 3rd edition (2016), chapter 10.1.1 for details.

Instead of the usual Shading_state_material, the example then fills in the derivative variant Shading_state_material_with_derivs where the texture coordinates are tct_deriv_float3 values consisting of val, the texture coordinate, dx and dy, the derivative of the texture coordinate with respect to screen-space x and screen-space y, respectively.

The types could be written as:

struct tct_deriv_float3 {
    tct_float3  val;    // value component
    tct_float3  dx;     // derivative of value with respect to screen-space x
    tct_float3  dy;     // derivative of value with respect to screen-space y
};
 
struct Shading_state_material_with_derivs {
    tct_float3               normal;             // state::normal() result
    tct_float3               geom_normal;        // state::geom_normal() result
    tct_float3               position;           // state::position() result
    tct_float                animation_time;     // state::animation_time() result
    const tct_deriv_float3  *text_coords;        // state::texture_coordinate() table
    const tct_float3        *tangent_u;          // state::texture_tangent_u() table
    const tct_float3        *tangent_v;          // state::texture_tangent_v() table
    tct_float4              *text_results;       // texture results lookup table
    const char              *ro_data_segment;    // read-only data segment
    const tct_float4        *world_to_object;    // world-to-object transform matrix
    const tct_float4        *object_to_world;    // object-to-world transform matrix
    tct_int                  object_id;          // state::object_id() result
};

Note: For the native backend, you have to cast the Shading_state_material_with_derivs object to a Shading_state_material reference when calling the "execute_*" methods of mi::neuraylib::ITarget_code, as these methods have not been duplicated for the derivative variants.

Additionally, a different vtable "tex_deriv_vtable" defined in texture_support_cuda.h is provided for the texture handler, which contains derivative variants for the 2D texture lookup functions: The functions "tex_lookup_deriv_float4_2d" and "tex_lookup_deriv_float3_2d" expect a "tct_deriv_float2 const *" as "coord" parameter:

struct tct_deriv_float2 {
    tct_float2  val;    // value component
    tct_float2  dx;     // derivative of value with respect to screen-space x
    tct_float2  dy;     // derivative of value with respect to screen-space y
};
 
extern "C" __device__ void tex_lookup_deriv_float3_2d(
    float                       result[3],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    tct_deriv_float2 const     *coord,
    Tex_wrap_mode const         wrap_u,
    Tex_wrap_mode const         wrap_v,
    float const                 crop_u[2],
    float const                 crop_v[2]);
 
extern "C" __device__ void tex_lookup_deriv_float4_2d(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    tct_deriv_float2 const     *coord,
    Tex_wrap_mode const         wrap_u,
    Tex_wrap_mode const         wrap_v,
    float const                 crop_u[2],
    float const                 crop_v[2]);

The example texture runtime uses the CUDA tex2DGrad() functions to perform anisotropic hardware filtering. The derivatives provided to the texture lookup handlers can directly be used for those functions. The mipmaps are generated by Material_gpu_context::prepare_texture() in example_cuda_shared.h via the IImage_api::create_mipmaps() function.

Example Source

To compile the source code, you require CUDA, GLFW, and GLEW. For detailed instructions, please refer to the Getting Started section.

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cpp

/******************************************************************************
 * Copyright 2024 NVIDIA Corporation. All rights reserved.
 *****************************************************************************/
 
// examples/mdl_sdk/df_cuda/example_df_cuda.cpp
//
// Simple renderer using compiled BSDFs with a material parameter editor GUI.
 
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
#include <list>
#include <map>
#include <memory>
#define _USE_MATH_DEFINES
#include <math.h>
 
// shared example helpers
#include "example_df_cuda.h"
#include "lpe.h"
 
// Enable this to dump the generated PTX code to files in the current directory.
// #define DUMP_PTX
 
#define OPENGL_INTEROP
#include "example_cuda_shared.h"
 
#include <imgui.h>
#include <imgui_impl_glfw.h>
#include <imgui_impl_opengl3.h>
#include <GL/glew.h>
#include <GLFW/glfw3.h>
 
#define GL_DISPLAY_CUDA
#include "utils/gl_display.h"
#include "utils/profiling.h"
 
using namespace mi::examples::profiling;
 
#define terminate()          \
    do {                     \
        glfwTerminate();     \
        exit_failure();      \
    } while (0)
 
#define WINDOW_TITLE "MDL SDK DF CUDA Example"
 
 
// Vector helper functions //
 
inline float length(const float3 &d)
{
    return sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
}
 
inline float3 normalize(const float3 &d)
{
    const float inv_len = 1.0f / length(d);
    return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
 
inline float3 operator/(const float3& d, float s)
{
    const float inv_s = 1.0f / s;
    return make_float3(d.x * inv_s, d.y * inv_s, d.z * inv_s);
}
 
 
// OpenGL code //
 
// Initialize OpenGL and create a window with an associated OpenGL context.
static GLFWwindow *init_opengl(std::string& version_string, int res_x, int res_y)
{
    // Initialize GLFW
    check_success(glfwInit());
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
    version_string = "#version 330 core"; // see top comments in 'imgui_impl_opengl3.cpp'
 
    // Create an OpenGL window and a context
    GLFWwindow *window = glfwCreateWindow(
        res_x, res_y, WINDOW_TITLE, nullptr, nullptr);
    if (!window) {
        std::cerr << "Error creating OpenGL window!" << std::endl;
        terminate();
    }
 
    // Attach context to window
    glfwMakeContextCurrent(window);
 
    // Initialize GLEW to get OpenGL extensions
    GLenum res = glewInit();
    if (res != GLEW_OK) {
        std::cerr << "GLEW error: " << glewGetErrorString(res) << std::endl;
        terminate();
    }
 
    // Disable VSync
    glfwSwapInterval(0);
 
    check_success(glGetError() == GL_NO_ERROR);
 
    return window;
}
 
 
// Application logic //
 
// Context structure for window callback functions.
struct Window_context
{
    bool mouse_event, key_event;
    bool save_image;
    int zoom;
 
    int mouse_button;            // button from callback event plus one (0 = no event)
    int mouse_button_action;     // action from mouse button callback event
    int mouse_wheel_delta;
    bool moving;
    double move_start_x, move_start_y;
    double move_dx, move_dy;
 
    int material_index_delta;
 
    bool save_result;
 
    bool exposure_event;
    float exposure;
};
 
static std::string to_string(Display_buffer_options option)
{
    switch (option)
    {
        case DISPLAY_BUFFER_LPE: return "Selected LPE";
        case DISPLAY_BUFFER_ALBEDO: return "Albedo";
        case DISPLAY_BUFFER_NORMAL: return "Normal";
        default: return "";
    }
}
 
static std::string to_string(mi::neuraylib::Df_flags option)
{
    switch (option)
    {
    case mi::neuraylib::DF_FLAGS_NONE: return "None";
    case mi::neuraylib::DF_FLAGS_ALLOW_REFLECT: return "Reflect only";
    case mi::neuraylib::DF_FLAGS_ALLOW_TRANSMIT: return "Transmit only";
    case mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT: return "Reflect+Transmit";
    default: return "";
    }
}
 
// GLFW scroll callback
static void handle_scroll(GLFWwindow *window, double xoffset, double yoffset)
{
    Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
    if (yoffset > 0.0) {
        ctx->mouse_wheel_delta = 1; ctx->mouse_event = true;
    } else if (yoffset < 0.0) {
        ctx->mouse_wheel_delta = -1; ctx->mouse_event = true;
    }
 
    ImGui_ImplGlfw_ScrollCallback(window, xoffset, yoffset);
}
 
// GLFW keyboard callback
static void handle_key(GLFWwindow *window, int key, int scancode, int action, int mods)
{
    // Handle key press events
    if (action == GLFW_PRESS) {
        Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
        switch (key) {
            // Escape closes the window
            case GLFW_KEY_ESCAPE:
                glfwSetWindowShouldClose(window, GLFW_TRUE);
                break;
            case GLFW_KEY_DOWN:
            case GLFW_KEY_RIGHT:
            case GLFW_KEY_PAGE_DOWN:
                ctx->material_index_delta = 1;
                ctx->key_event = true;
                break;
            case GLFW_KEY_UP:
            case GLFW_KEY_LEFT:
            case GLFW_KEY_PAGE_UP:
                ctx->material_index_delta = -1;
                ctx->key_event = true;
                break;
            case GLFW_KEY_ENTER:
                ctx->save_result = true;
                break;
            case GLFW_KEY_KP_SUBTRACT:
                ctx->exposure--;
                ctx->exposure_event = true;
                break;
            case GLFW_KEY_KP_ADD:
                ctx->exposure++;
                ctx->exposure_event = true;
                break;
            default:
                break;
        }
    }
 
    ImGui_ImplGlfw_KeyCallback(window, key, scancode, action, mods);
}
 
// GLFW mouse button callback
static void handle_mouse_button(GLFWwindow *window, int button, int action, int mods)
{
    Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
    ctx->mouse_button = button + 1;
    ctx->mouse_button_action = action;
 
    ImGui_ImplGlfw_MouseButtonCallback(window, button, action, mods);
}
 
// GLFW mouse position callback
static void handle_mouse_pos(GLFWwindow *window, double xpos, double ypos)
{
    Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
    if (ctx->moving)
    {
        ctx->move_dx += xpos - ctx->move_start_x;
        ctx->move_dy += ypos - ctx->move_start_y;
        ctx->move_start_x = xpos;
        ctx->move_start_y = ypos;
        ctx->mouse_event = true;
    }
}
 
// Resize CUDA buffers for a given resolution
static void resize_buffers(CUdeviceptr *buffer_cuda, int width, int height)
{
    // Allocate CUDA buffer
    if (*buffer_cuda)
        check_cuda_success(cuMemFree(*buffer_cuda));
 
    if (width == 0 || height == 0)
        *buffer_cuda = 0;
    else
        check_cuda_success(cuMemAlloc(buffer_cuda, width * height * sizeof(float3)));
}
 
// Helper for create_environment()
static float build_alias_map(
    const float *data,
    const unsigned int size,
    Env_accel *accel)
{
    // create qs (normalized)
    float sum = 0.0f;
    for (unsigned int i = 0; i < size; ++i)
        sum += data[i];
 
    for (unsigned int i = 0; i < size; ++i)
        accel[i].q = (static_cast<float>(size) * data[i] / sum);
 
    // create partition table
    unsigned int *partition_table = static_cast<unsigned int *>(
        malloc(size * sizeof(unsigned int)));
    unsigned int s = 0u, large = size;
    for (unsigned int i = 0; i < size; ++i)
        partition_table[(accel[i].q < 1.0f) ? (s++) : (--large)] = accel[i].alias = i;
 
    // create alias map
    for (s = 0; s < large && large < size; ++s)
    {
        const unsigned int j = partition_table[s], k = partition_table[large];
        accel[j].alias = k;
        accel[k].q += accel[j].q - 1.0f;
        large = (accel[k].q < 1.0f) ? (large + 1u) : large;
    }
 
    free(partition_table);
 
    return sum;
}
 
// Create environment map texture and acceleration data for importance sampling
static void create_environment(
    cudaTextureObject_t *env_tex,
    cudaArray_t *env_tex_data,
    CUdeviceptr *env_accel,
    uint2 *res,
    mi::base::Handle<mi::neuraylib::ITransaction> transaction,
    mi::base::Handle<mi::neuraylib::IImage_api> image_api,
    const char *envmap_name)
{
    // Load environment texture
    mi::base::Handle<mi::neuraylib::IImage>image(
        transaction->create<mi::neuraylib::IImage>("Image"));
    check_success(image->reset_file(envmap_name) == 0);
 
    mi::base::Handle<const mi::neuraylib::ICanvas> canvas(image->get_canvas(0, 0, 0));
    const mi::Uint32 rx = canvas->get_resolution_x();
    const mi::Uint32 ry = canvas->get_resolution_y();
    res->x = rx;
    res->y = ry;
 
    // Check, whether we need to convert the image
    char const *image_type = image->get_type(0, 0);
    if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0)
        canvas = image_api->convert(canvas.get(), "Color");
 
    // Copy the image data to a CUDA array
    const cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
    check_cuda_success(cudaMallocArray(env_tex_data, &channel_desc, rx, ry));
 
    mi::base::Handle<const mi::neuraylib::ITile> tile(canvas->get_tile());
    const float *pixels = static_cast<const float *>(tile->get_data());
 
    check_cuda_success(cudaMemcpy2DToArray(
        *env_tex_data, 0, 0, pixels,
        rx * sizeof(float4), rx * sizeof(float4), ry, cudaMemcpyHostToDevice));
 
    // Create a CUDA texture
    cudaResourceDesc res_desc;
    memset(&res_desc, 0, sizeof(res_desc));
    res_desc.resType = cudaResourceTypeArray;
    res_desc.res.array.array = *env_tex_data;
 
    cudaTextureDesc tex_desc;
    memset(&tex_desc, 0, sizeof(tex_desc));
    tex_desc.addressMode[0]   = cudaAddressModeWrap;
    tex_desc.addressMode[1]   = cudaAddressModeClamp;  // don't sample beyond poles of env sphere
    tex_desc.addressMode[2]   = cudaAddressModeWrap;
    tex_desc.filterMode       = cudaFilterModeLinear;
    tex_desc.readMode         = cudaReadModeElementType;
    tex_desc.normalizedCoords = 1;
 
    check_cuda_success(cudaCreateTextureObject(env_tex, &res_desc, &tex_desc, nullptr));
 
    // Create importance sampling data
    Env_accel *env_accel_host = static_cast<Env_accel *>(malloc(rx * ry * sizeof(Env_accel)));
    float *importance_data = static_cast<float *>(malloc(rx * ry * sizeof(float)));
    float cos_theta0 = 1.0f;
    const float step_phi = float(2.0 * M_PI) / float(rx);
    const float step_theta = float(M_PI) / float(ry);
    for (unsigned int y = 0; y < ry; ++y)
    {
        const float theta1 = float(y + 1) * step_theta;
        const float cos_theta1 = std::cos(theta1);
        const float area = (cos_theta0 - cos_theta1) * step_phi;
        cos_theta0 = cos_theta1;
 
        for (unsigned int x = 0; x < rx; ++x) {
            const unsigned int idx = y * rx + x;
            const unsigned int idx4 =  idx * 4;
            importance_data[idx] =
                area * std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2]));
        }
    }
    const float inv_env_integral = 1.0f / build_alias_map(importance_data, rx * ry, env_accel_host);
    free(importance_data);
    for (unsigned int i = 0; i < rx * ry; ++i) {
        const unsigned int idx4 = i * 4;
        env_accel_host[i].pdf =
            std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2])) * inv_env_integral;
    }
 
    *env_accel = gpu_mem_dup(env_accel_host, rx * ry * sizeof(Env_accel));
    free(env_accel_host);
}
 
 
static void upload_lpe_state_machine(
    Kernel_params& kernel_params,
    LPE_state_machine& lpe_state_machine)
{
    uint32_t num_trans = lpe_state_machine.get_transition_count();
    uint32_t num_states = lpe_state_machine.get_state_count();
    kernel_params.lpe_num_transitions = num_trans;
    kernel_params.lpe_num_states = num_states;
 
    // free old data
    if (kernel_params.lpe_state_table)
        check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
    if (kernel_params.lpe_final_mask)
        check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
 
    // state table
    CUdeviceptr state_table = 0;
    check_cuda_success(cuMemAlloc(&state_table, num_states * num_trans * sizeof(uint32_t)));
    check_cuda_success(cuMemcpyHtoD(state_table, lpe_state_machine.get_state_table().data(),
                       num_states * num_trans * sizeof(uint32_t)));
    kernel_params.lpe_state_table = reinterpret_cast<uint32_t*>(state_table);
 
    // final state masks
    CUdeviceptr final_mask = 0;
    check_cuda_success(cuMemAlloc(&final_mask, num_states * sizeof(uint32_t)));
    check_cuda_success(cuMemcpyHtoD(final_mask, lpe_state_machine.get_final_state_masks().data(),
                       num_states * sizeof(uint32_t)));
    kernel_params.lpe_final_mask = reinterpret_cast<uint32_t*>(final_mask);
 
    // tag ID for light sources as they don't store tags in this examples
    kernel_params.default_gtag = lpe_state_machine.handle_to_global_tag("");
    kernel_params.point_light_gtag = lpe_state_machine.handle_to_global_tag("point_light");
    kernel_params.env_gtag = lpe_state_machine.handle_to_global_tag("env");
}
 
// Save current result image to disk
static void save_result(
    const CUdeviceptr cuda_buffer,
    const unsigned int width,
    const unsigned int height,
    const std::string &filename,
    mi::base::Handle<mi::neuraylib::IFactory> factory,
    mi::base::Handle<mi::neuraylib::IImage_api> image_api,
    mi::base::Handle<mi::neuraylib::IMdl_impexp_api> mdl_impexp_api)
{
    mi::base::Handle<mi::neuraylib::ICanvas> canvas(
        image_api->create_canvas("Rgb_fp", width, height));
    mi::base::Handle<mi::neuraylib::ITile> tile(canvas->get_tile());
    float3 *data = static_cast<float3 *>(tile->get_data());
    check_cuda_success(cuMemcpyDtoH(data, cuda_buffer, width * height * sizeof(float3)));
 
    mi::base::Handle<mi::IBoolean> option_force_default_gamma(factory->create<mi::IBoolean>());
    option_force_default_gamma->set_value(true);
    mi::base::Handle<mi::IMap> export_options(factory->create<mi::IMap>("Map<Interface>"));
    export_options->insert("force_default_gamma", option_force_default_gamma.get());
    mdl_impexp_api->export_canvas(filename.c_str(), canvas.get(), export_options.get());
}
 
// Application options
struct Options {
    int cuda_device;
    float gui_scale;
    bool opengl;
    bool use_class_compilation;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
    bool use_df_interpreter;
#endif
    bool no_aa;
    bool enable_derivatives;
    bool fold_ternary_on_df;
    bool enable_auxiliary_output;
    bool enable_pdf;
    bool use_adapt_normal;
    bool enable_mdl_next;
    unsigned int res_x, res_y;
    unsigned int iterations;
    unsigned int samples_per_iteration;
    unsigned int mdl_test_type;
    unsigned int max_path_length;
    float fov;
    float exposure;
    float3 cam_pos;
    float3 light_pos;
    float3 light_intensity;
    bool enable_bsdf_flags;
    mi::neuraylib::Df_flags allowed_scatter_mode;
 
    std::string hdrfile;
    float hdr_rot;
    std::string outputfile;
    std::vector<std::string> material_names;
 
    // Default constructor, sets default values.
    Options()
    : cuda_device(0)
    , gui_scale(1.0f)
    , opengl(true)
    , use_class_compilation(true)
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
    , use_df_interpreter(false)
#endif
    , no_aa(false)
    , enable_derivatives(false)
    , fold_ternary_on_df(false)
    , enable_auxiliary_output(true)
    , enable_pdf(true)
    , use_adapt_normal(false)
    , enable_mdl_next(false)
    , res_x(1024)
    , res_y(1024)
    , iterations(4096)
    , samples_per_iteration(8)
    , mdl_test_type(MDL_TEST_MIS)
    , max_path_length(4)
    , fov(96.0f)
    , exposure(0.0f)
    , cam_pos(make_float3(0, 0, 3))
    , light_pos(make_float3(10, 0, 5))
    , light_intensity(make_float3(0, 0, 0))
    , enable_bsdf_flags(false)
    , allowed_scatter_mode(mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT)
    , hdrfile("nvidia/sdk_examples/resources/environment.hdr")
    , hdr_rot(0.0f)
    , outputfile("output.exr")
    , material_names()
    {}
};
 
// Possible enum values if any.
struct Enum_value {
    std::string name;
    int         value;
 
    Enum_value(const std::string &name, int value)
    : name(name), value(value)
    {
    }
};
 
// Info for an enum type.
struct Enum_type_info {
    std::vector<Enum_value> values;
 
    // Adds a enum value and its integer value to the enum type info.
    void add(const std::string &name, int value) {
        values.push_back(Enum_value(name, value));
    }
};
 
// Material parameter information structure.
class Param_info
{
public:
    enum Param_kind
    {
        PK_UNKNOWN,
        PK_FLOAT,
        PK_FLOAT2,
        PK_FLOAT3,
        PK_COLOR,
        PK_ARRAY,
        PK_BOOL,
        PK_INT,
        PK_ENUM,
        PK_STRING,
        PK_TEXTURE,
        PK_LIGHT_PROFILE,
        PK_BSDF_MEASUREMENT
    };
 
    Param_info(
        mi::Size index,
        char const *name,
        char const *display_name,
        char const *group_name,
        Param_kind kind,
        Param_kind array_elem_kind,
        mi::Size   array_size,
        mi::Size   array_pitch,
        char *data_ptr,
        const Enum_type_info *enum_info = nullptr)
    : m_index(index)
    , m_name(name)
    , m_display_name(display_name)
    , m_group_name(group_name)
    , m_kind(kind)
    , m_array_elem_kind(array_elem_kind)
    , m_array_size(array_size)
    , m_array_pitch(array_pitch)
    , m_data_ptr(data_ptr)
    , m_range_min(-100), m_range_max(100)
    , m_enum_info(enum_info)
    {
    }
 
    // Get data as T&.
    template<typename T>
    T &data() { return *reinterpret_cast<T *>(m_data_ptr); }
 
    // Get data as const T&.
    template<typename T>
    const T &data() const { return *reinterpret_cast<const T *>(m_data_ptr); }
 
    const char * &display_name()     { return m_display_name; }
    const char *display_name() const { return m_display_name; }
 
    const char * &group_name()     { return m_group_name; }
    const char *group_name() const { return m_group_name; }
 
    Param_kind kind() const { return m_kind; }
 
    Param_kind array_elem_kind() const { return m_array_elem_kind; }
    mi::Size array_size() const        { return m_array_size; }
    mi::Size array_pitch() const       { return m_array_pitch; }
 
    float &range_min()      { return m_range_min; }
    float range_min() const { return m_range_min; }
    float &range_max()      { return m_range_max; }
    float range_max() const { return m_range_max; }
 
    const Enum_type_info *enum_info() const { return m_enum_info; }
 
private:
    mi::Size             m_index;
    char const           *m_name;
    char const           *m_display_name;
    char const           *m_group_name;
    Param_kind           m_kind;
    Param_kind           m_array_elem_kind;
    mi::Size             m_array_size;
    mi::Size             m_array_pitch;   // the distance between two array elements
    char                 *m_data_ptr;
    float                m_range_min, m_range_max;
    const Enum_type_info *m_enum_info;
};
 
// Material information structure.
class Material_info
{
public:
    Material_info(char const *name)
    : m_name(name)
    {}
 
    // Add the parameter information as last entry of the corresponding group, or to the
    // end of the list, if no group name is available.
    void add_sorted_by_group(const Param_info &info) {
        bool group_found = false;
        if (info.group_name() != nullptr) {
            for (std::list<Param_info>::iterator it = params().begin(); it != params().end(); ++it)
            {
                const bool same_group =
                    it->group_name() != nullptr && strcmp(it->group_name(), info.group_name()) == 0;
                if (group_found && !same_group) {
                    m_params.insert(it, info);
                    return;
                }
                if (same_group)
                    group_found = true;
            }
        }
        m_params.push_back(info);
    }
 
    // Add a new enum type to the list of used enum types.
    void add_enum_type(const std::string name, std::shared_ptr<Enum_type_info> enum_info) {
        enum_types[name] = enum_info;
    }
 
    // Lookup enum type info for a given enum type absolute MDL name.
    const Enum_type_info *get_enum_type(const std::string name) {
        Enum_type_map::const_iterator it = enum_types.find(name);
        if (it != enum_types.end())
            return it->second.get();
        return nullptr;
    }
 
    // Get the name of the material.
    char const *name() const { return m_name; }
 
    // Get the parameters of this material.
    std::list<Param_info> &params() { return m_params; }
 
private:
    // name of the material
    char const *m_name;
 
    // parameters of the material
    std::list<Param_info> m_params;
 
    typedef std::map<std::string, std::shared_ptr<Enum_type_info> > Enum_type_map;
 
    // used enum types of the material
    Enum_type_map enum_types;
};
 
// Helper class to handle Resource tables of the target code.
class Resource_table
{
    typedef std::map<std::string, unsigned> Resource_id_map;
public:
    enum Kind {
        RESOURCE_TEXTURE,
        RESOURCE_LIGHT_PROFILE,
        RESOURCE_BSDF_MEASUREMENT
    };
 
    // Constructor.
    Resource_table(
        mi::base::Handle<mi::neuraylib::ITarget_code const> target_code,
        mi::base::Handle<mi::neuraylib::ITransaction>       transaction,
        Kind                                                kind)
    : m_max_len(0u)
    {
        read_resources(target_code, transaction, kind);
    }
 
    // Get the length of the longest URL in the resource table.
    size_t get_max_length() const { return m_max_len; }
 
    // Get all urls.
    std::vector<std::string> const &get_urls() const { return m_urls; }
 
private:
    void read_resources(
        mi::base::Handle<mi::neuraylib::ITarget_code const> target_code,
        mi::base::Handle<mi::neuraylib::ITransaction>       transaction,
        Kind                                                kind)
    {
        m_urls.push_back("<unset>");
        switch (kind) {
        case RESOURCE_TEXTURE:
            for (mi::Size i = 1, n = target_code->get_texture_count(); i < n; ++i) {
                const char *s = target_code->get_texture(i);
                mi::base::Handle<mi::neuraylib::ITexture const> tex(
                    transaction->access<mi::neuraylib::ITexture>(s));
                char const *url = nullptr;
                if (char const *img = tex->get_image()) {
                    mi::base::Handle<mi::neuraylib::IImage const> image(
                        transaction->access<mi::neuraylib::IImage>(img));
                    url = image->get_filename(0, 0);
                }
                if (url == nullptr)
                    url = s;
                size_t l = strlen(url);
                if (l > m_max_len)
                    m_max_len = l;
                m_resource_map[s] = (unsigned)i;
                m_urls.push_back(url);
            }
            break;
        case RESOURCE_LIGHT_PROFILE:
            for (mi::Size i = 1, n = target_code->get_light_profile_count(); i < n; ++i) {
                const char *s = target_code->get_light_profile(i);
                mi::base::Handle<mi::neuraylib::ILightprofile const> lp(
                    transaction->access<mi::neuraylib::ILightprofile>(s));
                char const *url = lp->get_filename();
                if (url == nullptr)
                    url = s;
                size_t l = strlen(url);
                if (l > m_max_len)
                    m_max_len = l;
                m_resource_map[s] = (unsigned)i;
                m_urls.push_back(url);
            }
            break;
        case RESOURCE_BSDF_MEASUREMENT:
            for (mi::Size i = 1, n = target_code->get_bsdf_measurement_count(); i < n; ++i) {
                const char *s = target_code->get_bsdf_measurement(i);
                mi::base::Handle<mi::neuraylib::IBsdf_measurement const> bm(
                    transaction->access<mi::neuraylib::IBsdf_measurement>(s));
                char const *url = bm->get_filename();
                if (url == nullptr)
                    url = s;
                size_t l = strlen(url);
                if (l > m_max_len)
                    m_max_len = l;
                m_resource_map[s] = (unsigned)i;
                m_urls.push_back(url);
            }
            break;
        }
    }
 
private:
    Resource_id_map          m_resource_map;
    std::vector<std::string> m_urls;
    size_t                   m_max_len;
};
 
// Helper class to handle the string table of a target code.
class String_constant_table
{
    typedef std::map<std::string, unsigned> String_map;
public:
    // Constructor.
    String_constant_table(mi::base::Handle<mi::neuraylib::ITarget_code const> target_code)
    {
        get_all_strings(target_code);
    }
 
    // Get the ID for a given string, return 0 if the string does not exist in the table.
    unsigned get_id_for_string(const char *name) {
        String_map::const_iterator it(m_string_constants_map.find(name));
        if (it != m_string_constants_map.end())
            return it->second;
 
        // the user adds a string that is NOT in the code and we have not seen so far, add it
        // and assign a new id
        unsigned n_id = unsigned(m_string_constants_map.size() + 1);
 
        m_string_constants_map[name] = n_id;
        m_strings.reserve((n_id + 63) & ~63);
        m_strings.push_back(name);
 
        size_t l = strlen(name);
        if (l > m_max_len)
            m_max_len = l;
        return n_id;
    }
 
    // Get the length of the longest string in the string constant table.
    size_t get_max_length() const { return m_max_len; }
 
    // Get the string for a given ID, or nullptr if this ID does not exist.
    const char *get_string(unsigned id) {
        if (id == 0 || id - 1 >= m_strings.size())
            return nullptr;
        return m_strings[id - 1].c_str();
    }
 
private:
    // Get all string constants used inside a target code and their maximum length.
    void get_all_strings(
        mi::base::Handle<mi::neuraylib::ITarget_code const> target_code)
    {
        m_max_len = 0;
        // ignore the 0, it is the "Not-a-known-string" entry
        m_strings.reserve(target_code->get_string_constant_count());
        for (mi::Size i = 1, n = target_code->get_string_constant_count(); i < n; ++i) {
            const char *s = target_code->get_string_constant(i);
            size_t l = strlen(s);
            if (l > m_max_len)
                m_max_len = l;
            m_string_constants_map[s] = (unsigned)i;
            m_strings.push_back(s);
        }
    }
 
private:
    String_map               m_string_constants_map;
    std::vector<std::string> m_strings;
    size_t                   m_max_len;
};
 
// Update the camera kernel parameters.
static void update_camera(
    Kernel_params &kernel_params,
    double phi,
    double theta,
    float base_dist,
    int zoom)
{
    kernel_params.cam_dir.x = float(-sin(phi) * sin(theta));
    kernel_params.cam_dir.y = float(-cos(theta));
    kernel_params.cam_dir.z = float(-cos(phi) * sin(theta));
 
    kernel_params.cam_right.x = float(cos(phi));
    kernel_params.cam_right.y = 0.0f;
    kernel_params.cam_right.z = float(-sin(phi));
 
    kernel_params.cam_up.x = float(-sin(phi) * cos(theta));
    kernel_params.cam_up.y = float(sin(theta));
    kernel_params.cam_up.z = float(-cos(phi) * cos(theta));
 
    const float dist = float(base_dist * pow(0.95, double(zoom)));
    kernel_params.cam_pos.x = -kernel_params.cam_dir.x * dist;
    kernel_params.cam_pos.y = -kernel_params.cam_dir.y * dist;
    kernel_params.cam_pos.z = -kernel_params.cam_dir.z * dist;
}
 
// Add a combobox for the given resource parameter to the GUI
static bool handle_resource(
    Param_info           &param,
    Resource_table const &res_table)
{
    bool changed = false;
    std::vector<std::string> const &urls = res_table.get_urls();
    int id = param.data<int>();
    std::string cur_url = urls[id];
 
    if (ImGui::BeginCombo(param.display_name(), cur_url.c_str())) {
        for (size_t i = 0, n = urls.size(); i < n; ++i) {
            const std::string &name = urls[i];
            bool is_selected = (cur_url == name);
            if (ImGui::Selectable(name.c_str(), is_selected)) {
                param.data<int>() = int(i);
                changed = true;
            }
            if (is_selected)
                ImGui::SetItemDefaultFocus();
        }
        ImGui::EndCombo();
    }
    return changed;
}
 
static void launch_subframe(
    mi::examples::mdl::GL_display* gl_display,
    int width,
    int height,
    CUfunction cuda_function,
    Kernel_params *kernel_params)
{
    // Map diplay buffer, if present
    if (gl_display) {
        kernel_params->display_buffer =
            reinterpret_cast<unsigned int*>(gl_display->map(0));
    }
 
    // Launch kernel
    dim3 threads_per_block(16, 16);
    dim3 num_blocks((width + 15) / 16, (height + 15) / 16);
    void* params[] = { kernel_params };
    check_cuda_success(cuLaunchKernel(
        cuda_function,
        num_blocks.x, num_blocks.y, num_blocks.z,
        threads_per_block.x, threads_per_block.y, threads_per_block.z,
        0, nullptr, params, nullptr));
 
    kernel_params->iteration_start += kernel_params->iteration_num;
 
    // Unmap again, if necessary
    if (gl_display) {
        gl_display->unmap(0);
    }
 
    // Make sure, any debug::print()s are written to the console
    check_cuda_success(cuStreamSynchronize(0));
 
}
 
// Progressively render scene
static void render_scene(
    const Options &options,
    mi::base::Handle<mi::neuraylib::ITransaction>         transaction,
    mi::base::Handle<mi::neuraylib::IFactory>             factory,
    mi::base::Handle<mi::neuraylib::IImage_api>           image_api,
    mi::base::Handle<mi::neuraylib::IMdl_impexp_api>      mdl_impexp_api,
    mi::base::Handle<mi::neuraylib::ITarget_code const>   target_code,
    Material_compiler::Material_definition_list const    &material_defs,
    Material_compiler::Compiled_material_list const      &compiled_materials,
    std::vector<size_t> const                            &arg_block_indices,
    std::vector<Df_cuda_material> const                  &material_bundle,
    LPE_state_machine                                    &lpe_state_machine)
{
    Window_context window_context;
    memset(&window_context, 0, sizeof(Window_context));
 
    mi::examples::mdl::GL_display *gl_display = nullptr;
    GLFWwindow *window = nullptr;
    int width = -1;
    int height = -1;
 
    if (options.opengl) {
        // Init OpenGL window
        std::string version_string;
        window = init_opengl(version_string, int(options.res_x), int(options.res_y));
        glfwSetWindowUserPointer(window, &window_context);
        glfwSetKeyCallback(window, handle_key);
        glfwSetScrollCallback(window, handle_scroll);
        glfwSetCursorPosCallback(window, handle_mouse_pos);
        glfwSetMouseButtonCallback(window, handle_mouse_button);
        glfwSetCharCallback(window, ImGui_ImplGlfw_CharCallback);
 
        IMGUI_CHECKVERSION();
        ImGui::CreateContext();
 
        ImGui_ImplGlfw_InitForOpenGL(window, false);
        ImGui_ImplOpenGL3_Init(version_string.c_str());
        ImGui::GetIO().IniFilename = nullptr;       // disable creating imgui.ini
        ImGui::StyleColorsDark();
        ImGui::GetStyle().Alpha = 0.7f;
        ImGui::GetStyle().ScaleAllSizes(options.gui_scale);
 
        gl_display = new mi::examples::mdl::GL_display(int(options.res_x), int(options.res_y));
    }
 
    // Initialize CUDA
    CUcontext cuda_context = init_cuda(options.cuda_device, options.opengl);
 
    CUdeviceptr accum_buffer = 0;
    CUdeviceptr aux_albedo_buffer = 0; // buffer for auxiliary output
    CUdeviceptr aux_normal_buffer = 0; //
 
    if (!options.opengl) {
        width = options.res_x;
        height = options.res_y;
        check_cuda_success(cuMemAlloc(&accum_buffer, width * height * sizeof(float3)));
        check_cuda_success(cuMemAlloc(&aux_albedo_buffer, width * height * sizeof(float3)));
        check_cuda_success(cuMemAlloc(&aux_normal_buffer, width * height * sizeof(float3)));
    }
 
    // Setup initial CUDA kernel parameters
    Kernel_params kernel_params;
    memset(&kernel_params, 0, sizeof(Kernel_params));
    kernel_params.cam_focal = 1.0f / tanf(options.fov / 2 * float(2 * M_PI / 360));
    kernel_params.light_pos = options.light_pos;
    kernel_params.light_intensity = fmaxf(
        options.light_intensity.x, fmaxf(options.light_intensity.y, options.light_intensity.z));
    kernel_params.light_color = kernel_params.light_intensity > 0.0f
        ? options.light_intensity / kernel_params.light_intensity
        : make_float3(1.0f, 0.9f, 0.5f);
    kernel_params.env_intensity = 1.0f;
    kernel_params.iteration_start = 0;
    kernel_params.iteration_num = options.samples_per_iteration;
    kernel_params.mdl_test_type = options.mdl_test_type;
    kernel_params.max_path_length = options.max_path_length;
    kernel_params.exposure_scale = powf(2.0f, options.exposure);
    kernel_params.disable_aa = options.no_aa;
    kernel_params.bsdf_data_flags = options.allowed_scatter_mode;
    kernel_params.use_derivatives = options.enable_derivatives;
    kernel_params.enable_auxiliary_output = options.enable_auxiliary_output;
    kernel_params.display_buffer_index = 0;
 
    kernel_params.lpe_ouput_expression = 0;
    kernel_params.lpe_state_table = nullptr;
    kernel_params.lpe_final_mask = nullptr;
 
    kernel_params.current_material = 0;
    kernel_params.geometry = material_bundle[kernel_params.current_material].contains_hair_bsdf ?
        GT_HAIR : GT_SPHERE;
 
    // Setup camera
    float base_dist = length(options.cam_pos);
    double theta, phi;
    {
        const float3 inv_dir = normalize(options.cam_pos);
        phi = atan2(inv_dir.x, inv_dir.z);
        theta = acos(inv_dir.y);
    }
 
    update_camera(kernel_params, phi, theta, base_dist, window_context.zoom);
 
    // Build the full CUDA kernel with all the generated code
    std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > target_codes;
    target_codes.push_back(target_code);
    CUfunction  cuda_function;
    char const *ptx_name = options.enable_derivatives ?
        "example_df_cuda_derivatives.ptx" : "example_df_cuda.ptx";
    CUmodule    cuda_module = build_linked_kernel(
        target_codes,
        (mi::examples::io::get_executable_folder() + "/" + ptx_name).c_str(),
        "render_scene_kernel",
        &cuda_function);
 
    // copy materials of the scene to the device
    CUdeviceptr material_buffer = 0;
    check_cuda_success(cuMemAlloc(&material_buffer,
                                  material_bundle.size() * sizeof(Df_cuda_material)));
 
    check_cuda_success(cuMemcpyHtoD(material_buffer, material_bundle.data(),
                                    material_bundle.size() * sizeof(Df_cuda_material)));
    kernel_params.material_buffer = reinterpret_cast<Df_cuda_material*>(material_buffer);
 
    // Setup environment map and acceleration
    CUdeviceptr env_accel;
    cudaArray_t env_tex_data;
    create_environment(
        &kernel_params.env_tex, &env_tex_data, &env_accel, &kernel_params.env_size, transaction,
        image_api, options.hdrfile.c_str());
    kernel_params.env_accel = reinterpret_cast<Env_accel *>(env_accel);
    kernel_params.env_rotation = options.hdr_rot / 180.0f * float(M_PI);
 
    // Setup GPU runtime of the LPE state machine
    upload_lpe_state_machine(kernel_params, lpe_state_machine);
 
    // Setup file name for nogl mode
    std::string next_filename_base;
    std::string filename_base, filename_ext;
    size_t dot_pos = options.outputfile.rfind('.');
    if (dot_pos == std::string::npos) {
        filename_base = options.outputfile;
    } else {
        filename_base = options.outputfile.substr(0, dot_pos);
        filename_ext = options.outputfile.substr(dot_pos);
    }
    if (options.material_names.size() > 1)
        next_filename_base = filename_base + "-0";
    else
        next_filename_base = filename_base;
 
    // Scope for material context resources
    {
        // Prepare the needed data of all target codes for the GPU
        Material_gpu_context material_gpu_context(options.enable_derivatives);
        if (!material_gpu_context.prepare_target_code_data(
                transaction.get(), image_api.get(), target_code.get(), arg_block_indices))
            terminate();
        kernel_params.tc_data = reinterpret_cast<Target_code_data *>(
            material_gpu_context.get_device_target_code_data_list());
        kernel_params.arg_block_list = reinterpret_cast<char const **>(
            material_gpu_context.get_device_target_argument_block_list());
 
        String_constant_table constant_table(target_code);
        Resource_table texture_table(target_code, transaction, Resource_table::RESOURCE_TEXTURE);
        Resource_table lp_table(target_code, transaction, Resource_table::RESOURCE_LIGHT_PROFILE);
        Resource_table bm_table(
            target_code, transaction, Resource_table::RESOURCE_BSDF_MEASUREMENT);
 
        // Collect information about the arguments of the compiled materials
        std::vector<Material_info> mat_infos;
        for (size_t i = 0, num_mats = compiled_materials.size(); i < num_mats; ++i) {
            // Get the compiled material and the parameter annotations
            mi::neuraylib::ICompiled_material const *cur_mat = compiled_materials[i].get();
            mi::neuraylib::IFunction_definition const *cur_def = material_defs[i].get();
            mi::base::Handle<mi::neuraylib::IAnnotation_list const> anno_list(
                cur_def->get_parameter_annotations());
 
            // Get the target argument block and its layout
            size_t arg_block_index = material_gpu_context.get_bsdf_argument_block_index(i);
            mi::base::Handle<mi::neuraylib::ITarget_value_layout const> layout(
                material_gpu_context.get_argument_block_layout(arg_block_index));
            mi::base::Handle<mi::neuraylib::ITarget_argument_block> arg_block(
                material_gpu_context.get_argument_block(arg_block_index));
            char *arg_block_data = arg_block != nullptr ? arg_block->get_data() : nullptr;
 
            Material_info mat_info(cur_def->get_mdl_name());
            for (mi::Size j = 0, num_params = cur_mat->get_parameter_count(); j < num_params; ++j) {
                const char *name = cur_mat->get_parameter_name(j);
                if (name == nullptr) continue;
 
                // Determine the type of the argument
                mi::base::Handle<mi::neuraylib::IValue const> arg(cur_mat->get_argument(j));
                mi::neuraylib::IValue::Kind kind = arg->get_kind();
 
                Param_info::Param_kind param_kind            = Param_info::PK_UNKNOWN;
                Param_info::Param_kind param_array_elem_kind = Param_info::PK_UNKNOWN;
                mi::Size               param_array_size      = 0;
                mi::Size               param_array_pitch     = 0;
                const Enum_type_info   *enum_type            = nullptr;
 
                switch (kind) {
                case mi::neuraylib::IValue::VK_FLOAT:
                    param_kind = Param_info::PK_FLOAT;
                    break;
                case mi::neuraylib::IValue::VK_COLOR:
                    param_kind = Param_info::PK_COLOR;
                    break;
                case mi::neuraylib::IValue::VK_BOOL:
                    param_kind = Param_info::PK_BOOL;
                    break;
                case mi::neuraylib::IValue::VK_INT:
                    param_kind = Param_info::PK_INT;
                    break;
                case mi::neuraylib::IValue::VK_VECTOR:
                    {
                        mi::base::Handle<mi::neuraylib::IValue_vector const> val(
                            arg.get_interface<mi::neuraylib::IValue_vector const>());
                        mi::base::Handle<mi::neuraylib::IType_vector const> val_type(
                            val->get_type());
                        mi::base::Handle<mi::neuraylib::IType_atomic const> elem_type(
                            val_type->get_element_type());
                        if (elem_type->get_kind() == mi::neuraylib::IType::TK_FLOAT) {
                            switch (val_type->get_size()) {
                            case 2: param_kind = Param_info::PK_FLOAT2; break;
                            case 3: param_kind = Param_info::PK_FLOAT3; break;
                            default: assert(false && "Vector Size invalid or unhandled.");
                            }
                        }
                    }
                    break;
                case mi::neuraylib::IValue::VK_ARRAY:
                    {
                        mi::base::Handle<mi::neuraylib::IValue_array const> val(
                            arg.get_interface<mi::neuraylib::IValue_array const>());
                        mi::base::Handle<mi::neuraylib::IType_array const> val_type(
                            val->get_type());
                        mi::base::Handle<mi::neuraylib::IType const> elem_type(
                            val_type->get_element_type());
 
                        // we currently only support arrays of some values
                        switch (elem_type->get_kind()) {
                        case mi::neuraylib::IType::TK_FLOAT:
                            param_array_elem_kind = Param_info::PK_FLOAT;
                            break;
                        case mi::neuraylib::IType::TK_COLOR:
                            param_array_elem_kind = Param_info::PK_COLOR;
                            break;
                        case mi::neuraylib::IType::TK_BOOL:
                            param_array_elem_kind = Param_info::PK_BOOL;
                            break;
                        case mi::neuraylib::IType::TK_INT:
                            param_array_elem_kind = Param_info::PK_INT;
                            break;
                        case mi::neuraylib::IType::TK_VECTOR:
                            {
                                mi::base::Handle<mi::neuraylib::IType_vector const> val_type(
                                    elem_type.get_interface<
                                        mi::neuraylib::IType_vector const>());
                                mi::base::Handle<mi::neuraylib::IType_atomic const> velem_type(
                                    val_type->get_element_type());
                                if (velem_type->get_kind() == mi::neuraylib::IType::TK_FLOAT) {
                                    switch (val_type->get_size()) {
                                    case 2:
                                        param_array_elem_kind = Param_info::PK_FLOAT2;
                                        break;
                                    case 3:
                                        param_array_elem_kind = Param_info::PK_FLOAT3;
                                        break;
                                    default:
                                        assert(false && "Vector Size invalid or unhandled.");
                                    }
                                }
                            }
                            break;
                        default:
                            assert(false && "Array element type invalid or unhandled.");
                        }
                        if (param_array_elem_kind != Param_info::PK_UNKNOWN) {
                            param_kind = Param_info::PK_ARRAY;
                            param_array_size = val_type->get_size();
 
                            // determine pitch of array if there are at least two elements
                            if (param_array_size > 1) {
                                mi::neuraylib::Target_value_layout_state array_state(
                                    layout->get_nested_state(j));
                                mi::neuraylib::Target_value_layout_state next_elem_state(
                                    layout->get_nested_state(1, array_state));
 
                                mi::neuraylib::IValue::Kind kind;
                                mi::Size param_size;
                                mi::Size start_offset = layout->get_layout(
                                    kind, param_size, array_state);
                                mi::Size next_offset = layout->get_layout(
                                    kind, param_size, next_elem_state);
                                param_array_pitch = next_offset - start_offset;
                            }
                        }
                    }
                    break;
                case mi::neuraylib::IValue::VK_ENUM:
                    {
                        mi::base::Handle<mi::neuraylib::IValue_enum const> val(
                            arg.get_interface<mi::neuraylib::IValue_enum const>());
                        mi::base::Handle<mi::neuraylib::IType_enum const> val_type(
                            val->get_type());
 
                        // prepare info for this enum type if not seen so far
                        const Enum_type_info *info = mat_info.get_enum_type(val_type->get_symbol());
                        if (info == nullptr) {
                            std::shared_ptr<Enum_type_info> p(new Enum_type_info());
 
                            for (mi::Size i = 0, n = val_type->get_size(); i < n; ++i) {
                                p->add(val_type->get_value_name(i), val_type->get_value_code(i));
                            }
                            mat_info.add_enum_type(val_type->get_symbol(), p);
                            info = p.get();
                        }
                        enum_type = info;
 
                        param_kind = Param_info::PK_ENUM;
                    }
                    break;
                case mi::neuraylib::IValue::VK_STRING:
                    param_kind = Param_info::PK_STRING;
                    break;
                case mi::neuraylib::IValue::VK_TEXTURE:
                    param_kind = Param_info::PK_TEXTURE;
                    break;
                case mi::neuraylib::IValue::VK_LIGHT_PROFILE:
                    param_kind = Param_info::PK_LIGHT_PROFILE;
                    break;
                case mi::neuraylib::IValue::VK_BSDF_MEASUREMENT:
                    param_kind = Param_info::PK_BSDF_MEASUREMENT;
                    break;
                default:
                    // Unsupported? -> skip
                    continue;
                }
 
                // Get the offset of the argument within the target argument block
                mi::neuraylib::Target_value_layout_state state(layout->get_nested_state(j));
                mi::neuraylib::IValue::Kind kind2;
                mi::Size param_size;
                mi::Size offset = layout->get_layout(kind2, param_size, state);
                check_success(kind == kind2);
 
                Param_info param_info(
                    j,
                    name,
                    name,
                    /*group_name=*/ nullptr,
                    param_kind,
                    param_array_elem_kind,
                    param_array_size,
                    param_array_pitch,
                    arg_block_data + offset,
                    enum_type);
 
                // Check for annotation info
                mi::base::Handle<mi::neuraylib::IAnnotation_block const> anno_block(
                    anno_list->get_annotation_block(name));
                if (anno_block) {
                    mi::neuraylib::Annotation_wrapper annos(anno_block.get());
                    mi::Size anno_index =
                        annos.get_annotation_index("::anno::soft_range(float,float)");
                    if (anno_index == mi::Size(-1)) {
                        anno_index = annos.get_annotation_index("::anno::hard_range(float,float)");
                    }
                    if (anno_index != mi::Size(-1)) {
                        annos.get_annotation_param_value(anno_index, 0, param_info.range_min());
                        annos.get_annotation_param_value(anno_index, 1, param_info.range_max());
                    }
                    anno_index = annos.get_annotation_index("::anno::display_name(string)");
                    if (anno_index != mi::Size(-1)) {
                        annos.get_annotation_param_value(anno_index, 0, param_info.display_name());
                    }
                    anno_index = annos.get_annotation_index("::anno::in_group(string)");
                    if (anno_index != mi::Size(-1)) {
                        annos.get_annotation_param_value(anno_index, 0, param_info.group_name());
                    }
                }
 
                mat_info.add_sorted_by_group(param_info);
            }
            mat_infos.push_back(mat_info);
        }
 
        if (!options.opengl)
        {
            kernel_params.resolution.x = width;
            kernel_params.resolution.y = height;
            kernel_params.accum_buffer = reinterpret_cast<float3*>(accum_buffer);
            kernel_params.albedo_buffer = reinterpret_cast<float3*>(aux_albedo_buffer);
            kernel_params.normal_buffer = reinterpret_cast<float3*>(aux_normal_buffer);
 
            // render images for all materials
            while (kernel_params.current_material < material_bundle.size()) {
                // render scene
                {
                    Timing timing("rendering");
                    while (kernel_params.iteration_start < options.iterations) {
                        launch_subframe(gl_display, width, height, cuda_function, &kernel_params);
                    }
                }
 
                save_result(
                    accum_buffer, width, height,
                    next_filename_base + filename_ext,
                    factory, image_api, mdl_impexp_api);
 
                save_result(
                    aux_albedo_buffer, width, height,
                    next_filename_base + "_albedo" + filename_ext,
                    factory, image_api, mdl_impexp_api);
 
                save_result(
                    aux_normal_buffer, width, height,
                    next_filename_base + "_normal" + filename_ext,
                    factory, image_api, mdl_impexp_api);
 
                // All materials have been rendered? -> done
                if (kernel_params.current_material + 1 >= material_bundle.size())
                    break;
 
                if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
                    kernel_params.geometry = GT_SPHERE;
                else
                    kernel_params.geometry = GT_HAIR;
 
                // Start new image with next material
                kernel_params.iteration_start = 0;
                ++kernel_params.current_material;
                next_filename_base =
                    filename_base + "-" + to_string(kernel_params.current_material);
            }
        } else {
            std::chrono::duration<double> state_update_time( 0.0 );
            std::chrono::duration<double> render_time( 0.0 );
            std::chrono::duration<double> display_time( 0.0 );
            char stats_text[128];
            int last_update_frames = -1;
            auto last_update_time = std::chrono::steady_clock::now();
            const std::chrono::duration<double> update_min_interval( 0.5 );
 
            // Main render loop
            while (true)
            {
                std::chrono::time_point<std::chrono::steady_clock> t0 =
                    std::chrono::steady_clock::now();
 
                // Check for termination
                if (glfwWindowShouldClose(window))
                    break;
 
                // Poll for events and process them
                glfwPollEvents();
 
                // Check if buffers need to be resized
                int nwidth, nheight;
                glfwGetFramebufferSize(window, &nwidth, &nheight);
                if (nwidth != width || nheight != height)
                {
                    width = nwidth;
                    height = nheight;
 
                    gl_display->resize(width, height);
 
                    resize_buffers(
                        &accum_buffer, width, height);
                    kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
 
                    resize_buffers(&aux_albedo_buffer, width, height);
                    kernel_params.albedo_buffer = reinterpret_cast<float3 *>(aux_albedo_buffer);
 
 
                    resize_buffers(&aux_normal_buffer, width, height);
                    kernel_params.normal_buffer = reinterpret_cast<float3 *>(aux_normal_buffer);
 
                    kernel_params.resolution.x = width;
                    kernel_params.resolution.y = height;
                    kernel_params.iteration_start = 0;
                }
 
                // Don't render anything, if minimized
                if (width == 0 || height == 0) {
                    // Wait until something happens
                    glfwWaitEvents();
                    continue;
                }
 
                ImGui_ImplOpenGL3_NewFrame();
                ImGui_ImplGlfw_NewFrame();
                ImGui::NewFrame();
 
                // Create material parameter editor window
 
                ImGui::SetNextWindowPos(ImVec2(10, 100), ImGuiCond_FirstUseEver);
                ImGui::SetNextWindowSize(
                    ImVec2(360 * options.gui_scale, 600 * options.gui_scale),
                    ImGuiCond_FirstUseEver);
                ImGui::Begin("Settings");
                ImGui::SetWindowFontScale(options.gui_scale);
                ImGui::PushItemWidth(-200 * options.gui_scale);
                if (options.use_class_compilation)
                    ImGui::Text("CTRL + Click to manually enter numbers");
                else
                    ImGui::Text("Parameter editing requires class compilation.");
 
                if (kernel_params.enable_auxiliary_output)
                {
                    ImGui::Dummy(ImVec2(0.0f, 3.0f));
                    ImGui::Text("Display options");
                    ImGui::Separator();
 
                    std::string current_lpe_name = lpe_state_machine.get_expression_name(
                        kernel_params.lpe_ouput_expression);
                    if (ImGui::BeginCombo("LPE", current_lpe_name.c_str()))
                    {
                        for (uint32_t i = 0; i < lpe_state_machine.get_expression_count(); ++i)
                        {
                            const char* name = lpe_state_machine.get_expression_name(i);
                            bool is_selected = (i == kernel_params.lpe_ouput_expression);
                            if (ImGui::Selectable(name, is_selected))
                            {
                                kernel_params.lpe_ouput_expression = i;
                                kernel_params.iteration_start = 0;
                            }
                            if (is_selected)
                                ImGui::SetItemDefaultFocus();
                        }
                        ImGui::EndCombo();
                    }
 
                    std::string current_display_buffer =
                        to_string((Display_buffer_options) kernel_params.display_buffer_index);
                    if (ImGui::BeginCombo("buffer", current_display_buffer.c_str()))
                    {
                        for (unsigned i = 0; i < (unsigned) DISPLAY_BUFFER_COUNT; ++i)
                        {
                            const std::string &name = to_string((Display_buffer_options) i);
                            bool is_selected = (current_display_buffer == name);
                            if (ImGui::Selectable(name.c_str(), is_selected))
                            {
                                kernel_params.display_buffer_index = i;
                                kernel_params.iteration_start = 0;
                            }
                            if (is_selected)
                                ImGui::SetItemDefaultFocus();
                        }
                        ImGui::EndCombo();
                    }
                }
 
                if (options.enable_bsdf_flags)
                {
                    std::string current_allow_mode = to_string(kernel_params.bsdf_data_flags);
                    if (ImGui::BeginCombo("BSDF flags", current_allow_mode.c_str()))
                    {
                        for (unsigned i = 0;
                            i <= (unsigned) mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT; ++i)
                        {
                            const std::string &name = to_string((mi::neuraylib::Df_flags) i);
                            bool is_selected = (current_allow_mode == name);
                            if (ImGui::Selectable(name.c_str(), is_selected))
                            {
                                kernel_params.bsdf_data_flags = (mi::neuraylib::Df_flags) i;
                                kernel_params.iteration_start = 0;
                            }
                            if (is_selected)
                                ImGui::SetItemDefaultFocus();
                        }
                        ImGui::EndCombo();
                    }
                }
 
                ImGui::Dummy(ImVec2(0.0f, 3.0f));
                ImGui::Text("Light parameters");
                ImGui::Separator();
 
                if (ImGui::ColorEdit3("Point Light Color", &kernel_params.light_color.x))
                    kernel_params.iteration_start = 0;
 
                if (ImGui::SliderFloat("Point Light Intensity",
                    &kernel_params.light_intensity, 0.0f, 50000.0f))
                        kernel_params.iteration_start = 0;
 
                if (ImGui::SliderFloat("Environment Intensity Scale",
                    &kernel_params.env_intensity, 0.0f, 10.0f))
                        kernel_params.iteration_start = 0;
 
                float env_rot_degree = kernel_params.env_rotation / float(M_PI) * 180.0f;
                if (ImGui::SliderFloat("Environment Rotation",
                    &env_rot_degree, 0.0f, 360.0f))
                {
                    // wrap in case of negative input
                    // we don't want fmodf behavior for negative values
                    env_rot_degree -= floorf(env_rot_degree / 360.0f) * 360.f;
                    kernel_params.env_rotation = fmodf(env_rot_degree, 360.0f) / 180.0f * float(M_PI);
                    kernel_params.iteration_start = 0;
                }
 
                ImGui::Dummy(ImVec2(0.0f, 3.0f));
                ImGui::Text("Material parameters");
                ImGui::Separator();
 
                Material_info &mat_info = mat_infos[
                    material_bundle[kernel_params.current_material].compiled_material_index];
 
                // Print material name
                ImGui::Text("%s", mat_info.name());
 
                bool changed = false;
                const char *group_name = nullptr;
                int id = 0;
                for (std::list<Param_info>::iterator it = mat_info.params().begin(),
                    end = mat_info.params().end(); it != end; ++it, ++id)
                {
                    Param_info &param = *it;
 
                    // Ensure unique ID even for parameters with same display names
                    ImGui::PushID(id);
 
                    // Group name changed? -> Start new group with new header
                    if ((!param.group_name() != !group_name) ||
                        (param.group_name() &&
                            (!group_name || strcmp(group_name, param.group_name()) != 0)))
                    {
                        ImGui::Separator();
                        if (param.group_name() != nullptr)
                            ImGui::Text("%s", param.group_name());
                        group_name = param.group_name();
                    }
 
                    // Choose proper edit control depending on the parameter kind
                    switch (param.kind()) {
                    case Param_info::PK_FLOAT:
                        changed |= ImGui::SliderFloat(
                            param.display_name(),
                            &param.data<float>(),
                            param.range_min(),
                            param.range_max());
                        break;
                    case Param_info::PK_FLOAT2:
                        changed |= ImGui::SliderFloat2(
                            param.display_name(),
                            &param.data<float>(),
                            param.range_min(),
                            param.range_max());
                        break;
                    case Param_info::PK_FLOAT3:
                        changed |= ImGui::SliderFloat3(
                            param.display_name(),
                            &param.data<float>(),
                            param.range_min(),
                            param.range_max());
                        break;
                    case Param_info::PK_COLOR:
                        changed |= ImGui::ColorEdit3(
                            param.display_name(),
                            &param.data<float>());
                        break;
                    case Param_info::PK_BOOL:
                        changed |= ImGui::Checkbox(
                            param.display_name(),
                            &param.data<bool>());
                        break;
                    case Param_info::PK_INT:
                        changed |= ImGui::SliderInt(
                            param.display_name(),
                            &param.data<int>(),
                            int(param.range_min()),
                            int(param.range_max()));
                        break;
                    case Param_info::PK_ARRAY:
                        {
                            ImGui::Text("%s", param.display_name());
                            ImGui::Indent(16.0f * options.gui_scale);
                            char *ptr = &param.data<char>();
                            for (mi::Size i = 0, n = param.array_size(); i < n; ++i) {
                                std::string idx_str = to_string(i);
                                switch (param.array_elem_kind()) {
                                case Param_info::PK_FLOAT:
                                    changed |= ImGui::SliderFloat(
                                        idx_str.c_str(),
                                        reinterpret_cast<float *>(ptr),
                                        param.range_min(),
                                        param.range_max());
                                    break;
                                case Param_info::PK_FLOAT2:
                                    changed |= ImGui::SliderFloat2(
                                        idx_str.c_str(),
                                        reinterpret_cast<float *>(ptr),
                                        param.range_min(),
                                        param.range_max());
                                    break;
                                case Param_info::PK_FLOAT3:
                                    changed |= ImGui::SliderFloat3(
                                        idx_str.c_str(),
                                        reinterpret_cast<float *>(ptr),
                                        param.range_min(),
                                        param.range_max());
                                    break;
                                case Param_info::PK_COLOR:
                                    changed |= ImGui::ColorEdit3(
                                        idx_str.c_str(),
                                        reinterpret_cast<float *>(ptr));
                                    break;
                                case Param_info::PK_BOOL:
                                    changed |= ImGui::Checkbox(
                                        param.display_name(),
                                        reinterpret_cast<bool *>(ptr));
                                    break;
                                case Param_info::PK_INT:
                                    changed |= ImGui::SliderInt(
                                        param.display_name(),
                                        reinterpret_cast<int *>(ptr),
                                        int(param.range_min()),
                                        int(param.range_max()));
                                    break;
                                default:
                                    assert(false && "Array element type invalid or unhandled.");
                                }
                                ptr += param.array_pitch();
                            }
                            ImGui::Unindent(16.0f * options.gui_scale);
                        }
                        break;
                    case Param_info::PK_ENUM:
                        {
                            int value = param.data<int>();
                            std::string curr_value;
 
                            const Enum_type_info *info = param.enum_info();
                            for (size_t i = 0, n = info->values.size(); i < n; ++i) {
                                if (info->values[i].value == value) {
                                    curr_value = info->values[i].name;
                                    break;
                                }
                            }
 
                            if (ImGui::BeginCombo(param.display_name(), curr_value.c_str())) {
                                for (size_t i = 0, n = info->values.size(); i < n; ++i) {
                                    const std::string &name = info->values[i].name;
                                    bool is_selected = (curr_value == name);
                                    if (ImGui::Selectable(
                                        info->values[i].name.c_str(), is_selected)) {
                                        param.data<int>() = info->values[i].value;
                                        changed = true;
                                    }
                                    if (is_selected)
                                        ImGui::SetItemDefaultFocus();
                                }
                                ImGui::EndCombo();
                            }
                        }
                        break;
                    case Param_info::PK_STRING:
                        {
                            std::vector<char> buf;
 
                            size_t max_len = constant_table.get_max_length();
                            max_len = max_len > 63 ? max_len + 1 : 64;
 
                            buf.resize(max_len);
 
                            // fill the current value
                            unsigned curr_index = param.data<unsigned>();
                            const char *opt = constant_table.get_string(curr_index);
                            strcpy(buf.data(), opt != nullptr ? opt : "");
 
                            if (ImGui::InputText(
                                param.display_name(),
                                buf.data(), buf.size(),
                                ImGuiInputTextFlags_EnterReturnsTrue))
                            {
                                unsigned id = constant_table.get_id_for_string(buf.data());
 
                                param.data<unsigned>() = id;
                                changed = true;
                            }
                        }
                        break;
                    case Param_info::PK_TEXTURE:
                        changed |= handle_resource(param, texture_table);
                        break;
                    case Param_info::PK_LIGHT_PROFILE:
                        changed |= handle_resource(param, lp_table);
                        break;
                    case Param_info::PK_BSDF_MEASUREMENT:
                        changed |= handle_resource(param, bm_table);
                        break;
                    case Param_info::PK_UNKNOWN:
                        break;
                    }
 
                    ImGui::PopID();
                }
 
                if (options.enable_derivatives) {
                    ImGui::Separator();
                    bool b = kernel_params.use_derivatives != 0;
                    if (ImGui::Checkbox("Use derivatives", &b)) {
                        kernel_params.iteration_start = 0;
                        kernel_params.use_derivatives = b;
                    }
                }
 
                ImGui::PopItemWidth();
                ImGui::End();
 
                // If any material argument changed, update the target argument block on the device
                if (changed) {
                    material_gpu_context.update_device_argument_block(
                        material_bundle[kernel_params.current_material].argument_block_index);
                    kernel_params.iteration_start = 0;
                }
 
                // Handle events
                Window_context *ctx =
                    static_cast<Window_context*>(glfwGetWindowUserPointer(window));
                if (ctx->save_result && !ImGui::GetIO().WantCaptureKeyboard) {
                    save_result(
                        accum_buffer,
                        width, height,
                        options.outputfile,
                        factory, image_api, mdl_impexp_api);
 
                    save_result(
                        aux_albedo_buffer,
                        width, height,
                        filename_base + "_albedo" + filename_ext,
                        factory, image_api, mdl_impexp_api);
 
                    save_result(
                        aux_normal_buffer,
                        width, height,
                        filename_base + "_normal" + filename_ext,
                        factory, image_api, mdl_impexp_api);
                }
                if (ctx->exposure_event && !ImGui::GetIO().WantCaptureKeyboard) {
                    kernel_params.exposure_scale = powf(2.0f, ctx->exposure);
                }
                if (ctx->key_event && !ImGui::GetIO().WantCaptureKeyboard) {
                    kernel_params.iteration_start = 0;
 
                    // Update change material
                    const unsigned num_materials = unsigned(material_bundle.size());
                    kernel_params.current_material = (kernel_params.current_material +
                        ctx->material_index_delta + num_materials) % num_materials;
                    ctx->material_index_delta = 0;
 
                    if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
                        kernel_params.geometry = GT_SPHERE;
                    else
                        kernel_params.geometry = GT_HAIR;
                }
                if (ctx->mouse_button - 1 == GLFW_MOUSE_BUTTON_LEFT) {
                    // Only accept button press when not hovering GUI window
                    if (ctx->mouse_button_action == GLFW_PRESS &&
                            !ImGui::GetIO().WantCaptureMouse) {
                        ctx->moving = true;
                        glfwGetCursorPos(window, &ctx->move_start_x, &ctx->move_start_y);
                    }
                    else
                        ctx->moving = false;
                }
                if (ctx->mouse_wheel_delta && !ImGui::GetIO().WantCaptureMouse) {
                    ctx->zoom += ctx->mouse_wheel_delta;
                }
                if (ctx->mouse_event && !ImGui::GetIO().WantCaptureMouse) {
                    kernel_params.iteration_start = 0;
 
                    // Update camera
                    phi -= ctx->move_dx * 0.001 * M_PI;
                    theta -= ctx->move_dy * 0.001 * M_PI;
                    theta = std::max(theta, 0.00 * M_PI);
                    theta = std::min(theta, 1.00 * M_PI);
                    ctx->move_dx = ctx->move_dy = 0.0;
 
                    update_camera(kernel_params, phi, theta, base_dist, ctx->zoom);
                }
 
                // Clear all events
                ctx->save_result = false;
                ctx->key_event = false;
                ctx->mouse_event = false;
                ctx->exposure_event = false;
                ctx->mouse_wheel_delta = 0;
                ctx->mouse_button = 0;
 
                auto t1 = std::chrono::steady_clock::now();
                state_update_time += t1 - t0;
                t0 = t1;
 
                launch_subframe(gl_display, width, height, cuda_function, &kernel_params);
 
                t1 = std::chrono::steady_clock::now();
                render_time += t1 - t0;
                t0 = t1;
 
                // Render GL buffer
                gl_display->update_display();
 
                t1 = std::chrono::steady_clock::now();
                display_time += t1 - t0;
 
                // Render stats window
                ImGui::SetNextWindowPos(ImVec2(10, 10));
                ImGui::Begin("##notitle", nullptr,
                    ImGuiWindowFlags_NoDecoration |
                    ImGuiWindowFlags_AlwaysAutoResize |
                    ImGuiWindowFlags_NoSavedSettings |
                    ImGuiWindowFlags_NoFocusOnAppearing |
                    ImGuiWindowFlags_NoNav);
 
                // Update stats only every 0.5s
                ++last_update_frames;
                if (t1 - last_update_time > update_min_interval || last_update_frames == 0) {
                    typedef std::chrono::duration<double, std::milli> durationMs;
 
                    snprintf(stats_text, sizeof(stats_text),
                        "%5.1f fps\n\n"
                        "state update: %8.1f ms\n"
                        "render:       %8.1f ms\n"
                        "display:      %8.1f ms\n",
                        last_update_frames / std::chrono::duration<double>(
                            t1 - last_update_time).count(),
                        (durationMs(state_update_time) / last_update_frames).count(),
                        (durationMs(render_time) / last_update_frames).count(),
                        (durationMs(display_time) / last_update_frames).count());
 
                    last_update_time = t1;
                    last_update_frames = 0;
                    state_update_time = render_time = display_time =
                        std::chrono::duration<double>::zero();
                }
 
                ImGui::TextUnformatted(stats_text);
                ImGui::End();
 
                // Show the GUI
                ImGui::Render();
                ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
 
                // Swap front and back buffers
                glfwSwapBuffers(window);
            }
        }
    }
 
    // Cleanup CUDA
    check_cuda_success(cudaDestroyTextureObject(kernel_params.env_tex));
    check_cuda_success(cudaFreeArray(env_tex_data));
    check_cuda_success(cuMemFree(env_accel));
    check_cuda_success(cuMemFree(accum_buffer));
    check_cuda_success(cuMemFree(aux_albedo_buffer));
    check_cuda_success(cuMemFree(aux_normal_buffer));
    check_cuda_success(cuMemFree(material_buffer));
    check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
    check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
    check_cuda_success(cuModuleUnload(cuda_module));
    uninit_cuda(cuda_context);
 
    // Cleanup OpenGL
    if (options.opengl) {
        delete gl_display;
        gl_display = nullptr;
        ImGui_ImplOpenGL3_Shutdown();
        ImGui_ImplGlfw_Shutdown();
        ImGui::DestroyContext();
        glfwDestroyWindow(window);
        glfwTerminate();
    }
}
 
// Returns true, if the string str starts with the given prefix, false otherwise.
bool starts_with(std::string const &str, std::string const &prefix)
{
    return str.size() >= prefix.size() && str.compare(0, prefix.size(), prefix) == 0;
}
 
// Create application material representation for use in our CUDA kernel
Df_cuda_material create_cuda_material(
    const mi::neuraylib::ICompiled_material* compiled_material,
    size_t target_code_index,
    size_t compiled_material_index,
    std::vector<mi::neuraylib::Target_function_description> const& descs,
    bool use_hair_bsdf)
{
    Df_cuda_material mat;
 
    // shared by all generated functions of the same material
    // used here to alter the materials parameter set
    mat.compiled_material_index = static_cast<unsigned int>(compiled_material_index);
 
    // Note: the same argument_block_index is filled into all function descriptions of a
    //       material, if any function uses it
    mat.argument_block_index = static_cast<unsigned int>(descs[0].argument_block_index);
 
    mat.init.x = static_cast<unsigned int>(target_code_index);
    mat.init.y = static_cast<unsigned int>(descs[0].function_index);
 
    if (!use_hair_bsdf)
    {
        // identify the BSDF function by target_code_index (i'th link unit)
        // and the function_index inside this target_code.
        // same for the EDF and the intensity expression.
 
        // has material a constant thin_walled property?
        mi::base::Handle<mi::neuraylib::IExpression const> thin_walled(
            compiled_material->lookup_sub_expression("thin_walled"));
 
        bool has_constant_thin_walled = false;
        bool is_thin_walled = false;
        if (thin_walled->get_kind() == mi::neuraylib::IExpression::EK_CONSTANT)
        {
            mi::base::Handle<mi::neuraylib::IExpression_constant const> thin_walled_const(
                thin_walled->get_interface<mi::neuraylib::IExpression_constant const>());
            mi::base::Handle<mi::neuraylib::IValue_bool const> thin_walled_bool(
                thin_walled_const->get_value<mi::neuraylib::IValue_bool>());
 
            has_constant_thin_walled = true;
            is_thin_walled = thin_walled_bool->get_value();
        }
 
        // back faces could be different for thin walled materials
        bool need_backface_bsdf = false;
        bool need_backface_edf = false;
        bool need_backface_emission_intensity = false;
 
        if (!has_constant_thin_walled || is_thin_walled)
        {
            // first, backfaces dfs are only considered for thin_walled materials
 
            // second, we only need to generate new code if surface and backface are different
            need_backface_bsdf =
                compiled_material->get_slot_hash(mi::neuraylib::SLOT_SURFACE_SCATTERING) !=
                compiled_material->get_slot_hash(mi::neuraylib::SLOT_BACKFACE_SCATTERING);
            need_backface_edf =
                compiled_material->get_slot_hash(mi::neuraylib::SLOT_SURFACE_EMISSION_EDF_EMISSION) !=
                compiled_material->get_slot_hash(mi::neuraylib::SLOT_BACKFACE_EMISSION_EDF_EMISSION);
            need_backface_emission_intensity =
                compiled_material->get_slot_hash(mi::neuraylib::SLOT_SURFACE_EMISSION_INTENSITY) !=
                compiled_material->get_slot_hash(mi::neuraylib::SLOT_BACKFACE_EMISSION_INTENSITY);
 
            // third, either the bsdf or the edf need to be non-default (black)
            mi::base::Handle<mi::neuraylib::IExpression const> scattering_expr(
                compiled_material->lookup_sub_expression("backface.scattering"));
            mi::base::Handle<mi::neuraylib::IExpression const> emission_expr(
                compiled_material->lookup_sub_expression("backface.emission.emission"));
 
            if (scattering_expr->get_kind() == mi::neuraylib::IExpression::EK_CONSTANT &&
                emission_expr->get_kind() == mi::neuraylib::IExpression::EK_CONSTANT)
            {
                mi::base::Handle<mi::neuraylib::IExpression_constant const> scattering_expr_constant(
                    scattering_expr->get_interface<mi::neuraylib::IExpression_constant>());
                mi::base::Handle<mi::neuraylib::IValue const> scattering_value(
                    scattering_expr_constant->get_value());
 
                mi::base::Handle<mi::neuraylib::IExpression_constant const> emission_expr_constant(
                    emission_expr->get_interface<mi::neuraylib::IExpression_constant>());
                mi::base::Handle<mi::neuraylib::IValue const> emission_value(
                    emission_expr_constant->get_value());
 
                if (scattering_value->get_kind() == mi::neuraylib::IValue::VK_INVALID_DF &&
                    emission_value->get_kind() == mi::neuraylib::IValue::VK_INVALID_DF)
                {
                    need_backface_bsdf = false;
                    need_backface_edf = false;
                    need_backface_emission_intensity = false;
                }
            }
        }
 
        mat.bsdf.x = static_cast<unsigned int>(target_code_index);
        mat.bsdf.y = static_cast<unsigned int>(descs[1].function_index);
 
        mat.edf.x = static_cast<unsigned int>(target_code_index);
        mat.edf.y = static_cast<unsigned int>(descs[2].function_index);
 
        mat.emission_intensity.x = static_cast<unsigned int>(target_code_index);
        mat.emission_intensity.y = static_cast<unsigned int>(descs[3].function_index);
 
        mat.backface_bsdf.x = static_cast<unsigned int>(target_code_index);
        mat.backface_bsdf.y = static_cast<unsigned int>(
            need_backface_bsdf ? descs[8].function_index : descs[1].function_index);
 
        mat.backface_edf.x = static_cast<unsigned int>(target_code_index);
        mat.backface_edf.y = static_cast<unsigned int>(
            need_backface_edf ? descs[9].function_index : descs[2].function_index);
 
        mat.backface_emission_intensity.x = static_cast<unsigned int>(target_code_index);
        mat.backface_emission_intensity.y = static_cast<unsigned int>(
            need_backface_emission_intensity ? descs[10].function_index : descs[3].function_index);
 
        mat.volume_absorption.x = static_cast<unsigned int>(target_code_index);
        mat.volume_absorption.y = static_cast<unsigned int>(descs[4].function_index);
 
        mat.thin_walled.x = static_cast<unsigned int>(target_code_index);
        mat.thin_walled.y = static_cast<unsigned int>(descs[5].function_index);
 
        mat.cutout_opacity.x = static_cast<unsigned int>(target_code_index);
        mat.cutout_opacity.y = static_cast<unsigned int>(descs[6].function_index);
    }
    else
    {
        mat.bsdf.x = static_cast<unsigned int>(target_code_index);
        mat.bsdf.y = static_cast<unsigned int>(descs[7].function_index);
        mat.contains_hair_bsdf = 1;
    }
 
    // init tag maps with zeros (optional)
    memset(mat.bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
    memset(mat.edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
    memset(mat.backface_bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
    memset(mat.backface_edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
    return mat;
}
 
void create_cuda_material_handles(
    Df_cuda_material& mat,
    const mi::neuraylib::ITarget_code* target_code,
    LPE_state_machine& lpe_state_machine)
{
    // fill tag ID list.
    // allows to map from local per material Tag IDs to global per scene Tag IDs
    // Note, calling 'LPE_state_machine::handle_to_global_tag(...)' registers the string handles
    // present in the MDL in our 'scene'
    mat.bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
        target_code->get_callable_function_df_handle_count(mat.bsdf.y));
    for (mi::Size i = 0; i < mat.bsdf_mtag_to_gtag_map_size; ++i)
        mat.bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
            target_code->get_callable_function_df_handle(mat.bsdf.y, i));
 
    // same for all other distribution functions
    mat.edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
        target_code->get_callable_function_df_handle_count(mat.edf.y));
    for (mi::Size i = 0; i < mat.edf_mtag_to_gtag_map_size; ++i)
        mat.edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
            target_code->get_callable_function_df_handle(mat.edf.y, i));
 
    mat.backface_bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
        target_code->get_callable_function_df_handle_count(mat.backface_bsdf.y));
    for (mi::Size i = 0; i < mat.backface_bsdf_mtag_to_gtag_map_size; ++i)
        mat.backface_bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
            target_code->get_callable_function_df_handle(mat.backface_bsdf.y, i));
 
    mat.backface_edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
        target_code->get_callable_function_df_handle_count(mat.backface_edf.y));
    for (mi::Size i = 0; i < mat.backface_edf_mtag_to_gtag_map_size; ++i)
        mat.backface_edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
            target_code->get_callable_function_df_handle(mat.backface_edf.y, i));
}
 
// checks if a compiled material contains none-invalid hair BSDF
bool contains_hair_bsdf(const mi::neuraylib::ICompiled_material* compiled_material)
{
    mi::base::Handle<const mi::neuraylib::IExpression_direct_call> body(
        compiled_material->get_body());
 
    mi::base::Handle<const mi::neuraylib::IExpression_list> body_args(body->get_arguments());
    for (mi::Size i = 0, n = body_args->get_size(); i < n; ++i)
    {
        const char* name = body_args->get_name(i);
        if (strcmp(name, "hair") == 0)
        {
            mi::base::Handle<const mi::neuraylib::IExpression> hair_exp(
                body_args->get_expression(i));
 
            if (hair_exp->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
                return true;
 
            mi::base::Handle<const mi::neuraylib::IExpression_constant> hair_exp_const(
                hair_exp->get_interface<const mi::neuraylib::IExpression_constant>());
 
            mi::base::Handle<const mi::neuraylib::IValue> hair_exp_const_value(
                hair_exp_const->get_value());
 
            return hair_exp_const_value->get_kind() != mi::neuraylib::IValue::VK_INVALID_DF;
        }
    }
    return true;
}
 
static void usage(const char *name)
{
    std::cout
        << "usage: " << name << " [options] [<material_name1|full_mdle_path1> ...]\n"
        << "-h|--help                   print this text and exit\n"
        << "-v|--version                print the MDL SDK version string and exit\n"
        << "--device <id>               run on CUDA device <id> (default: 0)\n"
        << "--nogl                      don't open interactive display\n"
        << "--nocc                      don't use class-compilation\n"
        << "--noaux                     don't generate code for albedo and normal buffers\n"
        << "--nopdf                     don't generate code for pdf\n"
        << "                            (for simplicity implies --noaux)\n"
        << "--an                        use adapt normal function\n"
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
        << "--interpreter               use the DF interpreter (if possible)\n"
#endif
        << "--gui_scale <factor>        GUI scaling factor (default: 1.0)\n"
        << "--res <res_x> <res_y>       resolution (default: 1024x1024)\n"
        << "--hdr <filename>            HDR environment map "
           "(default: nvidia/sdk_examples/resources/environment.hdr)\n"
        << "--hdr_rot <degrees>         rotation of the environment in degrees (default: 0.0)\n"
        << "-o <outputfile>             image file to write result to (default: output.exr).\n"
        << "                            With multiple materials \"-<material index>\" will be\n"
        << "                            added in front of the extension\n"
        << "--spp <num>                 samples per pixel, only active for --nogl (default: 4096)\n"
        << "--spi <num>                 samples per render call (default: 8)\n"
        << "-t <type>                   0: eval, 1: sample, 2: mis, 3: mis + pdf, 4: no env\n"
        << "                            (default: 2)\n"
        << "-e <exposure>               exposure for interactive display (default: 0.0)\n"
        << "-f <fov>                    the camera field of view in degree (default: 96.0)\n"
        << "--cam <x> <y> <z>           set the camera position (default 0 0 3).\n"
        << "                            The camera will always look towards (0, 0, 0).\n"
        << "-l <x> <y> <z> <r> <g> <b>  add an isotropic point light with given coordinates and "
           "intensity (flux)\n"
        << "-p|--mdl_path <path>        MDL search path, can occur multiple times.\n"
        << "--mdl_next                  Enable features from upcoming MDL version.\n"
        << "--max_path_length <num>     maximum path length, default 4 (up to one total internal\n"
        << "                            reflection), clamped to 2..100\n"
        << "--noaa                      disable pixel oversampling\n"
        << "-d                          enable use of derivatives\n"
        << "--fold_ternary_on_df        fold all ternary operators on *df types (default: false)\n"
        << "--allowed_scatter_mode <m>  limits the allowed scatter mode to \"none\", \"reflect\", "
        << "\"transmit\" or \"reflect_and_transmit\" (default: restriction disabled)\n"
        << "\n"
        << "Note: material names can end with an '*' as a wildcard\n"
        << "      and alternatively, full MDLE file paths can be passed as material name\n";
 
    exit(EXIT_FAILURE);
}
 
int MAIN_UTF8(int argc, char* argv[])
{
    // Parse commandline options
    Options options;
    mi::examples::mdl::Configure_options configure_options;
    bool print_version_and_exit = false;
 
    for (int i = 1; i < argc; ++i) {
        const char *opt = argv[i];
        if (opt[0] == '-') {
            if (strcmp(opt, "--device") == 0 && i < argc - 1) {
                options.cuda_device = atoi(argv[++i]);
            } else if (strcmp(opt, "--nogl") == 0) {
                options.opengl = false;
            } else if (strcmp(opt, "--nocc") == 0) {
                options.use_class_compilation = false;
            } else if (strcmp(opt, "--noaux") == 0) {
                options.enable_auxiliary_output = false;
            } else if (strcmp(opt, "--nopdf") == 0) {
                options.enable_pdf = false;
                // also disable aux to avoid function indices depending on command line options
                options.enable_auxiliary_output = false;
            } else if (strcmp(opt, "--an") == 0) {
                options.use_adapt_normal = false;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
            } else if (strcmp(opt, "--interpreter") == 0) {
                options.use_df_interpreter = true;
#endif
            } else if (strcmp(opt, "--gui_scale") == 0 && i < argc - 1) {
                options.gui_scale = static_cast<float>(atof(argv[++i]));
            } else if (strcmp(opt, "--res") == 0 && i < argc - 2) {
                options.res_x = std::max(atoi(argv[++i]), 1);
                options.res_y = std::max(atoi(argv[++i]), 1);
            } else if (strcmp(opt, "--hdr") == 0 && i < argc - 1) {
                options.hdrfile = argv[++i];
            } else if (strcmp(opt, "--hdr_rot") == 0 && i < argc - 1) {
                options.hdr_rot = static_cast<float>(atof(argv[++i]));
                // wrap in case of negative input
                // we don't want fmodf behavior for negative values
                options.hdr_rot -= floorf(options.hdr_rot / 360.0f) * 360.f;
                options.hdr_rot = fmodf(options.hdr_rot, 360.0f);
            } else if (strcmp(opt, "-o") == 0 && i < argc - 1) {
                options.outputfile = argv[++i];
            } else if (strcmp(opt, "--spp") == 0 && i < argc - 1) {
                options.iterations = std::max(atoi(argv[++i]), 1);
            } else if (strcmp(opt, "--spi") == 0 && i < argc - 1) {
                options.samples_per_iteration = std::max(atoi(argv[++i]), 1);
            } else if (strcmp(opt, "-t") == 0 && i < argc - 1) {
                const int type = atoi(argv[++i]);
                if (type < 0 || type >= MDL_TEST_COUNT) {
                    std::cout << "Invalid type for \"-t\" option!" << std::endl;
                    usage(argv[0]);
                }
                options.mdl_test_type = type;
            } else if (strcmp(opt, "-e") == 0 && i < argc - 1) {
                options.exposure = static_cast<float>(atof(argv[++i]));
            } else if (strcmp(opt, "-f") == 0 && i < argc - 1) {
                options.fov = static_cast<float>(atof(argv[++i]));
            } else if (strcmp(opt, "--cam") == 0 && i < argc - 3) {
                options.cam_pos.x = static_cast<float>(atof(argv[++i]));
                options.cam_pos.y = static_cast<float>(atof(argv[++i]));
                options.cam_pos.z = static_cast<float>(atof(argv[++i]));
            } else if (strcmp(opt, "-l") == 0 && i < argc - 6) {
                options.light_pos.x = static_cast<float>(atof(argv[++i]));
                options.light_pos.y = static_cast<float>(atof(argv[++i]));
                options.light_pos.z = static_cast<float>(atof(argv[++i]));
                options.light_intensity.x = static_cast<float>(atof(argv[++i]));
                options.light_intensity.y = static_cast<float>(atof(argv[++i]));
                options.light_intensity.z = static_cast<float>(atof(argv[++i]));
            } else if ((strcmp(opt, "-p") == 0 || strcmp(opt, "--mdl_path") == 0) && i < argc - 1) {
                configure_options.additional_mdl_paths.push_back(argv[++i]);
            } else if (strcmp(opt, "--mdl_next") == 0) {
                options.enable_mdl_next = true;
            } else if (strcmp(opt, "--max_path_length") == 0 && i < argc - 1) {
                options.max_path_length = std::min(std::max(atoi(argv[++i]), 2), 100);
            } else if (strcmp(opt, "--noaa") == 0) {
                options.no_aa = true;
            } else if (strcmp(opt, "-d") == 0) {
                options.enable_derivatives = true;
            } else if (strcmp(opt, "--fold_ternary_on_df") == 0) {
                options.fold_ternary_on_df = true;
            } else if (strcmp(opt, "--allowed_scatter_mode") == 0 && i < argc - 1) {
                options.enable_bsdf_flags = true;
                char const *mode = argv[++i];
                if (strcmp(mode, "none") == 0) {
                    options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_NONE;
                } else if (strcmp(mode, "reflect") == 0) {
                    options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_ALLOW_REFLECT;
                } else if (strcmp(mode, "transmit") == 0) {
                    options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_ALLOW_TRANSMIT;
                } else if (strcmp(mode, "reflect_and_transmit") == 0) {
                    options.allowed_scatter_mode =
                        mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT;
                } else {
                    std::cout << "Unknown allowed_scatter_mode: \"" << mode << "\"" << std::endl;
                    usage(argv[0]);
                }
            } else if (strcmp(opt, "-v") == 0 || strcmp(opt, "--version") == 0) {
                print_version_and_exit = true;
            } else {
                if (strcmp(opt, "-h") != 0 && strcmp(opt, "--help") != 0)
                    std::cout << "Unknown option: \"" << opt << "\"" << std::endl;
                usage(argv[0]);
            }
        }
        else
            options.material_names.push_back(std::string(opt));
    }
 
    if (options.mdl_test_type == 3 && !options.enable_pdf)
        exit_failure("Cannot use \"mis + pdf\" test type when pdf is disabled.");
 
    // Use default material, if none was provided via command line
    if (options.material_names.empty())
        options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_df");
 
    // Access the MDL SDK
    mi::base::Handle<mi::neuraylib::INeuray> neuray(mi::examples::mdl::load_and_get_ineuray());
    if (!neuray.is_valid_interface())
        exit_failure("Failed to load the SDK.");
 
    // Handle the --version flag
    if (print_version_and_exit) {
 
        // print library version information.
        mi::base::Handle<const mi::neuraylib::IVersion> version(
            neuray->get_api_component<const mi::neuraylib::IVersion>());
        fprintf(stdout, "%s\n", version->get_string());
 
        // free the handles and unload the MDL SDK
        version = nullptr;
        neuray = nullptr;
        if (!mi::examples::mdl::unload())
            exit_failure("Failed to unload the SDK.");
 
        exit_success();
    }
 
    // Configure the MDL SDK
    if (!mi::examples::mdl::configure(neuray.get(), configure_options))
        exit_failure("Failed to initialize the SDK.");
 
    // Start the MDL SDK
    mi::Sint32 ret = neuray->start();
    if (ret != 0)
        exit_failure("Failed to initialize the SDK. Result code: %d", ret);
 
    // LPE state machine for rendering into multiple buffers
    LPE_state_machine lpe_state_machine;
    lpe_state_machine.handle_to_global_tag("point_light");  // register handles before building
    lpe_state_machine.handle_to_global_tag("env");          // the state machine
 
     // register other handles in the scene, e.g.: for object instances
    lpe_state_machine.handle_to_global_tag("sphere");       // for illustration, not used currently
 
    // Add some common and custom LPEs
    lpe_state_machine.add_expression("Beauty", LPE::create_common(LPE::Common::Beauty));
 
    lpe_state_machine.add_expression("Diffuse", LPE::create_common(LPE::Common::Diffuse));
    lpe_state_machine.add_expression("Glossy", LPE::create_common(LPE::Common::Glossy));
    lpe_state_machine.add_expression("Specular", LPE::create_common(LPE::Common::Specular));
    lpe_state_machine.add_expression("SSS", LPE::create_common(LPE::Common::SSS));
    lpe_state_machine.add_expression("Transmission", LPE::create_common(LPE::Common::Transmission));
 
    lpe_state_machine.add_expression("Beauty-Env", LPE::sequence({
        LPE::camera(),
        LPE::zero_or_more(LPE::any_scatter()),
        LPE::light("env") }));  // only light with the name 'env'
 
    lpe_state_machine.add_expression("Beauty-PointLight", LPE::sequence({
        LPE::camera(),
        LPE::zero_or_more(LPE::any_scatter()),
        LPE::light("point_light") })); // only light with the name 'point_light'
 
    lpe_state_machine.add_expression("Beauty-Emission", LPE::sequence({
        LPE::camera(),
        LPE::zero_or_more(LPE::any_scatter()),
        LPE::emission() })); // only emission
 
 
    lpe_state_machine.add_expression("Beauty-Base", LPE::sequence({
        LPE::camera(),
        LPE::zero_or_more(LPE::any_scatter("base")),
        LPE::light()})); // no emission
 
    lpe_state_machine.add_expression("Beauty-Coat", LPE::sequence({
        LPE::camera(),
        LPE::zero_or_more(LPE::any_scatter("coat")),
        LPE::light()})); // no emission
 
    lpe_state_machine.add_expression("Beauty-^Coat", LPE::sequence({
        LPE::camera(),
        LPE::zero_or_more(LPE::any_scatter("coat", false)),
        LPE::any_light()})); // emission or light source
 
    {
        // Create a transaction
        mi::base::Handle<mi::neuraylib::IDatabase> database(
            neuray->get_api_component<mi::neuraylib::IDatabase>());
        mi::base::Handle<mi::neuraylib::IScope> scope(database->get_global_scope());
        mi::base::Handle<mi::neuraylib::ITransaction> transaction(scope->create_transaction());
 
        // Access needed API components
        mi::base::Handle<mi::neuraylib::IMdl_factory> mdl_factory(
            neuray->get_api_component<mi::neuraylib::IMdl_factory>());
        mi::base::Handle<mi::neuraylib::IMdl_impexp_api> mdl_impexp_api(
            neuray->get_api_component<mi::neuraylib::IMdl_impexp_api>());
 
        mi::base::Handle<mi::neuraylib::IMdl_backend_api> mdl_backend_api(
            neuray->get_api_component<mi::neuraylib::IMdl_backend_api>());
        {
            // Initialize the material compiler with 16 result buffer slots ("texture results")
            Material_compiler mc(
                mdl_impexp_api.get(),
                mdl_backend_api.get(),
                mdl_factory.get(),
                transaction.get(),
                16,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
                options.use_df_interpreter,
#endif
                options.enable_derivatives,
                options.fold_ternary_on_df,
                options.enable_auxiliary_output,
                options.enable_pdf,
                options.use_adapt_normal,
                options.enable_bsdf_flags,
                options.enable_mdl_next,
                /*df_handle_mode=*/ "pointer",
                /*lambda_return_mode=*/ "value");
 
            // List of materials in the scene
            std::vector<Df_cuda_material> material_bundle;
 
            // Select the functions to translate
            std::vector<mi::neuraylib::Target_function_description> descs;
            descs.push_back(
                mi::neuraylib::Target_function_description("init"));
            descs.push_back(
                mi::neuraylib::Target_function_description("surface.scattering"));
            descs.push_back(
                mi::neuraylib::Target_function_description("surface.emission.emission"));
            descs.push_back(
                mi::neuraylib::Target_function_description("surface.emission.intensity"));
            descs.push_back(
                mi::neuraylib::Target_function_description("volume.absorption_coefficient"));
            descs.push_back(
                mi::neuraylib::Target_function_description("thin_walled"));
            descs.push_back(
                mi::neuraylib::Target_function_description("geometry.cutout_opacity"));
            descs.push_back(
                mi::neuraylib::Target_function_description("hair"));
            descs.push_back(
                mi::neuraylib::Target_function_description("backface.scattering"));
            descs.push_back(
                mi::neuraylib::Target_function_description("backface.emission.emission"));
            descs.push_back(
                mi::neuraylib::Target_function_description("backface.emission.intensity"));
 
            Timing timing_compile("Compile MDL to PTX");
 
            // Generate code for all materials
            std::vector<std::string> used_material_names;
            for (size_t i = 0; i < options.material_names.size(); ++i) {
                std::string& opt_material_name = options.material_names[i];
 
                // split module and material name
                std::string module_qualified_name, material_simple_name;
                if (!mi::examples::mdl::parse_cmd_argument_material_name(
                    opt_material_name, module_qualified_name, material_simple_name, true))
                        exit_failure("Provided material name '%s' is invalid.",
                            opt_material_name.c_str());
 
                // Is this a material name pattern? (not applicable to mdle)
                if (!mi::examples::strings::ends_with(module_qualified_name, ".mdle") &&
                    opt_material_name.size() > 1 &&
                    opt_material_name.back() == '*') {
 
                    // prepare the pattern for matching
                    std::string pattern = opt_material_name.substr(0, opt_material_name.size() - 1);
                    if (!starts_with(pattern, "::"))
                        pattern = "::" + pattern;
 
                    // load the module
                    std::string module_db_name = mc.load_module(module_qualified_name);
 
                    // iterate over all materials in that module
                    mi::base::Handle<const mi::neuraylib::IModule> loaded_module(
                        transaction->access<const mi::neuraylib::IModule>(module_db_name.c_str()));
 
                    for (mi::Size j = 0, n = loaded_module->get_material_count(); j < n; ++j) {
 
                        // get the j`th material
                        const char* material_db_name = loaded_module->get_material(j);
                        mi::base::Handle<const mi::neuraylib::IFunction_definition> mat_def(
                            transaction->access<const mi::neuraylib::IFunction_definition>(
                                material_db_name));
 
                        // make sure the material name starts with the pattern
                        std::string material_qualified_name = mat_def->get_mdl_name();
                        if (!mi::examples::strings::starts_with(material_qualified_name, pattern))
                            continue;
 
                        std::cout << "Adding material \"" << material_qualified_name << "\"" << std::endl;
 
                        // Add functions of the material to the link unit
                        check_success(mc.add_material(
                            module_qualified_name, material_db_name,
                            descs.data(), descs.size(),
                            options.use_class_compilation));
 
                        mi::base::Handle<const mi::neuraylib::ICompiled_material> compiled_material(
                            mc.get_compiled_materials().back());
 
                        // Create application material representation
                        material_bundle.push_back(create_cuda_material(
                            compiled_material.get(),
                            0, material_bundle.size(), descs,
                            contains_hair_bsdf(compiled_material.get())));
                        used_material_names.push_back(material_qualified_name);
                    }
                } else {
                    // Load the module
                    std::string module_db_name = mc.load_module(module_qualified_name);
                    mi::base::Handle<const mi::neuraylib::IModule> module(
                        transaction->access<mi::neuraylib::IModule>(module_db_name.c_str()));
                    if (!module)
                        exit_failure("Failed to access the loaded module.");
 
                    // Construct the material name
                    std::string material_db_name
                        = module_db_name + "::" + material_simple_name;
                    material_db_name = mi::examples::mdl::add_missing_material_signature(
                        module.get(), material_db_name);
                    if (material_db_name.empty())
                        exit_failure("Failed to find the material %s in the module %s.",
                            material_simple_name.c_str(), module_qualified_name.c_str());
 
                    std::cout << "Adding material \"" << material_db_name << "\"" << std::endl;
 
                    // Add functions of the material to the link unit
                    check_success(mc.add_material(
                        module_qualified_name, material_db_name,
                        descs.data(), descs.size(),
                        options.use_class_compilation));
 
                    mi::base::Handle<const mi::neuraylib::ICompiled_material> compiled_material(
                        mc.get_compiled_materials().back());
 
                    // Create application material representation
                    material_bundle.push_back(create_cuda_material(
                        compiled_material.get(),
                        0, material_bundle.size(), descs,
                        contains_hair_bsdf(compiled_material.get())));
                    used_material_names.push_back(material_db_name);
                }
            }
 
            // Update the material names with the actually used names
            options.material_names = used_material_names;
 
            // Generate the CUDA PTX code for the link unit
            mi::base::Handle<const mi::neuraylib::ITarget_code> target_code(
                mc.generate_cuda_ptx());
 
            timing_compile.stop();
 
            // convert handles to tag IDs
            for (auto& mat : material_bundle)
                create_cuda_material_handles(mat, target_code.get(), lpe_state_machine);
 
            mi::base::Handle<mi::neuraylib::IFactory> factory(
                neuray->get_api_component<mi::neuraylib::IFactory>());
 
            // Acquire image API needed to prepare the textures
            mi::base::Handle<mi::neuraylib::IImage_api> image_api(
                neuray->get_api_component<mi::neuraylib::IImage_api>());
 
            // when all scene elements that have handles are loaded and all handles as well as
            // light path expressions are registered, the state machine can be constructed.
            lpe_state_machine.build();
 
            // Render
            render_scene(
                options,
                transaction,
                factory,
                image_api,
                mdl_impexp_api,
                target_code,
                mc.get_material_defs(),
                mc.get_compiled_materials(),
                mc.get_argument_block_indices(),
                material_bundle,
                lpe_state_machine);
        }
 
        transaction->commit();
    }
 
    // Shut down the MDL SDK
    if (neuray->shutdown() != 0)
        exit_failure("Failed to shutdown the SDK.");
 
    // Unload the MDL SDK
    neuray = nullptr;
    if (!mi::examples::mdl::unload())
        exit_failure("Failed to unload the SDK.");
 
    exit_success();
}
 
// Convert command line arguments to UTF8 on Windows
COMMANDLINE_TO_UTF8

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.h

/******************************************************************************
 * Copyright 2024 NVIDIA Corporation. All rights reserved.
 *****************************************************************************/
 
#ifndef EXAMPLE_DF_CUDA_H
#define EXAMPLE_DF_CUDA_H
 
#include <cstdint>
#include <vector_types.h>
#include <texture_types.h>
#include <mi/neuraylib/target_code_types.h>
 
struct Target_code_data;
 
enum Mdl_test_type {
    MDL_TEST_EVAL    = 0,  // only use BSDF evaluation
    MDL_TEST_SAMPLE  = 1,  // only use BSDF sampling
    MDL_TEST_MIS     = 2,  // multiple importance sampling
    MDL_TEST_MIS_PDF = 3,  // multiple importance sampling, but use BSDF explicit pdf computation
    MDL_TEST_NO_ENV  = 4,  // no environment sampling
    MDL_TEST_COUNT
};
 
const unsigned MAX_DF_HANDLES = 8;
 
struct Env_accel {
    unsigned int alias;
    float q;
    float pdf;
};
 
namespace
{
    #if defined(__CUDA_ARCH__)
    __host__ __device__
    #endif
    inline uint2 make_invalid()
    {
        uint2 index_pair;
        index_pair.x = ~0;
        index_pair.y = ~0;
        return index_pair;
    }
}
 
struct Df_cuda_material
{
    #if defined(__CUDA_ARCH__)
    __host__ __device__
    #endif
    Df_cuda_material()
        : compiled_material_index(0)
        , argument_block_index(~0)
        , init(make_invalid())
        , bsdf(make_invalid())
        , edf(make_invalid())
        , emission_intensity(make_invalid())
        , backface_bsdf(make_invalid())
        , backface_edf(make_invalid())
        , backface_emission_intensity(make_invalid())
        , volume_absorption(make_invalid())
        , thin_walled(make_invalid())
        , cutout_opacity(make_invalid())
        , contains_hair_bsdf(0)
    {
    }
 
    // used on host side only
    unsigned int compiled_material_index;
 
    // the argument block index of this material (~0 if not used)
    unsigned int argument_block_index;
 
    // pair of target_code_index and function_index to identify the init function
    uint2 init;
 
    // pair of target_code_index and function_index to identify the bsdf
    uint2 bsdf;
 
    // pair of target_code_index and function_index to identify the edf
    uint2 edf;
 
    // pair of target_code_index and function_index for intensity
    uint2 emission_intensity;
 
    // pair of target_code_index and function_index to identify the bsdf
    uint2 backface_bsdf;
 
    // pair of target_code_index and function_index to identify the edf
    uint2 backface_edf;
 
    // pair of target_code_index and function_index for intensity
    uint2 backface_emission_intensity;
 
    // pair of target_code_index and function_index for volume absorption
    uint2 volume_absorption;
 
    // pair of target_code_index and function_index for thin_walled
    uint2 thin_walled;
 
    // pair of target_code_index and function_index for cutout_opacity
    uint2 cutout_opacity;
 
    // maps 'material tags' to 'global tags' for the surface scattering distribution function
    unsigned int bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
    unsigned int bsdf_mtag_to_gtag_map_size;
 
    // maps 'material tags' to 'global tags' for the emission distribution function
    unsigned int edf_mtag_to_gtag_map[MAX_DF_HANDLES];
    unsigned int edf_mtag_to_gtag_map_size;
 
    // maps 'material tags' to 'global tags' for the backface scattering distribution function
    unsigned int backface_bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
    unsigned int backface_bsdf_mtag_to_gtag_map_size;
 
    // maps 'material tags' to 'global tags' for the backface emission distribution function
    unsigned int backface_edf_mtag_to_gtag_map[MAX_DF_HANDLES];
    unsigned int backface_edf_mtag_to_gtag_map_size;
 
    unsigned int contains_hair_bsdf;
};
 
enum Geometry_type
{
    GT_SPHERE = 0,  // Intersect a sphere with unit radius located at the (0,0,0)
    GT_HAIR = 1,    // Intersect an infinite cylinder at (0,0,0) oriented in y-direction
};
 
struct Kernel_params {
    // display
    uint2         resolution;
    float         exposure_scale;
    unsigned int *display_buffer;
    float3       *accum_buffer;
    float3       *albedo_buffer;
    float3       *normal_buffer;
    bool          enable_auxiliary_output;
    unsigned      display_buffer_index;
 
    // parameters
    unsigned int            iteration_start;
    unsigned int            iteration_num;
    unsigned int            mdl_test_type;
    unsigned int            max_path_length;
    unsigned int            use_derivatives;
    unsigned int            disable_aa;
    mi::neuraylib::Df_flags bsdf_data_flags;
 
    // camera
    float3 cam_pos;
    float3 cam_dir;
    float3 cam_right;
    float3 cam_up;
    float  cam_focal;
 
    // geometry
    unsigned int geometry;
 
    // environment
    uint2                env_size;
    cudaTextureObject_t  env_tex;
    Env_accel           *env_accel;
    float                env_intensity;         // scaling factor
    uint32_t             env_gtag;              // global light group tag for handle 'env'
    float                env_rotation;          // rotation of the environment
 
    // point light
    float3 light_pos;
    float3 light_color;
    float light_intensity;
    uint32_t point_light_gtag;                  // global light group tag for handle 'point_light'
 
    // material data
    Target_code_data   *tc_data;
    char const        **arg_block_list;
    unsigned int        current_material;
    Df_cuda_material   *material_buffer;
 
    // LPE state machine
    uint32_t            lpe_num_states;         // number of states in the state machine
    uint32_t            lpe_num_transitions;    // number of possible transitions between 2 states
    uint32_t           *lpe_state_table;        // actual machine; size: #states x #transitions
    uint32_t           *lpe_final_mask;         // encodes final states; size: #states
    uint32_t            default_gtag;           // tag ID for the empty string
    uint32_t            lpe_ouput_expression;   // the LPE evaluated for output
                                                // only one here, but additional one analogously
};
 
enum Display_buffer_options
{
    DISPLAY_BUFFER_LPE = 0,
    DISPLAY_BUFFER_ALBEDO,
    DISPLAY_BUFFER_NORMAL,
 
    DISPLAY_BUFFER_COUNT
};
 
#endif // EXAMPLE_DF_CUDA_H

Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h

/******************************************************************************
 * Copyright 2024 NVIDIA Corporation. All rights reserved.
 *****************************************************************************/
 
// examples/mdl_sdk/shared/texture_support_cuda.h
//
// This file contains the implementations and the vtables of the texture access functions.
 
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
 
#include <cuda.h>
#include <cuda_runtime.h>
#include <math.h>
 
#include <mi/neuraylib/target_code_types.h>
 
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
    #define M_PI            3.14159265358979323846
#endif
#define M_ONE_OVER_PI       0.318309886183790671538
 
 
typedef mi::neuraylib::tct_deriv_float                     tct_deriv_float;
typedef mi::neuraylib::tct_deriv_float2                    tct_deriv_float2;
typedef mi::neuraylib::tct_deriv_arr_float_2               tct_deriv_arr_float_2;
typedef mi::neuraylib::tct_deriv_arr_float_3               tct_deriv_arr_float_3;
typedef mi::neuraylib::tct_deriv_arr_float_4               tct_deriv_arr_float_4;
typedef mi::neuraylib::Shading_state_material_with_derivs  Shading_state_material_with_derivs;
typedef mi::neuraylib::Shading_state_material              Shading_state_material;
typedef mi::neuraylib::Texture_handler_base                Texture_handler_base;
typedef mi::neuraylib::Tex_wrap_mode                       Tex_wrap_mode;
typedef mi::neuraylib::Mbsdf_part                          Mbsdf_part;
 
 
// Custom structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
    explicit Texture()
        : filtered_object(0)
        , unfiltered_object(0)
        , size(make_uint3(0, 0, 0))
        , inv_size(make_float3(0.0f, 0.0f, 0.0f))
    {}
 
    explicit Texture(
        cudaTextureObject_t  filtered_object,
        cudaTextureObject_t  unfiltered_object,
        uint3                size)
        : filtered_object(filtered_object)
        , unfiltered_object(unfiltered_object)
        , size(size)
        , inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
    {}
 
    cudaTextureObject_t  filtered_object;    // uses filter mode cudaFilterModeLinear
    cudaTextureObject_t  unfiltered_object;  // uses filter mode cudaFilterModePoint
    uint3                size;               // size of the texture, needed for texel access
    float3               inv_size;           // the inverse values of the size of the texture
};
 
// Custom structure representing an MDL BSDF measurement.
struct Mbsdf
{
    unsigned            has_data[2];            // true if there is a measurement for this part
    cudaTextureObject_t eval_data[2];           // uses filter mode cudaFilterModeLinear
    float               max_albedo[2];          // max albedo used to limit the multiplier
    float*              sample_data[2];         // CDFs for sampling a BSDF measurement
    float*              albedo_data[2];         // max albedo for each theta (isotropic)
 
    uint2           angular_resolution[2];      // size of the dataset, needed for texel access
    float2          inv_angular_resolution[2];  // the inverse values of the size of the dataset
    unsigned        num_channels[2];            // number of color channels (1 or 3)
};
 
// Structure representing a Light Profile
struct Lightprofile
{
    explicit Lightprofile()
        : angular_resolution(make_uint2(0, 0))
        , inv_angular_resolution(make_float2(0.0f, 0.0f))
        , theta_phi_start(make_float2(0.0f, 0.0f))
        , theta_phi_delta(make_float2(0.0f, 0.0f))
        , theta_phi_inv_delta(make_float2(0.0f, 0.0f))
        , candela_multiplier(0.0f)
        , total_power(0.0f)
        , eval_data(0)
    {
    }
 
    uint2           angular_resolution;     // angular resolution of the grid
    float2          inv_angular_resolution; // inverse angular resolution of the grid
    float2          theta_phi_start;        // start of the grid
    float2          theta_phi_delta;        // angular step size
    float2          theta_phi_inv_delta;    // inverse step size
    float           candela_multiplier;     // factor to rescale the normalized data
    float           total_power;
 
    cudaTextureObject_t eval_data;          // normalized data sampled on grid
    float*              cdf_data;           // CDFs for sampling a light profile
};
 
 
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler : Texture_handler_base {
    // additional data for the texture access functions can be provided here
 
    size_t         num_textures;        // the number of textures used by the material
                                        // (without the invalid texture)
    Texture const *textures;            // the textures used by the material
                                        // (without the invalid texture)
 
    size_t         num_mbsdfs;          // the number of mbsdfs used by the material
                                        // (without the invalid mbsdf)
    Mbsdf const    *mbsdfs;             // the mbsdfs used by the material
                                        // (without the invalid mbsdf)
 
    size_t          num_lightprofiles;  // number of elements in the lightprofiles field
                                        // (without the invalid light profile)
    Lightprofile const *lightprofiles;  // a device pointer to a list of mbsdfs objects, if used
                                        // (without the invalid light profile)
};
 
// The texture handler structure required by the MDL SDK with custom additional fields.
struct Texture_handler_deriv : mi::neuraylib::Texture_handler_deriv_base {
    // additional data for the texture access functions can be provided here
 
    size_t         num_textures;        // the number of textures used by the material
                                        // (without the invalid texture)
    Texture const *textures;            // the textures used by the material
                                        // (without the invalid texture)
 
    size_t         num_mbsdfs;          // the number of mbsdfs used by the material
                                        // (without the invalid texture)
    Mbsdf const    *mbsdfs;             // the mbsdfs used by the material
                                        // (without the invalid texture)
 
    size_t          num_lightprofiles;  // number of elements in the lightprofiles field
                                        // (without the invalid light profile)
    Lightprofile const *lightprofiles;  // a device pointer to a list of mbsdfs objects, if used
                                        // (without the invalid light profile)
};
 
 
#if defined(__CUDACC__)
 
// Stores a float4 in a float[4] array.
__device__ inline void store_result4(float res[4], const float4 &v)
{
    res[0] = v.x;
    res[1] = v.y;
    res[2] = v.z;
    res[3] = v.w;
}
 
// Stores a float in all elements of a float[4] array.
__device__ inline void store_result4(float res[4], float s)
{
    res[0] = res[1] = res[2] = res[3] = s;
}
 
// Stores the given float values in a float[4] array.
__device__ inline void store_result4(
    float res[4], float v0, float v1, float v2, float v3)
{
    res[0] = v0;
    res[1] = v1;
    res[2] = v2;
    res[3] = v3;
}
 
// Stores a float3 in a float[3] array.
__device__ inline void store_result3(float res[3], float3 const&v)
{
    res[0] = v.x;
    res[1] = v.y;
    res[2] = v.z;
}
 
// Stores a float4 in a float[3] array, ignoring v.w.
__device__ inline void store_result3(float res[3], const float4 &v)
{
    res[0] = v.x;
    res[1] = v.y;
    res[2] = v.z;
}
 
// Stores a float in all elements of a float[3] array.
__device__ inline void store_result3(float res[3], float s)
{
    res[0] = res[1] = res[2] = s;
}
 
// Stores the given float values in a float[3] array.
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
    res[0] = v0;
    res[1] = v1;
    res[2] = v2;
}
 
 
// ------------------------------------------------------------------------------------------------
// Textures
// ------------------------------------------------------------------------------------------------
 
// Applies wrapping and cropping to the given coordinate.
// Note: This macro returns if wrap mode is clip and the coordinate is out of range.
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func)    \
  do {                                                                                       \
    if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT &&                                    \
        (crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) {                                 \
      /* Do nothing, use texture sampler default behavior */                                 \
    }                                                                                        \
    else                                                                                     \
    {                                                                                        \
      if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT )                                   \
        val = val - floorf(val);                                                             \
      else {                                                                                 \
        if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) {  \
          store_res_func(result, 0.0f);                                                      \
          return;                                                                            \
        }                                                                                    \
        else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) {                 \
          float floored_val = floorf(val);                                                   \
          if ( (int(floored_val) & 1) != 0 )                                                 \
            val = 1.0f - (val - floored_val);                                                \
          else                                                                               \
            val = val - floored_val;                                                         \
        }                                                                                    \
        float inv_hdim = 0.5f * (inv_dim);                                                   \
        val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim);                                   \
      }                                                                                      \
      val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0];                        \
    }                                                                                        \
  } while ( 0 )
 
 
#ifdef USE_SMOOTHERSTEP_FILTER
// Modify texture coordinates to get better texture filtering,
// see http://www.iquilezles.org/www/articles/texture/texture.htm
#define APPLY_SMOOTHERSTEP_FILTER()                                                         \
    do {                                                                                    \
        u = u * tex.size.x + 0.5f;                                                          \
        v = v * tex.size.y + 0.5f;                                                          \
                                                                                            \
        float u_i = floorf(u), v_i = floorf(v);                                             \
        float u_f = u - u_i;                                                                \
        float v_f = v - v_i;                                                                \
        u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f);                          \
        v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f);                          \
        u = u_i + u_f;                                                                      \
        v = v_i + v_f;                                                                      \
                                                                                            \
        u = (u - 0.5f) * tex.inv_size.x;                                                    \
        v = (v - 0.5f) * tex.inv_size.y;                                                    \
    } while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
 
 
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float4_2d(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float const                 coord[2],
    Tex_wrap_mode const         wrap_u,
    Tex_wrap_mode const         wrap_v,
    float const                 crop_u[2],
    float const                 crop_v[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result4(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
    float u = coord[0], v = coord[1];
    WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
    WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
 
    APPLY_SMOOTHERSTEP_FILTER();
 
    store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
 
// Implementation of tex::lookup_float4() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float4_2d(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    tct_deriv_float2 const     *coord,
    Tex_wrap_mode const         wrap_u,
    Tex_wrap_mode const         wrap_v,
    float const                 crop_u[2],
    float const                 crop_v[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result4(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
    float u = coord->val.x, v = coord->val.y;
    WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
    WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
 
    APPLY_SMOOTHERSTEP_FILTER();
 
    store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
 
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_float3_2d(
    float                       result[3],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float const                 coord[2],
    Tex_wrap_mode const         wrap_u,
    Tex_wrap_mode const         wrap_v,
    float const                 crop_u[2],
    float const                 crop_v[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result3(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
    float u = coord[0], v = coord[1];
    WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
    WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
 
    APPLY_SMOOTHERSTEP_FILTER();
 
    store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
 
// Implementation of tex::lookup_float3() for a texture_2d texture.
extern "C" __device__ void tex_lookup_deriv_float3_2d(
    float                       result[3],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    tct_deriv_float2 const     *coord,
    Tex_wrap_mode const         wrap_u,
    Tex_wrap_mode const         wrap_v,
    float const                 crop_u[2],
    float const                 crop_v[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result3(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
    float u = coord->val.x, v = coord->val.y;
    WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
    WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
 
    APPLY_SMOOTHERSTEP_FILTER();
 
    store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
 
// Implementation of tex::texel_float4() for a texture_2d texture.
// Note: uvtile and/or animated textures are not supported
extern "C" __device__ void tex_texel_float4_2d(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    int const                   coord[2],
    int const                   /*uv_tile*/[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result4(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
 
    store_result4(result, tex2D<float4>(
        tex.unfiltered_object,
        float(coord[0]) * tex.inv_size.x,
        float(coord[1]) * tex.inv_size.y));
}
 
// Implementation of tex::lookup_float4() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float4_3d(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float const                 coord[3],
    Tex_wrap_mode               wrap_u,
    Tex_wrap_mode               wrap_v,
    Tex_wrap_mode               wrap_w,
    float const                 crop_u[2],
    float const                 crop_v[2],
    float const                 crop_w[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result4(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
 
    float u = coord[0], v = coord[1], w = coord[2];
    WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
    WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
    WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
 
    store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
 
// Implementation of tex::lookup_float3() for a texture_3d texture.
extern "C" __device__ void tex_lookup_float3_3d(
    float                       result[3],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float const                 coord[3],
    Tex_wrap_mode               wrap_u,
    Tex_wrap_mode               wrap_v,
    Tex_wrap_mode               wrap_w,
    float const                 crop_u[2],
    float const                 crop_v[2],
    float const                 crop_w[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result3(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
 
    float u = coord[0], v = coord[1], w = coord[2];
    WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
    WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
    WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
 
    store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
 
// Implementation of tex::texel_float4() for a texture_3d texture.
extern "C" __device__ void tex_texel_float4_3d(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    const int                   coord[3],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result4(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
 
    store_result4(result, tex3D<float4>(
        tex.unfiltered_object,
        float(coord[0]) * tex.inv_size.x,
        float(coord[1]) * tex.inv_size.y,
        float(coord[2]) * tex.inv_size.z));
}
 
// Implementation of tex::lookup_float4() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float4_cube(
    float                       result[4],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float const                 coord[3])
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result4(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
 
    store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
 
// Implementation of tex::lookup_float3() for a texture_cube texture.
extern "C" __device__ void tex_lookup_float3_cube(
    float                       result[3],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float const                 coord[3])
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        store_result3(result, 0.0f);
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
 
    store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
 
// Implementation of resolution_2d function needed by generated code.
// Note: uvtile and/or animated textures are not supported
extern "C" __device__ void tex_resolution_2d(
    int                         result[2],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    int const                   /*uv_tile*/[2],
    float                       /*frame*/)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
        // invalid texture returns zero
        result[0] = 0;
        result[1] = 0;
        return;
    }
 
    Texture const &tex = self->textures[texture_idx - 1];
    result[0] = tex.size.x;
    result[1] = tex.size.y;
}
 
// Implementation of resolution_3d function needed by generated code.
extern "C" __device__ void tex_resolution_3d(
    int                         result[3],
    Texture_handler_base const *self_base,
    unsigned                    texture_idx,
    float                       /*frame*/)
{
    Texture_handler const* self = static_cast<Texture_handler const*>(self_base);
 
    if (texture_idx == 0 || texture_idx - 1 >= self->num_textures) {
        // invalid texture returns zero
        result[0] = 0;
        result[1] = 0;
        result[2] = 0;
        return;
    }
 
    Texture const& tex = self->textures[texture_idx - 1];
    result[0] = tex.size.x;
    result[1] = tex.size.y;
    result[2] = tex.size.z;
}
 
// Implementation of texture_isvalid().
extern "C" __device__ bool tex_texture_isvalid(
    Texture_handler_base const *self_base,
    unsigned                    texture_idx)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    return texture_idx != 0 && texture_idx - 1 < self->num_textures;
}
 
// Implementation of frame function needed by generated code.
// Note: frames are not implemented for the MDL SDK examples.
extern "C" __device__ void tex_frame(
    int                         result[2],
    Texture_handler_base const */*self_base*/,
    unsigned                    /*texture_idx*/)
{
    result[0] = 0;
    result[1] = 0;
}
 
 
// ------------------------------------------------------------------------------------------------
// Light Profiles
// ------------------------------------------------------------------------------------------------
 
 
// Implementation of light_profile_power() for a light profile.
extern "C" __device__ float df_light_profile_power(
    Texture_handler_base const *self_base,
    unsigned                    light_profile_idx)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
        return 0.0f; // invalid light profile returns zero
 
    const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
    return lp.total_power;
}
 
// Implementation of light_profile_maximum() for a light profile.
extern "C" __device__ float df_light_profile_maximum(
    Texture_handler_base const *self_base,
    unsigned                    light_profile_idx)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
        return 0.0f; // invalid light profile returns zero
 
    const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
    return lp.candela_multiplier;
}
 
// Implementation of light_profile_isvalid() for a light profile.
extern "C" __device__ bool df_light_profile_isvalid(
    Texture_handler_base const *self_base,
    unsigned                    light_profile_idx)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    return light_profile_idx != 0 && light_profile_idx - 1 < self->num_lightprofiles;
}
 
// binary search through CDF
__device__ inline unsigned sample_cdf(
    const float* cdf,
    unsigned cdf_size,
    float xi)
{
    unsigned li = 0;
    unsigned ri = cdf_size - 1;
    unsigned m = (li + ri) / 2;
    while (ri > li)
    {
        if (xi < cdf[m])
            ri = m;
        else
            li = m + 1;
 
        m = (li + ri) / 2;
    }
 
    return m;
}
 
 
// Implementation of df::light_profile_evaluate() for a light profile.
extern "C" __device__ float df_light_profile_evaluate(
    Texture_handler_base const  *self_base,
    unsigned                    light_profile_idx,
    float const                 theta_phi[2])
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
        return 0.0f;  // invalid light profile returns zero
 
    const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
 
    // map theta to 0..1 range
    float u = (theta_phi[0] - lp.theta_phi_start.x) *
        lp.theta_phi_inv_delta.x * lp.inv_angular_resolution.x;
 
    // converting input phi from -pi..pi to 0..2pi
    float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
 
    // floorf wraps phi range into 0..2pi
    phi = phi - lp.theta_phi_start.y -
        floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
 
    // (phi < 0.0f) is no problem, this is handled by the (black) border
    // since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
    float v = phi * lp.theta_phi_inv_delta.y * lp.inv_angular_resolution.y;
 
    // half pixel offset
    // see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#linear-filtering
    u += 0.5f * lp.inv_angular_resolution.x;
    v += 0.5f * lp.inv_angular_resolution.y;
 
    // wrap_mode: border black would be an alternative (but it produces artifacts at low res)
    if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
 
    return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
 
// Implementation of df::light_profile_sample() for a light profile.
extern "C" __device__ void df_light_profile_sample(
    float                       result[3],          // output: theta, phi, pdf
    Texture_handler_base const  *self_base,
    unsigned                    light_profile_idx,
    float const                 xi[3])              // uniform random values
{
    result[0] = -1.0f;  // negative theta means no emission
    result[1] = -1.0f;
    result[2] = 0.0f;
 
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
        return;  // invalid light profile returns zero
 
    const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
    uint2 res = lp.angular_resolution;
    if (res.x <= 2 || res.y <= 2)
        return;  // invalid resolution
 
 
    // sample theta_out
    //-------------------------------------------
    float xi0 = xi[0];
    const float* cdf_data_theta = lp.cdf_data;                          // CDF theta
    unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0);    // binary search
 
    float prob_theta = cdf_data_theta[idx_theta];
    if (idx_theta > 0)
    {
        const float tmp = cdf_data_theta[idx_theta - 1];
        prob_theta -= tmp;
        xi0 -= tmp;
    }
    xi0 /= prob_theta;  // rescale for re-usage
 
    // sample phi_out
    //-------------------------------------------
    float xi1 = xi[1];
    const float* cdf_data_phi = cdf_data_theta + (res.x - 1)            // CDF theta block
                              + (idx_theta * (res.y - 1));              // selected CDF for phi
    const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1);  // binary search
 
    float prob_phi = cdf_data_phi[idx_phi];
    if (idx_phi > 0)
    {
        const float tmp = cdf_data_phi[idx_phi - 1];
        prob_phi -= tmp;
        xi1 -= tmp;
    }
    xi1 /= prob_phi;  // rescale for re-usage
 
    // compute theta and phi
    //-------------------------------------------
    // sample uniformly within the patch (grid cell)
    const float2 start = lp.theta_phi_start;
    const float2 delta = lp.theta_phi_delta;
 
    const float cos_theta_0 = cosf(start.x + float(idx_theta)      * delta.x);
    const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
 
    //               n = \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
    //                 = 1 / (\cos{\theta_0} - \cos{\theta_1})
    //
    //             \xi = n * \int_{\theta_0}^{\theta_1} \sin{\theta} \delta \theta
    // => \cos{\theta} = (1 - \xi) \cos{\theta_0} + \xi \cos{\theta_1}
 
    const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
    result[0] = acosf(cos_theta);
    result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
 
    // align phi
    if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);              // wrap
    if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];  // to [-pi, pi]
 
    // compute pdf
    //-------------------------------------------
    result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
 
 
// Implementation of df::light_profile_pdf() for a light profile.
extern "C" __device__ float df_light_profile_pdf(
    Texture_handler_base const  *self_base,
    unsigned                    light_profile_idx,
    float const                 theta_phi[2])
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
        return 0.0f;  // invalid light profile returns zero
 
    const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
    const uint2 res = lp.angular_resolution;
    const float* cdf_data_theta = lp.cdf_data;
 
    // map theta to 0..1 range
    const float theta = theta_phi[0] - lp.theta_phi_start.x;
    const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
 
    // converting input phi from -pi..pi to 0..2pi
    float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
 
    // floorf wraps phi range into 0..2pi
    phi = phi - lp.theta_phi_start.y -
        floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
 
    // (phi < 0.0f) is no problem, this is handled by the (black) border
    // since it implies lp.theta_phi_start.y > 0 (and we really have "no data" below that)
    const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
 
    // wrap_mode: border black would be an alternative (but it produces artifacts at low res)
    if (idx_theta < 0 || idx_theta > res.x - 2 || idx_phi < 0 || idx_phi > res.y - 2)
        return 0.0f;
 
 
    // get probability for theta
    //-------------------------------------------
    float prob_theta = cdf_data_theta[idx_theta];
    if (idx_theta > 0)
    {
        const float tmp = cdf_data_theta[idx_theta - 1];
        prob_theta -= tmp;
    }
 
    // get probability for phi
    //-------------------------------------------
    const float* cdf_data_phi = cdf_data_theta
        + (res.x - 1)                             // CDF theta block
        + (idx_theta * (res.y - 1));              // selected CDF for phi
 
    float prob_phi = cdf_data_phi[idx_phi];
    if (idx_phi > 0)
    {
        const float tmp = cdf_data_phi[idx_phi - 1];
        prob_phi -= tmp;
    }
 
    // compute probability to select a position in the sphere patch
    const float2 start = lp.theta_phi_start;
    const float2 delta = lp.theta_phi_delta;
 
    const float cos_theta_0 = cosf(start.x + float(idx_theta)      * delta.x);
    const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
 
    return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
 
 
// ------------------------------------------------------------------------------------------------
// BSDF Measurements
// ------------------------------------------------------------------------------------------------
 
// Implementation of bsdf_measurement_isvalid() for an MBSDF.
extern "C" __device__ bool df_bsdf_measurement_isvalid(
    Texture_handler_base const *self_base,
    unsigned                    bsdf_measurement_idx)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    return bsdf_measurement_idx != 0 && bsdf_measurement_idx - 1 < self->num_mbsdfs;
}
 
// Implementation of df::bsdf_measurement_resolution() function needed by generated code,
// which retrieves the angular and chromatic resolution of the given MBSDF.
// The returned triple consists of: number of equi-spaced steps of theta_i and theta_o,
// number of equi-spaced steps of phi, and number of color channels (1 or 3).
extern "C" __device__ void df_bsdf_measurement_resolution(
    unsigned                    result[3],
    Texture_handler_base const  *self_base,
    unsigned                    bsdf_measurement_idx,
    Mbsdf_part                  part)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
    {
        // invalid MBSDF returns zero
        result[0] = 0;
        result[1] = 0;
        result[2] = 0;
        return;
    }
 
    Mbsdf const &bm = self->mbsdfs[bsdf_measurement_idx - 1];
    const unsigned part_idx = static_cast<unsigned>(part);
 
    // check for the part
    if (part_idx > 1 || bm.has_data[part_idx] == 0)
    {
        result[0] = 0;
        result[1] = 0;
        result[2] = 0;
        return;
    }
 
    // pass out the information
    result[0] = bm.angular_resolution[part_idx].x;
    result[1] = bm.angular_resolution[part_idx].y;
    result[2] = bm.num_channels[part_idx];
}
 
 
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
                                          const float theta_phi_out[2])
{
    // assuming each phi is between -pi and pi
    float u = theta_phi_out[1] - theta_phi_in[1];
    if (u < 0.0) u += float(2.0 * M_PI);
    if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
    u *= float(M_ONE_OVER_PI);
 
    const float v = theta_phi_out[0] * float(2.0 / M_PI);
    const float w = theta_phi_in[0] * float(2.0 / M_PI);
 
    return make_float3(u, v, w);
}
 
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
                                            const float theta_phi_in[2],
                                            const float theta_phi_out[2])
{
    // 3D volume on the GPU (phi_delta x theta_out x theta_in)
    const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
    return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
 
// Implementation of df::bsdf_measurement_evaluate() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_evaluate(
    float                       result[3],
    Texture_handler_base const  *self_base,
    unsigned                    bsdf_measurement_idx,
    float const                 theta_phi_in[2],
    float const                 theta_phi_out[2],
    Mbsdf_part                  part)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
    {
        // invalid MBSDF returns zero
        store_result3(result, 0.0f);
        return;
    }
 
    const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
    const unsigned part_idx = static_cast<unsigned>(part);
 
    // check for the parts
    if (part_idx > 1 || bm.has_data[part_idx] == 0)
    {
        store_result3(result, 0.0f);
        return;
    }
 
    // handle channels
    if (bm.num_channels[part_idx] == 3)
    {
        const float4 sample = bsdf_measurement_lookup<float4>(
            bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
        store_result3(result, sample.x, sample.y, sample.z);
    }
    else
    {
        const float sample = bsdf_measurement_lookup<float>(
            bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
        store_result3(result, sample);
    }
}
 
// Implementation of df::bsdf_measurement_sample() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_sample(
    float                       result[3],          // output: theta, phi, pdf
    Texture_handler_base const  *self_base,
    unsigned                    bsdf_measurement_idx,
    float const                 theta_phi_out[2],
    float const                 xi[3],              // uniform random values
    Mbsdf_part                  part)
{
    result[0] = -1.0f;  // negative theta means absorption
    result[1] = -1.0f;
    result[2] = 0.0f;
 
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
        return;  // invalid MBSDFs returns zero
 
    const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
    unsigned part_idx = static_cast<unsigned>(part);
 
    if (part_idx > 1 || bm.has_data[part_idx] == 0)
        return;  // check for the part
 
    // CDF data
    uint2 res = bm.angular_resolution[part_idx];
    const float* sample_data = bm.sample_data[part_idx];
    if (res.x < 1 || res.y < 1)
        return;  // invalid resolution
 
    unsigned idx_theta_out = unsigned(theta_phi_out[0] * float(M_ONE_OVER_PI * 2.0f) * float(res.x));
    idx_theta_out = min(idx_theta_out, res.x - 1);
 
    // sample theta_in
    //-------------------------------------------
    float xi0 = xi[0];
    const float* cdf_theta = sample_data + idx_theta_out * res.x;
    unsigned idx_theta_in = sample_cdf(cdf_theta, res.x, xi0);       // binary search
 
    float prob_theta = cdf_theta[idx_theta_in];
    if (idx_theta_in > 0)
    {
        const float tmp = cdf_theta[idx_theta_in - 1];
        prob_theta -= tmp;
        xi0 -= tmp;
    }
    xi0 /= prob_theta;  // rescale for re-usage
 
    // sample phi
    //-------------------------------------------
    float xi1 = xi[1];
    const float* cdf_phi = sample_data +
                           (res.x * res.x) +                                // CDF theta block
                           (idx_theta_out * res.x + idx_theta_in) * res.y;  // selected CDF phi
 
    // select which half-circle to choose with probability 0.5
    const bool flip = (xi1 > 0.5f);
    if (flip)
        xi1 = 1.0f - xi1;
    xi1 *= 2.0f;
 
    unsigned idx_phi = sample_cdf(cdf_phi, res.y, xi1);           // binary search
    float prob_phi = cdf_phi[idx_phi];
    if (idx_phi > 0)
    {
        const float tmp = cdf_phi[idx_phi - 1];
        prob_phi -= tmp;
        xi1 -= tmp;
    }
    xi1 /= prob_phi;  // rescale for re-usage
 
    // compute direction
    //-------------------------------------------
    const float2 inv_res = bm.inv_angular_resolution[part_idx];
 
    const float s_theta = float(0.5 * M_PI) * inv_res.x;
    const float s_phi   = float(1.0 * M_PI) * inv_res.y;
 
    const float cos_theta_0 = cosf(float(idx_theta_in)      * s_theta);
    const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
 
    const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
    result[0] = acosf(cos_theta);
    result[1] = (float(idx_phi) + xi0) * s_phi;
 
    if (flip)
        result[1] = float(2.0 * M_PI) - result[1];  // phi \in [0, 2pi]
 
    // align phi
    result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
    if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
    if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];  // to [-pi, pi]
 
    // compute pdf
    //-------------------------------------------
    result[2] = prob_theta * prob_phi * 0.5f
                / (s_phi * (cos_theta_0 - cos_theta_1));
}
 
// Implementation of df::bsdf_measurement_pdf() for an MBSDF.
extern "C" __device__ float df_bsdf_measurement_pdf(
    Texture_handler_base const  *self_base,
    unsigned                    bsdf_measurement_idx,
    float const                 theta_phi_in[2],
    float const                 theta_phi_out[2],
    Mbsdf_part                  part)
{
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
 
    if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
        return 0.0f;  // invalid MBSDF returns zero
 
    const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
    unsigned part_idx = static_cast<unsigned>(part);
 
    // check for the part
    if (part_idx > 1 || bm.has_data[part_idx] == 0)
        return 0.0f;
 
    // CDF data and resolution
    const float* sample_data = bm.sample_data[part_idx];
    uint2 res = bm.angular_resolution[part_idx];
 
    // compute indices in the CDF data
    float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out); // phi_delta, theta_out, theta_in
    unsigned idx_theta_in = unsigned(uvw.z * float(res.x));
    unsigned idx_theta_out  = unsigned(uvw.y * float(res.x));
    unsigned idx_phi   = unsigned(uvw.x * float(res.y));
    idx_theta_in  = min(idx_theta_in, res.x - 1);
    idx_theta_out = min(idx_theta_out, res.x - 1);
    idx_phi   = min(idx_phi, res.y - 1);
 
    // get probability to select theta_in
    const float* cdf_theta = sample_data + idx_theta_out * res.x;
    float prob_theta = cdf_theta[idx_theta_in];
    if (idx_theta_in > 0)
    {
        const float tmp = cdf_theta[idx_theta_in - 1];
        prob_theta -= tmp;
    }
 
    // get probability to select phi
    const float* cdf_phi = sample_data +
        (res.x * res.x) +                                // CDF theta block
        (idx_theta_out * res.x + idx_theta_in) * res.y;  // selected CDF phi
    float prob_phi = cdf_phi[idx_phi];
    if (idx_phi > 0)
    {
        const float tmp = cdf_phi[idx_phi - 1];
        prob_phi -= tmp;
    }
 
    // compute probability to select a position in the sphere patch
    float2 inv_res = bm.inv_angular_resolution[part_idx];
 
    const float s_theta = float(0.5 * M_PI) * inv_res.x;
    const float s_phi = float(1.0 * M_PI) * inv_res.y;
 
    const float cos_theta_0 = cosf(float(idx_theta_in)      * s_theta);
    const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
 
    return prob_theta * prob_phi * 0.5f
        / (s_phi * (cos_theta_0 - cos_theta_1));
}
 
 
__device__ inline void df_bsdf_measurement_albedo(
    float                       result[2],          // output: max (in case of color) albedo
                                                    // for the selected direction ([0]) and
                                                    // global ([1])
    Texture_handler const       *self,
    unsigned                    bsdf_measurement_idx,
    float const                 theta_phi[2],
    Mbsdf_part                  part)
{
    const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
    const unsigned part_idx = static_cast<unsigned>(part);
 
    // check for the part
    if (part_idx > 1 || bm.has_data[part_idx] == 0)
        return;
 
    const uint2 res = bm.angular_resolution[part_idx];
    if (res.x < 1)
        return;
 
    unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
    idx_theta = min(idx_theta, res.x - 1u);
    result[0] = bm.albedo_data[part_idx][idx_theta];
    result[1] = bm.max_albedo[part_idx];
}
 
// Implementation of df::bsdf_measurement_albedos() for an MBSDF.
extern "C" __device__ void df_bsdf_measurement_albedos(
    float                       result[4],          // output: [0] albedo refl. for theta_phi
                                                    //         [1] max albedo refl. global
                                                    //         [2] albedo trans. for theta_phi
                                                    //         [3] max albedo trans. global
    Texture_handler_base const  *self_base,
    unsigned                    bsdf_measurement_idx,
    float const                 theta_phi[2])
{
    result[0] = 0.0f;
    result[1] = 0.0f;
    result[2] = 0.0f;
    result[3] = 0.0f;
 
    Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
    if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
        return;  // invalid MBSDF returns zero
 
    df_bsdf_measurement_albedo(
        &result[0],
        self,
        bsdf_measurement_idx,
        theta_phi,
        mi::neuraylib::MBSDF_DATA_REFLECTION);
 
    df_bsdf_measurement_albedo(
        &result[2],
        self,
        bsdf_measurement_idx,
        theta_phi,
        mi::neuraylib::MBSDF_DATA_TRANSMISSION);
}
 
 
// ------------------------------------------------------------------------------------------------
// Normal adaption (dummy functions)
//
// Can be enabled via backend option "use_renderer_adapt_normal".
// ------------------------------------------------------------------------------------------------
 
#ifndef TEX_SUPPORT_NO_DUMMY_ADAPTNORMAL
 
// Implementation of adapt_normal().
extern "C" __device__ void adapt_normal(
    float                                  result[3],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    float const                            normal[3])
{
    // just return original normal
    result[0] = normal[0];
    result[1] = normal[1];
    result[2] = normal[2];
}
 
#endif  // TEX_SUPPORT_NO_DUMMY_ADAPTNORMAL
 
 
// ------------------------------------------------------------------------------------------------
// Scene data (dummy functions)
// ------------------------------------------------------------------------------------------------
 
#ifndef TEX_SUPPORT_NO_DUMMY_SCENEDATA
 
// Implementation of scene_data_isvalid().
extern "C" __device__ bool scene_data_isvalid(
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id)
{
    return false;
}
 
// Implementation of scene_data_lookup_float4().
extern "C" __device__ void scene_data_lookup_float4(
    float                                  result[4],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    float const                            default_value[4],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
    result[2] = default_value[2];
    result[3] = default_value[3];
}
 
// Implementation of scene_data_lookup_float3().
extern "C" __device__ void scene_data_lookup_float3(
    float                                  result[3],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    float const                            default_value[3],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
    result[2] = default_value[2];
}
 
// Implementation of scene_data_lookup_color().
extern "C" __device__ void scene_data_lookup_color(
    float                                  result[3],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    float const                            default_value[3],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
    result[2] = default_value[2];
}
 
// Implementation of scene_data_lookup_float2().
extern "C" __device__ void scene_data_lookup_float2(
    float                                  result[2],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    float const                            default_value[2],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
}
 
// Implementation of scene_data_lookup_float().
extern "C" __device__ float scene_data_lookup_float(
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    float const                            default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    return default_value;
}
 
// Implementation of scene_data_lookup_int4().
extern "C" __device__ void scene_data_lookup_int4(
    int                                    result[4],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    int const                              default_value[4],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
    result[2] = default_value[2];
    result[3] = default_value[3];
}
 
// Implementation of scene_data_lookup_int3().
extern "C" __device__ void scene_data_lookup_int3(
    int                                    result[3],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    int const                              default_value[3],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
    result[2] = default_value[2];
}
 
// Implementation of scene_data_lookup_int2().
extern "C" __device__ void scene_data_lookup_int2(
    int                                    result[2],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    int const                              default_value[2],
    bool                                   uniform_lookup)
{
    // just return default value
    result[0] = default_value[0];
    result[1] = default_value[1];
}
 
// Implementation of scene_data_lookup_int().
extern "C" __device__ int scene_data_lookup_int(
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    int                                    default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    return default_value;
}
 
// Implementation of scene_data_lookup_float4x4().
extern "C" __device__ void scene_data_lookup_float4x4(
    float                                  result[16],
    Texture_handler_base const            *self_base,
    Shading_state_material                *state,
    unsigned                               scene_data_id,
    float const                            default_value[16],
    bool                                   uniform_lookup)
{
    // just return default value
    for (int i = 0; i < 16; ++i)
        result[i] = default_value[i];
}
 
// Implementation of scene_data_lookup_float4() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float4(
    tct_deriv_arr_float_4                 *result,
    Texture_handler_base const            *self_base,
    Shading_state_material_with_derivs    *state,
    unsigned                               scene_data_id,
    tct_deriv_arr_float_4 const           *default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    *result = *default_value;
}
 
// Implementation of scene_data_lookup_float3() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float3(
    tct_deriv_arr_float_3                 *result,
    Texture_handler_base const            *self_base,
    Shading_state_material_with_derivs    *state,
    unsigned                               scene_data_id,
    tct_deriv_arr_float_3 const           *default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    *result = *default_value;
}
 
// Implementation of scene_data_lookup_color() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_color(
    tct_deriv_arr_float_3                 *result,
    Texture_handler_base const            *self_base,
    Shading_state_material_with_derivs    *state,
    unsigned                               scene_data_id,
    tct_deriv_arr_float_3 const           *default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    *result = *default_value;
}
 
// Implementation of scene_data_lookup_float2() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float2(
    tct_deriv_arr_float_2                 *result,
    Texture_handler_base const            *self_base,
    Shading_state_material_with_derivs    *state,
    unsigned                               scene_data_id,
    tct_deriv_arr_float_2 const           *default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    *result = *default_value;
}
 
// Implementation of scene_data_lookup_float() with derivatives.
extern "C" __device__ void scene_data_lookup_deriv_float(
    tct_deriv_float                       *result,
    Texture_handler_base const            *self_base,
    Shading_state_material_with_derivs    *state,
    unsigned                               scene_data_id,
    tct_deriv_float const                 *default_value,
    bool                                   uniform_lookup)
{
    // just return default value
    *result = *default_value;
}
 
#endif  // TEX_SUPPORT_NO_DUMMY_SCENEDATA
 
 
// ------------------------------------------------------------------------------------------------
// Vtables
// ------------------------------------------------------------------------------------------------
 
#ifndef TEX_SUPPORT_NO_VTABLES
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode.
__device__ mi::neuraylib::Texture_handler_vtable tex_vtable = {
    tex_lookup_float4_2d,
    tex_lookup_float3_2d,
    tex_texel_float4_2d,
    tex_lookup_float4_3d,
    tex_lookup_float3_3d,
    tex_texel_float4_3d,
    tex_lookup_float4_cube,
    tex_lookup_float3_cube,
    tex_resolution_2d,
    tex_resolution_3d,
    tex_texture_isvalid,
    tex_frame,
    df_light_profile_power,
    df_light_profile_maximum,
    df_light_profile_isvalid,
    df_light_profile_evaluate,
    df_light_profile_sample,
    df_light_profile_pdf,
    df_bsdf_measurement_isvalid,
    df_bsdf_measurement_resolution,
    df_bsdf_measurement_evaluate,
    df_bsdf_measurement_sample,
    df_bsdf_measurement_pdf,
    df_bsdf_measurement_albedos,
    adapt_normal,
    scene_data_isvalid,
    scene_data_lookup_float,
    scene_data_lookup_float2,
    scene_data_lookup_float3,
    scene_data_lookup_float4,
    scene_data_lookup_int,
    scene_data_lookup_int2,
    scene_data_lookup_int3,
    scene_data_lookup_int4,
    scene_data_lookup_color,
    scene_data_lookup_float4x4,
};
 
// The vtable containing all texture access handlers required by the generated code
// in "vtable" mode with derivatives.
__device__ mi::neuraylib::Texture_handler_deriv_vtable tex_deriv_vtable = {
    tex_lookup_deriv_float4_2d,
    tex_lookup_deriv_float3_2d,
    tex_texel_float4_2d,
    tex_lookup_float4_3d,
    tex_lookup_float3_3d,
    tex_texel_float4_3d,
    tex_lookup_float4_cube,
    tex_lookup_float3_cube,
    tex_resolution_2d,
    tex_resolution_3d,
    tex_texture_isvalid,
    tex_frame,
    df_light_profile_power,
    df_light_profile_maximum,
    df_light_profile_isvalid,
    df_light_profile_evaluate,
    df_light_profile_sample,
    df_light_profile_pdf,
    df_bsdf_measurement_isvalid,
    df_bsdf_measurement_resolution,
    df_bsdf_measurement_evaluate,
    df_bsdf_measurement_sample,
    df_bsdf_measurement_pdf,
    df_bsdf_measurement_albedos,
    adapt_normal,
    scene_data_isvalid,
    scene_data_lookup_float,
    scene_data_lookup_float2,
    scene_data_lookup_float3,
    scene_data_lookup_float4,
    scene_data_lookup_int,
    scene_data_lookup_int2,
    scene_data_lookup_int3,
    scene_data_lookup_int4,
    scene_data_lookup_color,
    scene_data_lookup_float4x4,
    scene_data_lookup_deriv_float,
    scene_data_lookup_deriv_float2,
    scene_data_lookup_deriv_float3,
    scene_data_lookup_deriv_float4,
    scene_data_lookup_deriv_color,
};
#endif  // TEX_SUPPORT_NO_VTABLES
 
#endif  // __CUDACC__
 
#endif  // TEXTURE_SUPPORT_CUDA_H

Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h

/******************************************************************************
 * Copyright 2024 NVIDIA Corporation. All rights reserved.
 *****************************************************************************/
 
// Code shared by CUDA MDL SDK examples
 
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
 
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
 
#define _USE_MATH_DEFINES
#include <math.h>
 
#include "example_shared.h"
#include "compiled_material_traverser_base.h"
 
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
 
#include "utils/profiling.h"
 
// Structure representing an MDL texture, containing filtered and unfiltered CUDA texture
// objects and the size of the texture.
struct Texture
{
    explicit Texture(cudaTextureObject_t  filtered_object,
            cudaTextureObject_t  unfiltered_object,
            uint3                size)
        : filtered_object(filtered_object)
        , unfiltered_object(unfiltered_object)
        , size(size)
        , inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
    {}
 
    cudaTextureObject_t  filtered_object;    // uses filter mode cudaFilterModeLinear
    cudaTextureObject_t  unfiltered_object;  // uses filter mode cudaFilterModePoint
    uint3                size;               // size of the texture, needed for texel access
    float3               inv_size;           // the inverse values of the size of the texture
};
 
// Structure representing an MDL bsdf measurement.
struct Mbsdf
{
    explicit Mbsdf()
    {
        for (unsigned i = 0; i < 2; ++i) {
            has_data[i] = 0u;
            eval_data[i] = 0;
            sample_data[i] = 0;
            albedo_data[i] = 0;
            this->max_albedo[i] = 0.0f;
            angular_resolution[i] = make_uint2(0u, 0u);
            inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
            num_channels[i] = 0;
        }
    }
 
    void Add(mi::neuraylib::Mbsdf_part part,
             const uint2& angular_resolution,
             unsigned num_channels)
    {
        unsigned part_idx = static_cast<unsigned>(part);
 
        this->has_data[part_idx] = 1u;
        this->angular_resolution[part_idx] = angular_resolution;
        this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
                                                             1.0f / float(angular_resolution.y));
        this->num_channels[part_idx] = num_channels;
    }
 
    unsigned            has_data[2];            // true if there is a measurement for this part
    cudaTextureObject_t eval_data[2];           // uses filter mode cudaFilterModeLinear
    float               max_albedo[2];          // max albedo used to limit the multiplier
    float*              sample_data[2];         // CDFs for sampling a BSDF measurement
    float*              albedo_data[2];         // max albedo for each theta (isotropic)
 
    uint2           angular_resolution[2];      // size of the dataset, needed for texel access
    float2          inv_angular_resolution[2];  // the inverse values of the size of the dataset
    unsigned        num_channels[2];            // number of color channels (1 or 3)
};
 
 
// Structure representing a Light Profile
struct Lightprofile
{
    explicit Lightprofile(
        uint2               angular_resolution = make_uint2(0, 0),
        float2              theta_phi_start = make_float2(0.0f, 0.0f),
        float2              theta_phi_delta = make_float2(0.0f, 0.0f),
        float               candela_multiplier = 0.0f,
        float               total_power = 0.0f,
        cudaTextureObject_t eval_data = 0,
        float               *cdf_data = nullptr)
    : angular_resolution(angular_resolution)
    , inv_angular_resolution(make_float2(
        1.0f / float(angular_resolution.x),
        1.0f / float(angular_resolution.y)))
    , theta_phi_start(theta_phi_start)
    , theta_phi_delta(theta_phi_delta)
        , theta_phi_inv_delta(make_float2(0.0f, 0.0f))
    , candela_multiplier(candela_multiplier)
    , total_power(total_power)
    , eval_data(eval_data)
    , cdf_data(cdf_data)
    {
        theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
        theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
    }
 
    uint2           angular_resolution;     // angular resolution of the grid
    float2          inv_angular_resolution; // inverse angular resolution of the grid
    float2          theta_phi_start;        // start of the grid
    float2          theta_phi_delta;        // angular step size
    float2          theta_phi_inv_delta;    // inverse step size
    float           candela_multiplier;     // factor to rescale the normalized data
    float           total_power;
 
    cudaTextureObject_t eval_data;          // normalized data sampled on grid
    float*              cdf_data;           // CDFs for sampling a light profile
};
 
// Structure representing the resources used by the generated code of a target code.
struct Target_code_data
{
    Target_code_data(
        size_t num_textures,
        CUdeviceptr textures,
        size_t num_mbsdfs,
        CUdeviceptr mbsdfs,
        size_t num_lightprofiles,
        CUdeviceptr lightprofiles,
                     CUdeviceptr ro_data_segment)
        : num_textures(num_textures)
        , textures(textures)
        , num_mbsdfs(num_mbsdfs)
        , mbsdfs(mbsdfs)
        , num_lightprofiles(num_lightprofiles)
        , lightprofiles(lightprofiles)
        , ro_data_segment(ro_data_segment)
    {}
 
    size_t      num_textures;           // number of elements in the textures field
    CUdeviceptr textures;               // a device pointer to a list of Texture objects, if used
 
    size_t      num_mbsdfs;             // number of elements in the mbsdfs field
    CUdeviceptr mbsdfs;                 // a device pointer to a list of mbsdfs objects, if used
 
    size_t      num_lightprofiles;     // number of elements in the lightprofiles field
    CUdeviceptr lightprofiles;         // a device pointer to a list of mbsdfs objects, if used
 
    CUdeviceptr ro_data_segment;        // a device pointer to the read-only data segment, if used
};
 
 
//------------------------------------------------------------------------------
//
// Helper functions
//
//------------------------------------------------------------------------------
 
// Return a textual representation of the given value.
template <typename T>
std::string to_string(T val)
{
    std::ostringstream stream;
    stream << val;
    return stream.str();
}
 
// Collects the handles in a compiled material
class Handle_collector : public Compiled_material_traverser_base
{
public:
    // add all handle appearing in the provided material to the collectors handle list.
    explicit Handle_collector(
        mi::neuraylib::ITransaction* transaction,
        const mi::neuraylib::ICompiled_material* material)
    : Compiled_material_traverser_base()
    {
        traverse(material, transaction);
    }
 
    // get the collected handles.
    const std::vector<std::string>& get_handles() const { return m_handles; }
 
private:
    // Called when the traversal reaches a new element.
    void visit_begin(const mi::neuraylib::ICompiled_material* material,
                     const Compiled_material_traverser_base::Traversal_element& element,
                     void* context) override
    {
        // look for direct calls
        if (!element.expression ||
            element.expression->get_kind() != mi::neuraylib::IExpression::EK_DIRECT_CALL)
            return;
 
        // check if it is a distribution function
        auto transaction = static_cast<mi::neuraylib::ITransaction*>(context);
 
        const mi::base::Handle<const mi::neuraylib::IExpression_direct_call> expr_dcall(
            element.expression->get_interface<const mi::neuraylib::IExpression_direct_call
            >());
        const mi::base::Handle<const mi::neuraylib::IExpression_list> args(
            expr_dcall->get_arguments());
        const mi::base::Handle<const mi::neuraylib::IFunction_definition> func_def(
            transaction->access<mi::neuraylib::IFunction_definition>(
            expr_dcall->get_definition()));
        const mi::neuraylib::IFunction_definition::Semantics semantic = func_def->
            get_semantic();
 
        if (semantic < mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_FIRST
            || semantic > mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_LAST)
            return;
 
        // check if the last argument is a handle
        const mi::base::Handle<const mi::neuraylib::IExpression_list> arguments(
            expr_dcall->get_arguments());
 
        mi::Size arg_count = arguments->get_size();
        const char* name = arguments->get_name(arg_count - 1);
        if (strcmp(name, "handle") != 0)
            return;
 
        // get the handle value
        mi::base::Handle<const mi::neuraylib::IExpression> expr(
            arguments->get_expression(arg_count - 1));
 
        if (expr->get_kind() != mi::neuraylib::IExpression::EK_CONSTANT)
            return; // is an error if 'handle' is a reserved parameter name
 
        const mi::base::Handle<const mi::neuraylib::IExpression_constant> expr_const(
            expr->get_interface<const mi::neuraylib::IExpression_constant>());
 
        const mi::base::Handle<const mi::neuraylib::IValue> value(expr_const->get_value());
        if (value->get_kind() != mi::neuraylib::IValue::VK_STRING)
            return;
 
        const mi::base::Handle<const mi::neuraylib::IValue_string> handle(
            value->get_interface<const mi::neuraylib::IValue_string>());
 
        std::string handle_value = handle->get_value() ? std::string(handle->get_value()) : "";
 
        if (std::find(m_handles.begin(), m_handles.end(), handle_value) == m_handles.end())
            m_handles.push_back(handle_value);
    }
 
    std::vector<std::string> m_handles;
};
 
 
//------------------------------------------------------------------------------
//
// CUDA helper functions
//
//------------------------------------------------------------------------------
 
// Helper macro. Checks whether the expression is cudaSuccess and if not prints a message and
// resets the device and exits.
 
#ifdef ENABLE_DEPRECATED_UTILIY_FUNCTIONS
 
#define check_cuda_success(expr) \
    do { \
        int err = (expr); \
        if (err != 0) { \
            fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
                err, __FILE__, __LINE__, #expr); \
            keep_console_open(); \
            cudaDeviceReset(); \
            exit(EXIT_FAILURE); \
        } \
    } while (false)
 
#else
 
#define check_cuda_success(expr) \
    do { \
        int err = (expr); \
        if (err != 0) { \
            cudaDeviceReset(); \
            exit_failure( "Error in file %s, line %u: \"%s\".\n", __FILE__, __LINE__, #expr); \
        } \
    } while (false)
 
#endif
 
// Initialize CUDA.
CUcontext init_cuda(
    int ordinal
#ifdef OPENGL_INTEROP
    , const bool opengl_interop
#endif
    )
{
    CUdevice cu_device;
    CUcontext cu_context;
 
    check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
    if (opengl_interop) {
        // Use first device used by OpenGL context
        unsigned int num_cu_devices;
        check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
    }
    else
#endif
    {
        // Use given device
        check_cuda_success(cuDeviceGet(&cu_device, ordinal));
    }
 
    check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
 
    // For this example, increase printf CUDA buffer size to support a larger number
    // of MDL debug::print() calls per CUDA kernel launch
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
 
    return cu_context;
}
 
// Uninitialize CUDA.
void uninit_cuda(CUcontext cuda_context)
{
    check_cuda_success(cuCtxDestroy(cuda_context));
}
 
template<typename T> struct Resource_deleter {
    /*compile error*/
};
 
template<> struct Resource_deleter<cudaArray_t> {
    void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
 
template<> struct Resource_deleter<cudaMipmappedArray_t> {
    void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
 
template<> struct Resource_deleter<Texture> {
    void operator()(Texture &res) {
        check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
        check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
    }
};
 
template<> struct Resource_deleter<Mbsdf> {
    void operator()(Mbsdf &res) {
        for (size_t i = 0; i < 2; ++i) {
            if (res.has_data[i] != 0u) {
                check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
                check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
                check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
            }
        }
    }
};
 
template<> struct Resource_deleter<Lightprofile> {
    void operator()(Lightprofile res) {
        if (res.cdf_data)
            check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
    }
};
 
template<> struct Resource_deleter<Target_code_data> {
    void operator()(Target_code_data &res) {
        if (res.textures)
            check_cuda_success(cuMemFree(res.textures));
        if (res.ro_data_segment)
            check_cuda_success(cuMemFree(res.ro_data_segment));
    }
};
 
template<> struct Resource_deleter<CUdeviceptr> {
    void operator()(CUdeviceptr res) {
        if (res != 0)
            check_cuda_success(cuMemFree(res));
    }
};
 
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
    Resource_handle(T res) : m_res(res) {}
 
    ~Resource_handle() {
        D deleter;
        deleter(m_res);
    }
 
    T &get() { return m_res; }
 
    T const &get() const { return m_res; }
 
    void set(T res) { m_res = res; }
 
private:
    // No copy possible.
    Resource_handle(Resource_handle const &);
    Resource_handle &operator=(Resource_handle const &);
 
private:
    T m_res;
};
 
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
    Resource_container() : m_cont() {}
 
    ~Resource_container() {
        D deleter;
        typedef typename C::iterator I;
        for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
            T &r = *it;
            deleter(r);
        }
    }
 
    C &operator*() { return m_cont; }
 
    C const &operator*() const { return m_cont; }
 
    C *operator->() { return &m_cont; }
 
    C const *operator->() const { return &m_cont; }
 
private:
    // No copy possible.
    Resource_container(Resource_container const &);
    Resource_container &operator=(Resource_container const &);
 
private:
    C m_cont;
};
 
// Allocate memory on GPU and copy the given data to the allocated memory.
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
    CUdeviceptr device_ptr;
    check_cuda_success(cuMemAlloc(&device_ptr, size));
    check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
    return device_ptr;
}
 
// Allocate memory on GPU and copy the given data to the allocated memory.
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
    return gpu_mem_dup((void *)data->get(), size);
}
 
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T>
CUdeviceptr gpu_mem_dup(std::vector<T> const &data)
{
    return gpu_mem_dup(&data[0], data.size() * sizeof(T));
}
 
// Allocate memory on GPU and copy the given data to the allocated memory.
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
    return gpu_mem_dup(*cont);
}
 
 
//------------------------------------------------------------------------------
//
// Material_gpu_context class
//
//------------------------------------------------------------------------------
 
// Helper class responsible for making textures and read-only data available to the GPU
// by generating and managing a list of Target_code_data objects.
class Material_gpu_context
{
public:
    Material_gpu_context(bool enable_derivatives)
        : m_enable_derivatives(enable_derivatives)
        , m_device_target_code_data_list(0)
        , m_device_target_argument_block_list(0)
    {
        // Use first entry as "not-used" block
        m_target_argument_block_list->push_back(0);
    }
 
    // Prepare the needed data of the given target code.
    bool prepare_target_code_data(
        mi::neuraylib::ITransaction          *transaction,
        mi::neuraylib::IImage_api            *image_api,
        mi::neuraylib::ITarget_code const    *target_code,
        std::vector<size_t> const            &arg_block_indices);
 
    // Get a device pointer to the target code data list.
    CUdeviceptr get_device_target_code_data_list();
 
    // Get a device pointer to the target argument block list.
    CUdeviceptr get_device_target_argument_block_list();
 
    // Get a device pointer to the i'th target argument block.
    CUdeviceptr get_device_target_argument_block(size_t i)
    {
        // First entry is the "not-used" block, so start at index 1.
        if (i + 1 >= m_target_argument_block_list->size())
            return 0;
        return (*m_target_argument_block_list)[i + 1];
    }
 
    // Get the number of target argument blocks.
    size_t get_argument_block_count() const
    {
        return m_own_arg_blocks.size();
    }
 
    // Get the argument block of the i'th BSDF.
    // If the BSDF has no target argument block, size_t(~0) is returned.
    size_t get_bsdf_argument_block_index(size_t i) const
    {
        if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
        return m_bsdf_arg_block_indices[i];
    }
 
    // Get a writable copy of the i'th target argument block.
    mi::base::Handle<mi::neuraylib::ITarget_argument_block> get_argument_block(size_t i)
    {
        if (i >= m_own_arg_blocks.size())
            return mi::base::Handle<mi::neuraylib::ITarget_argument_block>();
        return m_own_arg_blocks[i];
    }
 
    // Get the layout of the i'th target argument block.
    mi::base::Handle<mi::neuraylib::ITarget_value_layout const> get_argument_block_layout(size_t i)
    {
        if (i >= m_arg_block_layouts.size())
            return mi::base::Handle<mi::neuraylib::ITarget_value_layout const>();
        return m_arg_block_layouts[i];
    }
 
    // Update the i'th target argument block on the device with the data from the corresponding
    // block returned by get_argument_block().
    void update_device_argument_block(size_t i);
private:
    // Copy the image data of a canvas to a CUDA array.
    void copy_canvas_to_cuda_array(cudaArray_t device_array, mi::neuraylib::ICanvas const *canvas);
 
    // Prepare the texture identified by the texture_index for use by the texture access functions
    // on the GPU.
    bool prepare_texture(
        mi::neuraylib::ITransaction       *transaction,
        mi::neuraylib::IImage_api         *image_api,
        mi::neuraylib::ITarget_code const *code_ptx,
        mi::Size                           texture_index,
        std::vector<Texture>              &textures);
 
    // Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
    // functions on the GPU.
    bool prepare_mbsdf(
        mi::neuraylib::ITransaction       *transaction,
        mi::neuraylib::ITarget_code const *code_ptx,
        mi::Size                           mbsdf_index,
        std::vector<Mbsdf>                &mbsdfs);
 
    // Prepare the mbsdf identified by the mbsdf_index for use by the bsdf measurement access
    // functions on the GPU.
    bool prepare_lightprofile(
        mi::neuraylib::ITransaction       *transaction,
        mi::neuraylib::ITarget_code const *code_ptx,
        mi::Size                           lightprofile_index,
        std::vector<Lightprofile>        &lightprofiles);
 
    // If true, mipmaps will be generated for all 2D textures.
    bool m_enable_derivatives;
 
    // The device pointer of the target code data list.
    Resource_handle<CUdeviceptr> m_device_target_code_data_list;
 
    // List of all target code data objects owned by this context.
    Resource_container<Target_code_data> m_target_code_data_list;
 
    // The device pointer of the target argument block list.
    Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
 
    // List of all target argument blocks owned by this context.
    Resource_container<CUdeviceptr> m_target_argument_block_list;
 
    // List of all local, writable copies of the target argument blocks.
    std::vector<mi::base::Handle<mi::neuraylib::ITarget_argument_block> > m_own_arg_blocks;
 
    // List of argument block indices per material BSDF.
    std::vector<size_t> m_bsdf_arg_block_indices;
 
    // List of all target argument block layouts.
    std::vector<mi::base::Handle<mi::neuraylib::ITarget_value_layout const> > m_arg_block_layouts;
 
    // List of all Texture objects owned by this context.
    Resource_container<Texture> m_all_textures;
 
    // List of all MBSDFs objects owned by this context.
    Resource_container<Mbsdf> m_all_mbsdfs;
 
    // List of all Light profiles objects owned by this context.
    Resource_container<Lightprofile> m_all_lightprofiles;
 
    // List of all CUDA arrays owned by this context.
    Resource_container<cudaArray_t> m_all_texture_arrays;
 
    // List of all CUDA mipmapped arrays owned by this context.
    Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
 
// Get a device pointer to the target code data list.
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
    if (!m_device_target_code_data_list.get())
        m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
    return m_device_target_code_data_list.get();
}
 
// Get a device pointer to the target argument block list.
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
    if (!m_device_target_argument_block_list.get())
        m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
    return m_device_target_argument_block_list.get();
}
 
// Copy the image data of a canvas to a CUDA array.
void Material_gpu_context::copy_canvas_to_cuda_array(
    cudaArray_t device_array,
    mi::neuraylib::ICanvas const *canvas)
{
    mi::base::Handle<const mi::neuraylib::ITile> tile(canvas->get_tile());
    mi::Float32 const *data = static_cast<mi::Float32 const *>(tile->get_data());
    check_cuda_success(cudaMemcpy2DToArray(
        device_array, 0, 0, data,
        canvas->get_resolution_x() * sizeof(float) * 4,
        canvas->get_resolution_x() * sizeof(float) * 4,
        canvas->get_resolution_y(),
        cudaMemcpyHostToDevice));
}
 
// Prepare the texture identified by the texture_index for use by the texture access functions
// on the GPU.
bool Material_gpu_context::prepare_texture(
    mi::neuraylib::ITransaction       *transaction,
    mi::neuraylib::IImage_api         *image_api,
    mi::neuraylib::ITarget_code const *code_ptx,
    mi::Size                           texture_index,
    std::vector<Texture>              &textures)
{
    // Get access to the texture data by the texture database name from the target code.
    mi::base::Handle<const mi::neuraylib::ITexture> texture(
        transaction->access<mi::neuraylib::ITexture>(code_ptx->get_texture(texture_index)));
    mi::base::Handle<const mi::neuraylib::IImage> image(
        transaction->access<mi::neuraylib::IImage>(texture->get_image()));
    mi::base::Handle<const mi::neuraylib::ICanvas> canvas(image->get_canvas(0, 0, 0));
    mi::Uint32 tex_width = canvas->get_resolution_x();
    mi::Uint32 tex_height = canvas->get_resolution_y();
    mi::Uint32 tex_layers = canvas->get_layers_size();
    char const *image_type = image->get_type(0, 0);
 
    if (image->is_uvtile() || image->is_animated()) {
        std::cerr << "The example does not support uvtile and/or animated textures!" << std::endl;
        return false;
    }
 
    // For simplicity, the texture access functions are only implemented for float4 and gamma
    // is pre-applied here (all images are converted to linear space).
 
    // Convert to linear color space if necessary
    if (texture->get_effective_gamma(0, 0) != 1.0f) {
        // Copy/convert to float4 canvas and adjust gamma from "effective gamma" to 1.
        mi::base::Handle<mi::neuraylib::ICanvas> gamma_canvas(
            image_api->convert(canvas.get(), "Color"));
        gamma_canvas->set_gamma(texture->get_effective_gamma(0, 0));
        image_api->adjust_gamma(gamma_canvas.get(), 1.0f);
        canvas = gamma_canvas;
    } else if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0) {
        // Convert to expected format
        canvas = image_api->convert(canvas.get(), "Color");
    }
 
    cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
    cudaResourceDesc res_desc;
    memset(&res_desc, 0, sizeof(res_desc));
 
    // Copy image data to GPU array depending on texture shape
    mi::neuraylib::ITarget_code::Texture_shape texture_shape =
        code_ptx->get_texture_shape(texture_index);
    if (texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube ||
        texture_shape == mi::neuraylib::ITarget_code::Texture_shape_3d ||
        texture_shape == mi::neuraylib::ITarget_code::Texture_shape_bsdf_data) {
        // Cubemap and 3D texture objects require 3D CUDA arrays
 
        if (texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube &&
            tex_layers != 6) {
            std::cerr << "Invalid number of layers (" << tex_layers
                << "), cubemaps must have 6 layers!" << std::endl;
            return false;
        }
 
        // Allocate a 3D array on the GPU
        cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
        cudaArray_t device_tex_array;
        check_cuda_success(cudaMalloc3DArray(
            &device_tex_array, &channel_desc, extent,
            texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube ?
            cudaArrayCubemap : 0));
 
        // Prepare the memcpy parameter structure
        cudaMemcpy3DParms copy_params;
        memset(&copy_params, 0, sizeof(copy_params));
        copy_params.dstArray = device_tex_array;
        copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
        copy_params.kind = cudaMemcpyHostToDevice;
 
        // Copy the image data of all layers (the layers are not consecutive in memory)
        for (mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
            mi::base::Handle<const mi::neuraylib::ITile> tile(
                canvas->get_tile(layer));
            float const *data = static_cast<float const *>(tile->get_data());
 
            copy_params.srcPtr = make_cudaPitchedPtr(
                const_cast<float *>(data), tex_width * sizeof(float) * 4,
                tex_width, tex_height);
            copy_params.dstPos = make_cudaPos(0, 0, layer);
 
            check_cuda_success(cudaMemcpy3D(&copy_params));
        }
 
        res_desc.resType = cudaResourceTypeArray;
        res_desc.res.array.array = device_tex_array;
 
        m_all_texture_arrays->push_back(device_tex_array);
    } else if (m_enable_derivatives) {
        // mipmapped textures use CUDA mipmapped arrays
        mi::Uint32 num_levels = image->get_levels(0, 0);
        cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
        cudaMipmappedArray_t device_tex_miparray;
        check_cuda_success(cudaMallocMipmappedArray(
            &device_tex_miparray, &channel_desc, extent, num_levels));
 
        // create all mipmap levels and copy them to the CUDA arrays in the mipmapped array
        mi::base::Handle<mi::IArray> mipmaps(image_api->create_mipmap(canvas.get(), 1.0f));
 
        for (mi::Uint32 level = 0; level < num_levels; ++level) {
            mi::base::Handle<mi::neuraylib::ICanvas const> level_canvas;
            if (level == 0)
                level_canvas = canvas;
            else {
                mi::base::Handle<mi::IPointer> mipmap_ptr(mipmaps->get_element<mi::IPointer>(level - 1));
                level_canvas = mipmap_ptr->get_pointer<mi::neuraylib::ICanvas>();
            }
            cudaArray_t device_level_array;
            cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
            copy_canvas_to_cuda_array(device_level_array, level_canvas.get());
        }
 
        res_desc.resType = cudaResourceTypeMipmappedArray;
        res_desc.res.mipmap.mipmap = device_tex_miparray;
 
        m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
    } else {
        // 2D texture objects use CUDA arrays
        cudaArray_t device_tex_array;
        check_cuda_success(cudaMallocArray(
            &device_tex_array, &channel_desc, tex_width, tex_height));
 
        copy_canvas_to_cuda_array(device_tex_array, canvas.get());
 
        res_desc.resType = cudaResourceTypeArray;
        res_desc.res.array.array = device_tex_array;
 
        m_all_texture_arrays->push_back(device_tex_array);
    }
 
    // For cube maps we need clamped address mode to avoid artifacts in the corners
    cudaTextureAddressMode addr_mode =
        texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube
        ? cudaAddressModeClamp
        : cudaAddressModeWrap;
 
    // Create filtered texture object
    cudaTextureDesc tex_desc;
    memset(&tex_desc, 0, sizeof(tex_desc));
    tex_desc.addressMode[0]   = addr_mode;
    tex_desc.addressMode[1]   = addr_mode;
    tex_desc.addressMode[2]   = addr_mode;
    tex_desc.filterMode       = cudaFilterModeLinear;
    tex_desc.readMode         = cudaReadModeElementType;
    tex_desc.normalizedCoords = 1;
    if (res_desc.resType == cudaResourceTypeMipmappedArray) {
        tex_desc.mipmapFilterMode = cudaFilterModeLinear;
        tex_desc.maxAnisotropy = 16;
        tex_desc.minMipmapLevelClamp = 0.f;
        tex_desc.maxMipmapLevelClamp = 1000.f;  // default value in OpenGL
    }
 
    cudaTextureObject_t tex_obj = 0;
    check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
 
    // Create unfiltered texture object if necessary (cube textures have no texel functions)
    cudaTextureObject_t tex_obj_unfilt = 0;
    if (texture_shape != mi::neuraylib::ITarget_code::Texture_shape_cube) {
        // Use a black border for access outside of the texture
        tex_desc.addressMode[0]   = cudaAddressModeBorder;
        tex_desc.addressMode[1]   = cudaAddressModeBorder;
        tex_desc.addressMode[2]   = cudaAddressModeBorder;
        tex_desc.filterMode       = cudaFilterModePoint;
 
        check_cuda_success(cudaCreateTextureObject(
            &tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
    }
 
    // Store texture infos in result vector
    textures.push_back(Texture(
        tex_obj,
        tex_obj_unfilt,
        make_uint3(tex_width, tex_height, tex_layers)));
    m_all_textures->push_back(textures.back());
 
    return true;
}
 
namespace
{
    bool prepare_mbsdfs_part(mi::neuraylib::Mbsdf_part part, Mbsdf& mbsdf_cuda_representation,
                             const mi::neuraylib::IBsdf_measurement* bsdf_measurement)
    {
        mi::base::Handle<const mi::neuraylib::Bsdf_isotropic_data> dataset;
        switch (part)
        {
            case mi::neuraylib::MBSDF_DATA_REFLECTION:
                dataset = bsdf_measurement->get_reflection<mi::neuraylib::Bsdf_isotropic_data>();
                break;
            case mi::neuraylib::MBSDF_DATA_TRANSMISSION:
                dataset = bsdf_measurement->get_transmission<mi::neuraylib::Bsdf_isotropic_data>();
                break;
        }
 
        // no data, fine
        if (!dataset)
            return true;
 
        // get dimensions
        uint2 res;
        res.x = dataset->get_resolution_theta();
        res.y = dataset->get_resolution_phi();
        unsigned num_channels = dataset->get_type() == mi::neuraylib::BSDF_SCALAR ? 1 : 3;
        mbsdf_cuda_representation.Add(part, res, num_channels);
 
 
        // get data
        mi::base::Handle<const mi::neuraylib::IBsdf_buffer> buffer(dataset->get_bsdf_buffer());
        // {1,3} * (index_theta_in * (res_phi * res_theta) + index_theta_out * res_phi + index_phi)
 
        const mi::Float32* src_data = buffer->get_data();
 
        // ----------------------------------------------------------------------------------------
        // prepare importance sampling data:
        // - for theta_in we will be able to perform a two stage CDF, first to select theta_out,
        //   and second to select phi_out
        // - maximum component is used to "probability" in case of colored measurements
 
        // CDF of the probability to select a certain theta_out for a given theta_in
        const unsigned int cdf_theta_size = res.x * res.x;
 
        // for each of theta_in x theta_out combination, a CDF of the probabilities to select a
        // a certain theta_out is stored
        const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
        float* sample_data = new float[sample_data_size];
 
        float* albedo_data = new float[res.x]; // albedo for sampling reflection and transmission
 
        float* sample_data_theta = sample_data;                // begin of the first (theta) CDF
        float* sample_data_phi = sample_data + cdf_theta_size; // begin of the second (phi) CDFs
 
        const float s_theta = (float) (M_PI * 0.5) / float(res.x);  // step size
        const float s_phi = (float) (M_PI) / float(res.y);          // step size
 
        float max_albedo = 0.0f;
        for (unsigned int t_in = 0; t_in < res.x; ++t_in)
        {
            float sum_theta = 0.0f;
            float sintheta0_sqd = 0.0f;
            for (unsigned int t_out = 0; t_out < res.x; ++t_out)
            {
                const float sintheta1 = sinf(float(t_out + 1) * s_theta);
                const float sintheta1_sqd = sintheta1 * sintheta1;
 
                // BSDFs are symmetric: f(w_in, w_out) = f(w_out, w_in)
                // take the average of both measurements
 
                // area of two the surface elements (the ones we are averaging)
                const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
                sintheta0_sqd = sintheta1_sqd;
 
                // offset for both the thetas into the measurement data (select row in the volume)
                const unsigned int offset_phi  = (t_in * res.x + t_out) * res.y;
                const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
 
                // build CDF for phi
                float sum_phi = 0.0f;
                for (unsigned int p_out = 0; p_out < res.y; ++p_out)
                {
                    const unsigned int idx  = offset_phi  + p_out;
                    const unsigned int idx2 = offset_phi2 + p_out;
 
                    float value = 0.0f;
                    if (num_channels == 3)
                    {
                        value = fmax(fmaxf(src_data[3 * idx  + 0], src_data[3 * idx  + 1]),
                                     fmaxf(src_data[3 * idx  + 2], 0.0f))
                              + fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
                                     fmaxf(src_data[3 * idx2 + 2], 0.0f));
                    }
                    else /* num_channels == 1 */
                    {
                        value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
                    }
 
                    sum_phi += value * mu;
                    sample_data_phi[idx] = sum_phi;
                }
 
                // normalize CDF for phi
                for (unsigned int p_out = 0; p_out < res.y; ++p_out)
                {
                    const unsigned int idx = offset_phi + p_out;
                    sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
                }
 
                // build CDF for theta
                sum_theta += sum_phi;
                sample_data_theta[t_in * res.x + t_out] = sum_theta;
            }
 
            if (sum_theta > max_albedo)
                max_albedo = sum_theta;
 
            albedo_data[t_in] = sum_theta;
 
            // normalize CDF for theta
            for (unsigned int t_out = 0; t_out < res.x; ++t_out)
            {
                const unsigned int idx = t_in * res.x + t_out;
                sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
            }
        }
 
        // copy entire CDF data buffer to GPU
        CUdeviceptr sample_obj = 0;
        check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
        check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
        delete[] sample_data;
 
        CUdeviceptr albedo_obj = 0;
        check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
        check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
        delete[] albedo_data;
 
 
        mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
        mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
        mbsdf_cuda_representation.max_albedo[part] = max_albedo;
 
        // ----------------------------------------------------------------------------------------
        // prepare evaluation data:
        // - simply store the measured data in a volume texture
        // - in case of color data, we store each sample in a vector4 to get texture support
        unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
 
        // make lookup data symmetric
        float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
        for (unsigned int t_in = 0; t_in < res.x; ++t_in)
        {
            for (unsigned int t_out = 0; t_out < res.x; ++t_out)
            {
                const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
                const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
                for (unsigned int p_out = 0; p_out < res.y; ++p_out)
                {
                    const unsigned int idx = offset_phi + p_out;
                    const unsigned int idx2 = offset_phi2 + p_out;
 
                    if (num_channels == 3)
                    {
                        lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
                        lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
                        lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
                        lookup_data[4*idx+3] = 1.0f;
                    }
                    else
                    {
                        lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
                    }
                }
            }
        }
 
        // Copy data to GPU array
        cudaArray_t device_mbsdf_data;
        cudaChannelFormatDesc channel_desc = (num_channels == 3
            ? cudaCreateChannelDesc<float4>() // float3 is not supported
            : cudaCreateChannelDesc<float>());
 
        // Allocate a 3D array on the GPU (phi_delta x theta_out x theta_in)
        cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
        check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
 
        // prepare and copy
        cudaMemcpy3DParms copy_params;
        memset(&copy_params, 0, sizeof(copy_params));
        copy_params.srcPtr = make_cudaPitchedPtr(
            (void*)(lookup_data),                                   // base pointer
            res.y * lookup_channels * sizeof(float),                // row pitch
            res.y,                                                  // width of slice
            res.x);                                                 // height of slice
        copy_params.dstArray = device_mbsdf_data;
        copy_params.extent = extent;
        copy_params.kind = cudaMemcpyHostToDevice;
        check_cuda_success(cudaMemcpy3D(&copy_params));
        delete[] lookup_data;
 
        cudaResourceDesc    texRes;
        memset(&texRes, 0, sizeof(cudaResourceDesc));
        texRes.resType = cudaResourceTypeArray;
        texRes.res.array.array = device_mbsdf_data;
 
        cudaTextureDesc     texDescr;
        memset(&texDescr, 0, sizeof(cudaTextureDesc));
        texDescr.normalizedCoords = 1;
        texDescr.filterMode = cudaFilterModeLinear;
        texDescr.addressMode[0] = cudaAddressModeClamp;
        texDescr.addressMode[1] = cudaAddressModeClamp;
        texDescr.addressMode[2] = cudaAddressModeClamp;
        texDescr.readMode = cudaReadModeElementType;
 
        cudaTextureObject_t eval_tex_obj;
        check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
        mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
 
        return true;
    }
}
 
bool Material_gpu_context::prepare_mbsdf(
    mi::neuraylib::ITransaction       *transaction,
    mi::neuraylib::ITarget_code const *code_ptx,
    mi::Size                           mbsdf_index,
    std::vector<Mbsdf>                &mbsdfs)
{
    // Get access to the texture data by the texture database name from the target code.
    mi::base::Handle<const mi::neuraylib::IBsdf_measurement> mbsdf(
        transaction->access<mi::neuraylib::IBsdf_measurement>(
        code_ptx->get_bsdf_measurement(mbsdf_index)));
 
    Mbsdf mbsdf_cuda;
 
    // handle reflection and transmission
    if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_REFLECTION, mbsdf_cuda, mbsdf.get()))
        return false;
    if (!prepare_mbsdfs_part(mi::neuraylib::MBSDF_DATA_TRANSMISSION, mbsdf_cuda, mbsdf.get()))
        return false;
 
    mbsdfs.push_back(mbsdf_cuda);
    m_all_mbsdfs->push_back(mbsdfs.back());
    return true;
}
 
bool Material_gpu_context::prepare_lightprofile(
    mi::neuraylib::ITransaction       *transaction,
    mi::neuraylib::ITarget_code const *code_ptx,
    mi::Size                           lightprofile_index,
    std::vector<Lightprofile>         &lightprofiles)
{
 
    // Get access to the texture data by the texture database name from the target code.
    mi::base::Handle<const mi::neuraylib::ILightprofile> lprof_nr(
        transaction->access<mi::neuraylib::ILightprofile>(
        code_ptx->get_light_profile(lightprofile_index)));
 
    uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
    float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
    float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
 
    // phi-mayor: [res.x x res.y]
    const float* data = lprof_nr->get_data();
 
    // --------------------------------------------------------------------------------------------
    // compute total power
    // compute inverse CDF data for sampling
    // sampling will work on cells rather than grid nodes (used for evaluation)
 
    // first (res.x-1) for the cdf for sampling theta
    // rest (rex.x-1) * (res.y-1) for the individual cdfs for sampling phi (after theta)
    size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
    float* cdf_data = new float[cdf_data_size];
 
    float debug_total_erea = 0.0f;
    float sum_theta = 0.0f;
    float total_power = 0.0f;
    float cos_theta0 = cosf(start.x);
    for (unsigned int t = 0; t < res.x - 1; ++t)
    {
        const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
 
        // area of the patch (grid cell)
        // \mu = int_{theta0}^{theta1} sin{theta} \delta theta
        const float mu = cos_theta0 - cos_theta1;
        cos_theta0 = cos_theta1;
 
        // build CDF for phi
        float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
        float sum_phi = 0.0f;
        for (unsigned int p = 0; p < res.y - 1; ++p)
        {
            // the probability to select a patch corresponds to the value times area
            // the value of a cell is the average of the corners
            // omit the *1/4 as we normalize in the end
            float value = data[p * res.x + t]
                + data[p * res.x + t + 1]
                + data[(p + 1) * res.x + t]
                + data[(p + 1) * res.x + t + 1];
 
            sum_phi += value * mu;
            cdf_data_phi[p] = sum_phi;
            debug_total_erea += mu;
        }
 
        // normalize CDF for phi
        for (unsigned int p = 0; p < res.y - 2; ++p)
            cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
 
        cdf_data_phi[res.y - 2] = 1.0f;
 
        // build CDF for theta
        sum_theta += sum_phi;
        cdf_data[t] = sum_theta;
    }
    total_power = sum_theta * 0.25f * delta.y;
 
    // normalize CDF for theta
    for (unsigned int t = 0; t < res.x - 2; ++t)
        cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
 
    cdf_data[res.x - 2] = 1.0f;
 
    // copy entire CDF data buffer to GPU
    CUdeviceptr cdf_data_obj = 0;
    check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
    check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
    delete[] cdf_data;
 
    // --------------------------------------------------------------------------------------------
    // prepare evaluation data
    //  - use a 2d texture that allows bilinear interpolation
    // Copy data to GPU array
    cudaArray_t device_lightprofile_data;
    cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
 
    // 2D texture objects use CUDA arrays
    check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
    check_cuda_success(cudaMemcpy2DToArray(
        device_lightprofile_data, 0, 0, data,
        res.x * sizeof(float), res.x * sizeof(float), res.y, cudaMemcpyHostToDevice));
 
    // Create filtered texture object
    cudaResourceDesc res_desc;
    memset(&res_desc, 0, sizeof(res_desc));
    res_desc.resType = cudaResourceTypeArray;
    res_desc.res.array.array = device_lightprofile_data;
 
    cudaTextureDesc tex_desc;
    memset(&tex_desc, 0, sizeof(tex_desc));
    tex_desc.addressMode[0] = cudaAddressModeClamp;
    tex_desc.addressMode[1] = cudaAddressModeClamp;
    tex_desc.addressMode[2] = cudaAddressModeClamp;
    tex_desc.borderColor[0] = 1.0f;
    tex_desc.borderColor[1] = 1.0f;
    tex_desc.borderColor[2] = 1.0f;
    tex_desc.borderColor[3] = 1.0f;
    tex_desc.filterMode = cudaFilterModeLinear;
    tex_desc.readMode = cudaReadModeElementType;
    tex_desc.normalizedCoords = 1;
 
    cudaTextureObject_t tex_obj = 0;
    check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
 
    double multiplier = lprof_nr->get_candela_multiplier();
    Lightprofile lprof(
        res,
        start,
        delta,
        float(multiplier),
        float(total_power * multiplier),
        tex_obj,
        reinterpret_cast<float*>(cdf_data_obj));
 
    lightprofiles.push_back(lprof);
    m_all_lightprofiles->push_back(lightprofiles.back());
    return true;
}
 
// Prepare the needed target code data of the given target code.
bool Material_gpu_context::prepare_target_code_data(
    mi::neuraylib::ITransaction          *transaction,
    mi::neuraylib::IImage_api            *image_api,
    mi::neuraylib::ITarget_code const    *target_code,
    std::vector<size_t> const            &arg_block_indices)
{
    // Target code data list may not have been retrieved already
    check_success(m_device_target_code_data_list.get() == 0);
 
    // Handle the read-only data segments if necessary.
    // They are only created, if the "enable_ro_segment" backend option was set to "on".
    CUdeviceptr device_ro_data = 0;
    if (target_code->get_ro_data_segment_count() > 0) {
        device_ro_data = gpu_mem_dup(
            target_code->get_ro_data_segment_data(0),
            target_code->get_ro_data_segment_size(0));
    }
 
    // Copy textures to GPU if the code has more than just the invalid texture
    CUdeviceptr device_textures = 0;
    mi::Size num_textures = target_code->get_texture_count();
    if (num_textures > 1) {
        std::vector<Texture> textures;
 
        // Loop over all textures skipping the first texture,
        // which is always the invalid texture
        for (mi::Size i = 1; i < num_textures; ++i) {
            if (!prepare_texture(
                    transaction, image_api, target_code, i, textures))
                return false;
        }
 
        // Copy texture list to GPU
        device_textures = gpu_mem_dup(textures);
    }
 
    // Copy MBSDFs to GPU if the code has more than just the invalid mbsdf
    CUdeviceptr device_mbsdfs = 0;
    mi::Size num_mbsdfs = target_code->get_bsdf_measurement_count();
    if (num_mbsdfs > 1) {
        std::vector<Mbsdf> mbsdfs;
 
        // Loop over all mbsdfs skipping the first mbsdf,
        // which is always the invalid mbsdf
        for (mi::Size i = 1; i < num_mbsdfs; ++i) {
            if (!prepare_mbsdf(
                transaction, target_code, i, mbsdfs))
                return false;
        }
 
        // Copy mbsdf list to GPU
        device_mbsdfs = gpu_mem_dup(mbsdfs);
    }
 
    // Copy light profiles to GPU if the code has more than just the invalid light profile
    CUdeviceptr device_lightprofiles = 0;
    mi::Size num_lightprofiles = target_code->get_light_profile_count();
    if (num_lightprofiles > 1) {
        std::vector<Lightprofile> lightprofiles;
 
        // Loop over all profiles skipping the first profile,
        // which is always the invalid profile
        for (mi::Size i = 1; i < num_lightprofiles; ++i) {
            if (!prepare_lightprofile(
                transaction, target_code, i, lightprofiles))
                return false;
        }
 
        // Copy light profile list to GPU
        device_lightprofiles = gpu_mem_dup(lightprofiles);
    }
 
    (*m_target_code_data_list).push_back(
        Target_code_data(num_textures, device_textures,
                         num_mbsdfs, device_mbsdfs,
                         num_lightprofiles, device_lightprofiles,
                         device_ro_data));
 
    for (mi::Size i = 0, num = target_code->get_argument_block_count(); i < num; ++i) {
        mi::base::Handle<mi::neuraylib::ITarget_argument_block const> arg_block(
            target_code->get_argument_block(i));
        CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
        m_target_argument_block_list->push_back(dev_block);
        m_own_arg_blocks.push_back(mi::base::make_handle(arg_block->clone()));
        m_arg_block_layouts.push_back(
            mi::base::make_handle(target_code->get_argument_block_layout(i)));
    }
 
    for (size_t arg_block_index : arg_block_indices) {
        m_bsdf_arg_block_indices.push_back(arg_block_index);
    }
 
    return true;
}
 
// Update the i'th target argument block on the device with the data from the corresponding
// block returned by get_argument_block().
void Material_gpu_context::update_device_argument_block(size_t i)
{
    CUdeviceptr device_ptr = get_device_target_argument_block(i);
    if (device_ptr == 0) return;
 
    mi::base::Handle<mi::neuraylib::ITarget_argument_block> arg_block(get_argument_block(i));
    check_cuda_success(cuMemcpyHtoD(
        device_ptr, arg_block->get_data(), arg_block->get_size()));
}
 
 
//------------------------------------------------------------------------------
//
// MDL material compilation code
//
//------------------------------------------------------------------------------
 
class Material_compiler {
public:
    // Constructor.
    Material_compiler(
        mi::neuraylib::IMdl_impexp_api* mdl_impexp_api,
        mi::neuraylib::IMdl_backend_api* mdl_backend_api,
        mi::neuraylib::IMdl_factory* mdl_factory,
        mi::neuraylib::ITransaction* transaction,
        unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
        bool use_df_interpreter,
#endif
        bool enable_derivatives,
        bool fold_ternary_on_df,
        bool enable_auxiliary,
        bool enable_pdf,
        bool use_adapt_normal,
        bool enable_bsdf_flags,
        bool enable_mdl_next,
        const std::string& df_handle_mode,
        const std::string& lambda_return_mode);
 
    // Loads an MDL module and returns the module DB.
    std::string load_module(const std::string& mdl_module_name);
 
    // Add a subexpression of a given material to the link unit.
    // path is the path of the sub-expression.
    // fname is the function name in the generated code.
    // If class_compilation is true, the material will use class compilation.
    bool add_material_subexpr(
        const std::string& qualified_module_name,
        const std::string& material_db_name,
        const char* path,
        const char* fname,
        bool class_compilation=false);
 
    // Add a distribution function of a given material to the link unit.
    // path is the path of the sub-expression.
    // fname is the function name in the generated code.
    // If class_compilation is true, the material will use class compilation.
    bool add_material_df(
        const std::string& qualified_module_name,
        const std::string& material_db_name,
        const char* path,
        const char* base_fname,
        bool class_compilation=false);
 
    // Add (multiple) MDL distribution function and expressions of a material to this link unit.
    // For each distribution function it results in four functions, suffixed with \c "_init",
    // \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
    // a list of \c Target_function_descriptions. Each of them needs to define the \c path, the root
    // of the expression that should be translated. After calling this function, each element of
    // the list will contain information for later usage in the application,
    // e.g., the \c argument_block_index and the \c function_index.
    bool add_material(
        const std::string& qualified_module_name,
        const std::string& material_db_name,
        mi::neuraylib::Target_function_description* function_descriptions,
        mi::Size description_count,
        bool class_compilation);
 
    // Generates CUDA PTX target code for the current link unit.
    mi::base::Handle<const mi::neuraylib::ITarget_code> generate_cuda_ptx();
 
    typedef std::vector<mi::base::Handle<mi::neuraylib::IFunction_definition const> >
        Material_definition_list;
 
    // Get the list of used material definitions.
    // There will be one entry per add_* call.
    Material_definition_list const &get_material_defs()
    {
        return m_material_defs;
    }
 
    typedef std::vector<mi::base::Handle<mi::neuraylib::ICompiled_material const> >
        Compiled_material_list;
 
    // Get the list of compiled materials.
    // There will be one entry per add_* call.
    Compiled_material_list const &get_compiled_materials()
    {
        return m_compiled_materials;
    }
 
    std::vector<size_t> const &get_argument_block_indices() const {
        return m_arg_block_indexes;
    }
 
    const std::vector<std::string>& get_handles() const {
        return m_handles;
    }
 
private:
    // Creates an instance of the given material.
    mi::neuraylib::IFunction_call* create_material_instance(
        const std::string& qualified_module_name,
        const std::string& material_db_name);
 
    // Compiles the given material instance in the given compilation modes.
    mi::neuraylib::ICompiled_material* compile_material_instance(
        mi::neuraylib::IFunction_call* material_instance,
        bool class_compilation);
 
private:
    mi::base::Handle<mi::neuraylib::IMdl_impexp_api> m_mdl_impexp_api;
    mi::base::Handle<mi::neuraylib::IMdl_backend>    m_be_cuda_ptx;
    mi::base::Handle<mi::neuraylib::IMdl_factory>    m_mdl_factory;
    mi::base::Handle<mi::neuraylib::ITransaction>    m_transaction;
 
    mi::base::Handle<mi::neuraylib::IMdl_execution_context> m_context;
    mi::base::Handle<mi::neuraylib::ILink_unit>             m_link_unit;
 
    Material_definition_list  m_material_defs;
    Compiled_material_list    m_compiled_materials;
    std::vector<size_t>       m_arg_block_indexes;
    std::vector<std::string>  m_handles;
};
 
// Constructor.
Material_compiler::Material_compiler(
        mi::neuraylib::IMdl_impexp_api* mdl_impexp_api,
        mi::neuraylib::IMdl_backend_api* mdl_backend_api,
        mi::neuraylib::IMdl_factory* mdl_factory,
        mi::neuraylib::ITransaction* transaction,
        unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
        bool use_df_interpreter,
#endif
        bool enable_derivatives,
        bool fold_ternary_on_df,
        bool enable_auxiliary,
        bool enable_pdf,
        bool use_adapt_normal,
        bool enable_bsdf_flags,
        bool enable_mdl_next,
        const std::string& df_handle_mode,
        const std::string& lambda_return_mode)
    : m_mdl_impexp_api(mdl_impexp_api, mi::base::DUP_INTERFACE)
    , m_be_cuda_ptx(mdl_backend_api->get_backend(mi::neuraylib::IMdl_backend_api::MB_CUDA_PTX))
    , m_mdl_factory(mdl_factory, mi::base::DUP_INTERFACE)
    , m_transaction(transaction, mi::base::DUP_INTERFACE)
    , m_context(mdl_factory->create_execution_context())
    , m_link_unit()
{
    check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
 
    // Option "enable_ro_segment": Default is disabled.
    // If you have a lot of big arrays, enabling this might speed up compilation.
    // check_success(m_be_cuda_ptx->set_option("enable_ro_segment", "on") == 0);
 
    if (enable_derivatives) {
        // Option "texture_runtime_with_derivs": Default is disabled.
        // We enable it to get coordinates with derivatives for texture lookup functions.
        check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
    }
 
    // Option "tex_lookup_call_mode": Default mode is vtable mode.
    // You can switch to the slower vtable mode by commenting out the next line.
    check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
 
    // Option "num_texture_results": Default is 0.
    // Set the size of a renderer provided array for texture results in the MDL SDK state in number
    // of float4 elements processed by the init() function.
    check_success(m_be_cuda_ptx->set_option(
        "num_texture_results",
        to_string(num_texture_results).c_str()) == 0);
 
    if (enable_auxiliary) {
        // Option "enable_auxiliary": Default is disabled.
        // We enable it to create an additional 'auxiliary' function that can be called on each
        // distribution function to fill an albedo and normal buffer e.g. for denoising.
        check_success(m_be_cuda_ptx->set_option("enable_auxiliary", "on") == 0);
    }
    if (!enable_pdf) {
        // Option "enable_pdf": Default is enabled.
        check_success(m_be_cuda_ptx->set_option("enable_pdf", "off") == 0);
    }
 
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
    // Option "enable_df_interpreter": Default is disabled.
    // Using the interpreter allows to reuse the same code for multiple materials
    // reducing code divergence, if your scene shows many materials at the same time.
    if (use_df_interpreter) {
        check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
    }
#endif
 
    // Option "df_handle_slot_mode": Default is "none".
    // When using light path expressions, individual parts of the distribution functions can be
    // selected using "handles". The contribution of each of those parts has to be evaluated during
    // rendering. This option controls how many parts are evaluated with each call into the
    // generated "evaluate" and "auxiliary" functions and how the data is passed.
    // The CUDA backend supports pointers, which means an externally managed buffer of arbitrary
    // size is used to transport the contributions of each part.
    check_success(m_be_cuda_ptx->set_option("df_handle_slot_mode", df_handle_mode.c_str()) == 0);
 
    // Option "lambda_return_mode": Default is "default".
    // Selects how generated lambda functions return their results. For PTX, the default
    // is equivalent to "sret" mode, where all values are returned in a buffer provided as
    // first argument. In "value" mode, base types and vector types are directly returned
    // by the functions, other types are returned as for the "sret" mode.
    check_success(m_be_cuda_ptx->set_option("lambda_return_mode", lambda_return_mode.c_str()) == 0);
 
    // Option "scene_data_names": Default is "".
    // Uncomment the line below to enable calling the scene data runtime functions
    // for any scene data names or specify a comma-separated list of names for which
    // you may provide scene data. The example runtime functions always return the
    // default values, which is the same as not supporting any scene data.
    //     m_be_cuda_ptx->set_option("scene_data_names", "*");
 
    if (use_adapt_normal) {
        // Option "use_renderer_adapt_normal": Default is "off".
        // If enabled, the renderer can adapt the normal of BSDFs before use.
        check_success(m_be_cuda_ptx->set_option("use_renderer_adapt_normal", "on") == 0);
    }
 
    if (enable_bsdf_flags) {
        // Option "libbsdf_flags_in_bsdf_data": Default is "off".
        // If enabled, the generated code will use the optional "flags" field
        // in the BSDF data structures.
        check_success(m_be_cuda_ptx->set_option("libbsdf_flags_in_bsdf_data", "on") == 0);
    }
 
    if (enable_mdl_next) {
        m_context->set_option("mdl_next", true);
    }
 
    // force experimental to true for now
    m_context->set_option("experimental", true);
 
    m_context->set_option("fold_ternary_on_df", fold_ternary_on_df);
 
    // After we set the options, we can create the link unit
    m_link_unit = mi::base::make_handle(m_be_cuda_ptx->create_link_unit(transaction, m_context.get()));
}
 
std::string Material_compiler::load_module(const std::string& mdl_module_name)
{
    // load module
    m_mdl_impexp_api->load_module(m_transaction.get(), mdl_module_name.c_str(), m_context.get());
    if (!print_messages(m_context.get()))
        exit_failure("Failed to load module: %s", mdl_module_name.c_str());
 
    // get and return the DB name
    mi::base::Handle<const mi::IString> db_module_name(
        m_mdl_factory->get_db_module_name(mdl_module_name.c_str()));
    return db_module_name->get_c_str();
}
 
// Creates an instance of the given material.
mi::neuraylib::IFunction_call* Material_compiler::create_material_instance(
    const std::string& qualified_module_name,
    const std::string& material_db_name)
{
    // Create a material instance from the material definition
    // with the default arguments.
    mi::base::Handle<const mi::neuraylib::IFunction_definition> material_definition(
        m_transaction->access<mi::neuraylib::IFunction_definition>(
            material_db_name.c_str()));
    if (!material_definition) {
        // material with given name does not exist
        print_message(
            mi::base::details::MESSAGE_SEVERITY_ERROR,
            mi::neuraylib::IMessage::MSG_COMPILER_DAG,
            (
                "Material '" +
                material_db_name +
                "' does not exist in '" +
                qualified_module_name + "'").c_str());
        return nullptr;
    }
 
    m_material_defs.push_back(material_definition);
 
    mi::Sint32 result;
    mi::base::Handle<mi::neuraylib::IFunction_call> material_instance(
        material_definition->create_function_call(0, &result));
    check_success(result == 0);
 
    material_instance->retain();
    return material_instance.get();
}
 
// Compiles the given material instance in the given compilation modes.
mi::neuraylib::ICompiled_material *Material_compiler::compile_material_instance(
    mi::neuraylib::IFunction_call* material_instance,
    bool class_compilation)
{
    // convert to target type SID_MATERIAL
    mi::base::Handle<mi::neuraylib::IType_factory> tf(
        m_mdl_factory->create_type_factory(m_transaction.get()));
    mi::base::Handle<const mi::neuraylib::IType> standard_material_type(
        tf->get_predefined_struct(mi::neuraylib::IType_struct::SID_MATERIAL));
    m_context->set_option("target_type", standard_material_type.get());
 
 
    mi::Uint32 flags = class_compilation
        ? mi::neuraylib::IMaterial_instance::CLASS_COMPILATION
        : mi::neuraylib::IMaterial_instance::DEFAULT_OPTIONS;
    mi::base::Handle<const mi::neuraylib::IMaterial_instance> material_instance2(
        material_instance->get_interface<const mi::neuraylib::IMaterial_instance>());
    mi::base::Handle<mi::neuraylib::ICompiled_material> compiled_material(
        material_instance2->create_compiled_material(flags, m_context.get()));
    check_success(print_messages(m_context.get()));
 
    m_compiled_materials.push_back(compiled_material);
 
    compiled_material->retain();
    return compiled_material.get();
}
 
// Generates CUDA PTX target code for the current link unit.
mi::base::Handle<const mi::neuraylib::ITarget_code> Material_compiler::generate_cuda_ptx()
{
    mi::base::Handle<const mi::neuraylib::ITarget_code> code_cuda_ptx(
        m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
    check_success(print_messages(m_context.get()));
    check_success(code_cuda_ptx);
 
#ifdef DUMP_PTX
    FILE *file = fopen("target_code.ptx", "wt");
    if (file)
    {
        fwrite(code_cuda_ptx->get_code(), code_cuda_ptx->get_code_size(), 1, file);
        fclose(file);
    }
#endif
 
    return code_cuda_ptx;
}
 
// Add a subexpression of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_subexpr(
    const std::string& qualified_module_name,
    const std::string& material_db_name,
    const char* path,
    const char* fname,
    bool class_compilation)
{
    mi::neuraylib::Target_function_description desc;
    desc.path = path;
    desc.base_fname = fname;
    add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
    return desc.return_code == 0;
}
 
// Add a distribution function of a given material to the link unit.
// path is the path of the sub-expression.
// fname is the function name in the generated code.
bool Material_compiler::add_material_df(
    const std::string& qualified_module_name,
    const std::string& material_db_name,
    const char* path,
    const char* base_fname,
    bool class_compilation)
{
    mi::neuraylib::Target_function_description desc;
    desc.path = path;
    desc.base_fname = base_fname;
    add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
    return desc.return_code == 0;
}
 
// Add (multiple) MDL distribution function and expressions of a material to this link unit.
// For each distribution function it results in four functions, suffixed with \c "_init",
// \c "_sample", \c "_evaluate", and \c "_pdf". Functions can be selected by providing a
// a list of \c Target_function_description. Each of them needs to define the \c path, the root
// of the expression that should be translated. After calling this function, each element of
// the list will contain information for later usage in the application,
// e.g., the \c argument_block_index and the \c function_index.
bool Material_compiler::add_material(
    const std::string& qualified_module_name,
    const std::string& material_db_name,
    mi::neuraylib::Target_function_description* function_descriptions,
    mi::Size description_count,
    bool class_compilation)
{
    if (description_count == 0)
        return false;
 
    // Load the given module and create a material instance
    mi::base::Handle<mi::neuraylib::IFunction_call> material_instance(
        create_material_instance(qualified_module_name, material_db_name));
    if (!material_instance)
        return false;
 
    // Compile the material instance in instance compilation mode
    mi::base::Handle<mi::neuraylib::ICompiled_material> compiled_material(
        compile_material_instance(material_instance.get(), class_compilation));
 
    m_link_unit->add_material(
        compiled_material.get(), function_descriptions, description_count,
        m_context.get());
 
    // Note: the same argument_block_index is filled into all function descriptions of a
    //       material, if any function uses it
    m_arg_block_indexes.push_back(function_descriptions[0].argument_block_index);
 
    return print_messages(m_context.get());
}
 
 
//------------------------------------------------------------------------------
//
// Material execution code
//
//------------------------------------------------------------------------------
 
// Helper function to create PTX source code for a non-empty 32-bit value array.
void print_array_u32(
    std::string &str, std::string const &name, unsigned count, std::string const &content)
{
    str += ".visible .const .align 4 .u32 " + name + "[";
    if (count == 0) {
        // PTX does not allow empty arrays, so use a dummy entry
        str += "1] = { 0 };\n";
    } else {
        str += to_string(count) + "] = { " + content + " };\n";
    }
}
 
// Helper function to create PTX source code for a non-empty function pointer array.
void print_array_func(
    std::string &str, std::string const &name, unsigned count, std::string const &content)
{
    str += ".visible .const .align 8 .u64 " + name + "[";
    if (count == 0) {
        // PTX does not allow empty arrays, so use a dummy entry
        str += "1] = { dummy_func };\n";
    } else {
        str += to_string(count) + "] = { " + content + " };\n";
    }
}
 
// Generate PTX array containing the references to all generated functions.
std::string generate_func_array_ptx(
    const std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > &target_codes)
{
    // Create PTX header and mdl_expr_functions_count constant
    std::string src =
        ".version 4.0\n"
        ".target sm_20\n"
        ".address_size 64\n";
 
    // Workaround needed to let CUDA linker resolve the function pointers in the arrays.
    // Also used for "empty" function arrays.
    src += ".func dummy_func() { ret; }\n";
 
    std::string tc_offsets;
    std::string function_names;
    std::string tc_indices;
    std::string ab_indices;
    unsigned f_count = 0;
 
    // Iterate over all target codes
    for (size_t tc_index = 0, num = target_codes.size(); tc_index < num; ++tc_index)
    {
        mi::base::Handle<const mi::neuraylib::ITarget_code> const &target_code =
            target_codes[tc_index];
 
        // in case of multiple target codes, we need to address the functions by a pair of
        // target_code_index and function_index.
        // the elements in the resulting function array can then be index by offset + func_index.
        if(!tc_offsets.empty())
            tc_offsets += ", ";
        tc_offsets += to_string(f_count);
 
        // Collect all names and prototypes of callable functions within the current target code
        for (size_t func_index = 0, func_count = target_code->get_callable_function_count();
             func_index < func_count; ++func_index)
        {
            // add to function list
            if (!tc_indices.empty())
            {
                tc_indices += ", ";
                function_names += ", ";
                ab_indices += ", ";
            }
 
            // target code index in case of multiple link units
            tc_indices += to_string(tc_index);
 
            // name of the function
            function_names += target_code->get_callable_function(func_index);
 
            // Get argument block index and translate to 1 based list index (-> 0 = not-used)
            mi::Size ab_index = target_code->get_callable_function_argument_block_index(func_index);
            ab_indices += to_string(ab_index == mi::Size(~0) ? 0 : (ab_index + 1));
            f_count++;
 
            // Add prototype declaration
            src += target_code->get_callable_function_prototype(
                func_index, mi::neuraylib::ITarget_code::SL_PTX);
            src += '\n';
        }
    }
 
    // infos per target code (link unit)
    src += std::string(".visible .const .align 4 .u32 mdl_target_code_count = ")
        + to_string(target_codes.size()) + ";\n";
    print_array_u32(
        src, std::string("mdl_target_code_offsets"), unsigned(target_codes.size()), tc_offsets);
 
    // infos per function
    src += std::string(".visible .const .align 4 .u32 mdl_functions_count = ")
        + to_string(f_count) + ";\n";
    print_array_func(src, std::string("mdl_functions"), f_count, function_names);
    print_array_u32(src, std::string("mdl_arg_block_indices"), f_count, ab_indices);
    print_array_u32(src, std::string("mdl_target_code_indices"), f_count, tc_indices);
 
    return src;
}
 
// Build a linked CUDA kernel containing our kernel and all the generated code, making it
// available to the kernel via an added "mdl_expr_functions" array.
CUmodule build_linked_kernel(
    std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > const &target_codes,
    const char *ptx_file,
    const char *kernel_function_name,
    CUfunction *out_kernel_function)
{
    // Generate PTX array containing the references to all generated functions.
    // The linker will resolve them to addresses.
 
    std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
    FILE *file = fopen("func_array.ptx", "wt");
    if (file)
    {
        fwrite(ptx_func_array_src.c_str(), ptx_func_array_src.size(), 1, file);
        fclose(file);
    }
#endif
 
    // Link all generated code, our generated PTX array and our kernel together
 
    CUlinkState   cuda_link_state;
    CUmodule      cuda_module;
    void         *linked_cubin;
    size_t        linked_cubin_size;
    char          error_log[8192], info_log[8192];
    CUjit_option  options[4];
    void         *optionVals[4];
 
    mi::examples::profiling::Timing timing("PTX to SASS");
 
    // Setup the linker
 
    // Pass a buffer for info messages
    options[0] = CU_JIT_INFO_LOG_BUFFER;
    optionVals[0] = info_log;
    // Pass the size of the info buffer
    options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
    optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
    // Pass a buffer for error messages
    options[2] = CU_JIT_ERROR_LOG_BUFFER;
    optionVals[2] = error_log;
    // Pass the size of the error buffer
    options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
    optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
 
    check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
 
    CUresult link_result = CUDA_SUCCESS;
    do {
        // Add all code generated by the MDL PTX backend
        for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
            link_result = cuLinkAddData(
                cuda_link_state, CU_JIT_INPUT_PTX,
                const_cast<char *>(target_codes[i]->get_code()),
                target_codes[i]->get_code_size(),
                nullptr, 0, nullptr, nullptr);
            if (link_result != CUDA_SUCCESS) break;
        }
        if (link_result != CUDA_SUCCESS) break;
 
        // Add the "mdl_expr_functions" array PTX module
        link_result = cuLinkAddData(
            cuda_link_state, CU_JIT_INPUT_PTX,
            const_cast<char *>(ptx_func_array_src.c_str()),
            ptx_func_array_src.size(),
            nullptr, 0, nullptr, nullptr);
        if (link_result != CUDA_SUCCESS) break;
 
        // Add our kernel
        link_result = cuLinkAddFile(
            cuda_link_state, CU_JIT_INPUT_PTX,
            ptx_file, 0, nullptr, nullptr);
        if (link_result != CUDA_SUCCESS) break;
 
        // Link everything to a cubin
        link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
    } while (false);
    if (link_result != CUDA_SUCCESS) {
        std::cerr << "PTX linker error:\n" << error_log << std::endl;
        check_cuda_success(link_result);
    }
 
    timing.stop();
 
    std::cout << "CUDA link completed." << std::endl;
    if (info_log[0])
        std::cout << "Linker output:\n" << info_log << std::endl;
 
#ifdef DUMP_PTX
    file = fopen("target_code.cubin", "wb");
    if (file)
    {
        fwrite(linked_cubin, linked_cubin_size, 1, file);
        fclose(file);
    }
#endif
 
    // Load the result and get the entrypoint of our kernel
    check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
    check_cuda_success(cuModuleGetFunction(
        out_kernel_function, cuda_module, kernel_function_name));
 
    int regs = 0;
    check_cuda_success(
        cuFuncGetAttribute(&regs, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
    int lmem = 0;
    check_cuda_success(
        cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
    std::cout << "Kernel uses " << regs << " registers and " << lmem << " lmem and has a size of "
        << linked_cubin_size << " bytes." << std::endl;
 
    // Cleanup
    check_cuda_success(cuLinkDestroy(cuda_link_state));
 
    return cuda_module;
}
 
#endif // EXAMPLE_CUDA_SHARED_H
 

Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cu

/******************************************************************************
 * Copyright 2024 NVIDIA Corporation. All rights reserved.
 *****************************************************************************/
 
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#define _USE_MATH_DEFINES
#include <math.h>
 
#include "example_df_cuda.h"
#include "texture_support_cuda.h"
 
// To reuse this sample code for the MDL SDK and MDL Core the corresponding namespaces are used.
 
// when this CUDA code is used in the context of an SDK sample.
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
    #define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
    using namespace mi::neuraylib;
// when this CUDA code is used in the context of an Core sample.
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
    #define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
    using namespace mi::mdl;
#endif
 
// for LPE support there different options for the renderer, for CUDA a renderer provided buffer
// can be used to retrieve the contributions of the individual handles (named lobes)
// Note, a real renderer would go for one specific option only
#define DF_HSM_POINTER -2
#define DF_HSM_NONE    -1
#define DF_HSM_FIXED_1  1
#define DF_HSM_FIXED_2  2
#define DF_HSM_FIXED_4  4
#define DF_HSM_FIXED_8  8
// this is the one that is used, 
// Note, this has to match with code backend option "df_handle_slot_mode"
#define DF_HANDLE_SLOTS DF_HSM_POINTER
 
// If enabled, math::DX(state::texture_coordinates(0).xy) = float2(1, 0) and
// math::DY(state::texture_coordinates(0).xy) = float2(0, 1) will be used.
// #define USE_FAKE_DERIVATIVES
 
#ifdef ENABLE_DERIVATIVES
typedef Material_expr_function_with_derivs                  Mat_expr_func;
typedef Bsdf_init_function_with_derivs                      Bsdf_init_func;
typedef Bsdf_sample_function_with_derivs                    Bsdf_sample_func;
typedef Bsdf_evaluate_function_with_derivs                  Bsdf_evaluate_func;
typedef Bsdf_pdf_function_with_derivs                       Bsdf_pdf_func;
typedef Bsdf_auxiliary_function_with_derivs                 Bsdf_auxiliary_func;
typedef Edf_init_function_with_derivs                       Edf_init_func;
typedef Edf_sample_function_with_derivs                     Edf_sample_func;
typedef Edf_evaluate_function_with_derivs                   Edf_evaluate_func;
typedef Edf_pdf_function_with_derivs                        Edf_pdf_func;
typedef Edf_auxiliary_function_with_derivs                  Edf_auxiliary_func;
typedef Shading_state_material_with_derivs                  Mdl_state;
typedef Texture_handler_deriv                               Tex_handler;
#define TEX_VTABLE                                          tex_deriv_vtable
#else
typedef Material_expr_function                              Mat_expr_func;
typedef Bsdf_init_function                                  Bsdf_init_func;
typedef Bsdf_sample_function                                Bsdf_sample_func;
typedef Bsdf_evaluate_function                              Bsdf_evaluate_func;
typedef Bsdf_pdf_function                                   Bsdf_pdf_func;
typedef Bsdf_auxiliary_function                             Bsdf_auxiliary_func;
typedef Edf_init_function                                   Edf_init_func;
typedef Edf_sample_function                                 Edf_sample_func;
typedef Edf_evaluate_function                               Edf_evaluate_func;
typedef Edf_pdf_function                                    Edf_pdf_func;
typedef Edf_auxiliary_function                              Edf_auxiliary_func;
typedef Shading_state_material                              Mdl_state;
typedef Texture_handler                                     Tex_handler;
#define TEX_VTABLE                                          tex_vtable
#endif
 
// Custom structure representing the resources used by the generated code of a target code object.
struct Target_code_data
{
    size_t       num_textures;      // number of elements in the textures field
    Texture      *textures;         // a list of Texture objects, if used
 
    size_t       num_mbsdfs;        // number of elements in the mbsdfs field
    Mbsdf        *mbsdfs;           // a list of mbsdfs objects, if used
 
    size_t       num_lightprofiles; // number of elements in the lightprofiles field
    Lightprofile *lightprofiles;    // a list of lightprofiles objects, if used
 
    char const   *ro_data_segment;  // the read-only data segment, if used
};
 
 
// all function types
union Mdl_function_ptr
{
    Mat_expr_func           *expression;
    Bsdf_init_func          *bsdf_init;
    Bsdf_sample_func        *bsdf_sample;
    Bsdf_evaluate_func      *bsdf_evaluate;
    Bsdf_pdf_func           *bsdf_pdf;
    Bsdf_auxiliary_func     *bsdf_auxiliary;
    Edf_init_func           *edf_init;
    Edf_sample_func         *edf_sample;
    Edf_evaluate_func       *edf_evaluate;
    Edf_pdf_func            *edf_pdf;
    Edf_auxiliary_func      *edf_auxiliary;
};
 
// function index offset depending on the target code
extern __constant__ unsigned int     mdl_target_code_offsets[];
 
// number of generated functions
extern __constant__ unsigned int     mdl_functions_count;
 
// the following arrays are indexed by an mdl_function_index
extern __constant__ Mdl_function_ptr mdl_functions[];
extern __constant__ unsigned int     mdl_arg_block_indices[];
 
// Identity matrix.
// The last row is always implied to be (0, 0, 0, 1).
__constant__ const float4 identity[3] = {
    {1.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 1.0f, 0.0f, 0.0f},
    {0.0f, 0.0f, 1.0f, 0.0f}
};
 
// the material provides pairs for each generated function to evaluate
// the functions and arg blocks array are indexed by:
// mdl_target_code_offsets[target_code_index] + function_index
typedef uint3 Mdl_function_index;
__device__ inline Mdl_function_index get_mdl_function_index(const uint2& index_pair)
{
    return make_uint3(
        index_pair.x,   // target_code_index
        index_pair.y,   // function_index inside target code
        mdl_target_code_offsets[index_pair.x] + index_pair.y); // global function index
}
 
// resource handler for accessing textures and other data
// depends on the target code (link unit)
struct Mdl_resource_handler
{
    __device__ Mdl_resource_handler()
    {
        m_tex_handler.vtable = &;TEX_VTABLE;   // only required in 'vtable' mode, otherwise NULL
        data.shared_data = NULL;
        data.texture_handler = reinterpret_cast<Texture_handler_base *>(&m_tex_handler);
    }
 
    // reuse the handler with a different target code index
    __device__ inline void set_target_code_index(
        const Kernel_params& params, const Mdl_function_index& index)
    {
        m_tex_handler.num_textures = params.tc_data[index.x].num_textures;
        m_tex_handler.textures = params.tc_data[index.x].textures;
        m_tex_handler.num_mbsdfs = params.tc_data[index.x].num_mbsdfs;
        m_tex_handler.mbsdfs = params.tc_data[index.x].mbsdfs;
        m_tex_handler.num_lightprofiles = params.tc_data[index.x].num_lightprofiles;
        m_tex_handler.lightprofiles = params.tc_data[index.x].lightprofiles;
    }
 
    // a pointer to this data is passed to all generated functions
    Resource_data data;
 
private:
    Tex_handler m_tex_handler;
};
 
 
// checks if the indexed function can be evaluated or not
__device__ inline bool is_valid(const Mdl_function_index& index)
{
    return index.y != 0xFFFFFFFFu;
}
 
// get a pointer to the material parameters which is passed to all generated functions
__device__ inline const char* get_arg_block(
    const Kernel_params& params,
    const Mdl_function_index& index)
{
    return params.arg_block_list[mdl_arg_block_indices[index.z]];
}
 
// Init function
__device__ inline Bsdf_init_func* as_init(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 0].bsdf_init;
}
 
// Expression functions
__device__ inline Mat_expr_func* as_expression(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 0].expression;
}
 
// Expression functions
#ifdef ENABLE_DERIVATIVES
template <typename T>
__device__ inline typename Material_function<T>::Type_with_derivs *as_expression_typed(
    const Mdl_function_index& index)
{
    return reinterpret_cast<typename Material_function<T>::Type_with_derivs *>(
        mdl_functions[index.z + 0].expression);
}
#else
template <typename T>
__device__ inline typename Material_function<T>::Type *as_expression_typed(
    const Mdl_function_index& index)
{
    return reinterpret_cast<typename Material_function<T>::Type *>(
        mdl_functions[index.z + 0].expression);
}
#endif
 
// BSDF functions
__device__ inline Bsdf_sample_func* as_bsdf_sample(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 0].bsdf_sample;
}
 
__device__ inline Bsdf_evaluate_func* as_bsdf_evaluate(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 1].bsdf_evaluate;
}
 
__device__ inline Bsdf_pdf_func* as_bsdf_pdf(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 2].bsdf_pdf;
}
 
__device__ inline Bsdf_auxiliary_func* as_bsdf_auxiliary(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 3].bsdf_auxiliary;
}
 
// EDF functions
__device__ inline Edf_sample_func* as_edf_sample(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 0].edf_sample;
}
 
__device__ inline Edf_evaluate_func* as_edf_evaluate(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 1].edf_evaluate;
}
 
__device__ inline Edf_pdf_func* as_edf_pdf(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 2].edf_pdf;
}
 
__device__ inline Edf_auxiliary_func* as_edf_auxiliary(const Mdl_function_index& index)
{
    return mdl_functions[index.z + 3].edf_auxiliary;
}
 
 
// 3d vector math utilities
__device__ inline float3 operator-(const float3& a)
{
    return make_float3(-a.x, -a.y, -a.z);
}
__device__ inline float3 operator+(const float3& a, const float3& b)
{
    return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
__device__ inline float3 operator-(const float3& a, const float3& b)
{
    return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__device__ inline float3 operator*(const float3& a, const float3& b)
{
    return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
__device__ inline float3 operator*(const float3& a, const float s)
{
    return make_float3(a.x * s, a.y * s, a.z * s);
}
__device__ inline float3 operator/(const float3& a, const float s)
{
    return make_float3(a.x / s, a.y / s, a.z / s);
}
__device__ inline void operator+=(float3& a, const float3& b)
{
    a.x += b.x; a.y += b.y; a.z += b.z;
}
__device__ inline void operator-=(float3& a, const float3& b)
{
    a.x -= b.x; a.y -= b.y; a.z -= b.z;
}
__device__ inline void operator*=(float3& a, const float3& b)
{
    a.x *= b.x; a.y *= b.y; a.z *= b.z;
}
__device__ inline void operator*=(float3& a, const float& s)
{
    a.x *= s; a.y *= s; a.z *= s;
}
__device__ inline float squared_length(const float3 &d)
{
    return d.x * d.x + d.y * d.y + d.z * d.z;
}
__device__ inline float3 normalize(const float3 &d)
{
    const float inv_len = 1.0f / sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
    return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
__device__ inline float dot(const float3 &u, const float3 &v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}
__device__ inline float3 cross(const float3 &u, const float3 &v)
{
    return make_float3(
        u.y * v.z - u.z * v.y,
        u.z * v.x - u.x * v.z,
        u.x * v.y - u.y * v.x);
}
 
// Random number generator based on the OptiX SDK
template<uint32_t N>
static __forceinline__ __device__ uint32_t tea(uint32_t v0, uint32_t v1)
{
    uint32_t s0 = 0;
 
    for (uint32_t n = 0; n < N; n++)
    {
        s0 += 0x9e3779b9;
        v0 += ((v1 << 4) + 0xa341316c) ^ (v1 + s0) ^ ((v1 >> 5) + 0xc8013ea4);
        v1 += ((v0 << 4) + 0xad90777d) ^ (v0 + s0) ^ ((v0 >> 5) + 0x7e95761e);
    }
 
    return v0;
}
 
// Generate random uint32_t in [0, 2^24)
static __forceinline__ __device__ uint32_t lcg(uint32_t& prev)
{
    const uint32_t LCG_A = 1664525u;
    const uint32_t LCG_C = 1013904223u;
    prev = (LCG_A * prev + LCG_C);
    return prev & 0x00FFFFFF;
}
 
// Generate random float in [0, 1)
static __forceinline__ __device__ float rnd(uint32_t& prev)
{
    return ((float)lcg(prev) / (float)0x01000000);
}
 
// direction to environment map texture coordinates
__device__ inline float2 environment_coords(const float3 &dir, const Kernel_params& params)
{
    const float u = atan2f(dir.z, dir.x) * (float)(0.5 / M_PI) + 0.5f;
    const float v = acosf(fmax(fminf(-dir.y, 1.0f), -1.0f)) * (float)(1.0 / M_PI);
    return make_float2(fmodf(u + params.env_rotation * (float)(0.5 * M_1_PI), 1.0f), v);
}
 
// importance sample the environment
__device__ inline float3 environment_sample(
    float3 &dir,
    float  &pdf,
    const  float3 &xi,
    const  Kernel_params &params)
{
    // importance sample an envmap pixel using an alias map
    const unsigned int size = params.env_size.x * params.env_size.y;
    const unsigned int idx = min((unsigned int)(xi.x * (float)size), size - 1);
    unsigned int env_idx;
    float xi_y = xi.y;
    if (xi_y < params.env_accel[idx].q) {
        env_idx = idx ;
        xi_y /= params.env_accel[idx].q;
    } else {
        env_idx = params.env_accel[idx].alias;
        xi_y = (xi_y - params.env_accel[idx].q) / (1.0f - params.env_accel[idx].q);
    }
 
    const unsigned int py = env_idx / params.env_size.x;
    const unsigned int px = env_idx % params.env_size.x;
    pdf = params.env_accel[env_idx].pdf;
 
    // uniformly sample spherical area of pixel
    const float u = (float)(px + xi_y) / (float)params.env_size.x;
    const float phi = u * (float)(2.0 * M_PI) - (float)M_PI - params.env_rotation;
    float sin_phi, cos_phi;
    sincosf(phi > float(-M_PI) ? phi : (phi + (float)(2.0 * M_PI)), &sin_phi, &cos_phi);
    const float step_theta = (float)M_PI / (float)params.env_size.y;
    const float theta0 = (float)(py) * step_theta;
    const float cos_theta = cosf(theta0) * (1.0f - xi.z) + cosf(theta0 + step_theta) * xi.z;
    const float theta = acosf(cos_theta);
    const float sin_theta = sinf(theta);
    dir = make_float3(cos_phi * sin_theta, -cos_theta, sin_phi * sin_theta);
 
    // lookup filtered beauty
    const float v = theta * (float)(1.0 / M_PI);
    const float4 t = tex2D<float4>(params.env_tex, u, v);
    return make_float3(t.x, t.y, t.z) * params.env_intensity / pdf;
}
 
// evaluate the environment
__device__ inline float3 environment_eval(
    float &pdf,
    const float3 &dir,
    const Kernel_params &params)
{
    const float2 uv = environment_coords(dir, params);
    const unsigned int x =
        min((unsigned int)(uv.x * (float)params.env_size.x), params.env_size.x - 1);
    const unsigned int y =
        min((unsigned int)(uv.y * (float)params.env_size.y), params.env_size.y - 1);
 
    pdf = params.env_accel[y * params.env_size.x + x].pdf;
    const float4 t = tex2D<float4>(params.env_tex, uv.x, uv.y) ;
    return make_float3(t.x, t.y, t.z) * params.env_intensity;
}
 
//-------------------------------------------------------------------------------------------------
 
struct auxiliary_data
{
    float3 albedo;
    float3 normal;
    int num; // multiple elements can contribute to the aux buffer with equal weight 
 
    __device__ inline auxiliary_data& operator+=(const auxiliary_data& b)
    {
        albedo += b.albedo;
        normal += b.normal;
        num += b.num;
        return *this;
    }
};
 
__device__ inline static void clear(auxiliary_data& data)
{
    data.albedo = make_float3(0.0f, 0.0f, 0.0f);
    data.normal = make_float3(0.0f, 0.0f, 0.0f);
    data.num = 0;
}
 
__device__ inline void normalize(auxiliary_data& data)
{
    data.albedo = data.albedo / fmaxf(1.0f, float(data.num));
 
    if (dot(data.normal, data.normal) > 0.0f)
        data.normal = normalize(data.normal);
 
    data.num = min(1, data.num);
}
 
//-------------------------------------------------------------------------------------------------
 
struct Ray_state
{
    float3 contribution;
    float3 weight;
    float3 pos, pos_rx, pos_ry;
    float3 dir, dir_rx, dir_ry;
    bool inside;
    bool inside_cutout;
    int intersection;
    uint32_t lpe_current_state;
    auxiliary_data* aux;
};
 
 
struct Ray_hit_info
{
    float distance;
    #ifdef ENABLE_DERIVATIVES
        tct_deriv_float3 position;
    #else
        float3 position;
    #endif
    float3 normal;
    float3 tangent_u;
    float3 tangent_v;
    #ifdef ENABLE_DERIVATIVES
        tct_deriv_float3 texture_coords[1];
    #else
        tct_float3 texture_coords[1];
    #endif
};
 
 
#define GT_SPHERE_RADIUS 1.0f
__device__ inline bool intersect_sphere(
    const Ray_state &ray_state,
    const Kernel_params &params,
    Ray_hit_info& out_hit)
{
    const float r = GT_SPHERE_RADIUS;
    const float b = 2.0f * dot(ray_state.dir, ray_state.pos);
    const float c = dot(ray_state.pos, ray_state.pos) - r * r;
 
    float tmp = b * b - 4.0f * c;
    if (tmp < 0.0f)
        return false;
 
    tmp = sqrtf(tmp);
    const float t0 = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
    const float t1 = c / t0;
 
    const float m = fminf(t0, t1);
    out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
    if (out_hit.distance < 0.0f)
        return false;
 
    // compute geometry state
    #ifdef ENABLE_DERIVATIVES
        out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
        out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
        out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
        const float3 &posval = out_hit.position.val;
    #else
        out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
        const float3 &posval = out_hit.position;
    #endif
    out_hit.normal = normalize(posval);
 
    const float phi = atan2f(out_hit.normal.x, out_hit.normal.z);
    const float theta = acosf(out_hit.normal.y);
 
    const float3 uvw = make_float3(
        (phi * (float) (0.5 / M_PI) + 0.5f) * 2.0f,
        1.0f - theta * (float) (1.0 / M_PI),
        0.0f);
 
    // compute surface derivatives
    float sp, cp;
    sincosf(phi, &sp, &cp);
    const float st = sinf(theta);
    out_hit.tangent_u = make_float3(cp * st, 0.0f, -sp * st) * (float) M_PI * r;
    out_hit.tangent_v = make_float3(sp * out_hit.normal.y, -st, cp * out_hit.normal.y) * (float) (-M_PI) * r;
 
    #ifdef ENABLE_DERIVATIVES
        out_hit.texture_coords[0].val = uvw;
        out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
        out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
    #else
        out_hit.texture_coords[0] = uvw;
    #endif
 
    return true;
}
 
#define GT_HAIR_RADIUS 0.35f
#define GT_HAIR_LENGTH 3.0f
__device__ inline bool intersect_hair(
    const Ray_state &ray_state,
    const Kernel_params &params,
    Ray_hit_info& out_hit)
{
    const float r = GT_HAIR_RADIUS;
    const float a = ray_state.dir.x * ray_state.dir.x + ray_state.dir.z * ray_state.dir.z;
    const float b = 2.0f * (ray_state.dir.x * ray_state.pos.x + ray_state.dir.z * ray_state.pos.z);
    const float c = ray_state.pos.x * ray_state.pos.x + ray_state.pos.z * ray_state.pos.z - r * r;
 
    float tmp = b * b - 4.0f * a * c;
    if (tmp < 0.0f)
        return false;
 
    tmp = sqrtf(tmp);
    const float q = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
    const float t0 = q / a;
    const float t1 = c / q;
 
    const float m = fminf(t0, t1);
    out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
    if (out_hit.distance < 0.0f)
        return false;
 
    // compute geometry state
    #ifdef ENABLE_DERIVATIVES
        out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
        out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
        out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
        const float3 &posval = out_hit.position.val;
    #else
        out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
        const float3 &posval = out_hit.position;
    #endif
    out_hit.normal = normalize(make_float3(posval.x, 0.0f, posval.z));
 
    if (fabsf(posval.y) > GT_HAIR_LENGTH * 0.5f)
        return false;
 
    const float phi = atan2f(posval.z, posval.x);
 
    const float3 uvw = make_float3(
        (posval.y + GT_HAIR_LENGTH * 0.5f) / GT_HAIR_LENGTH, // position along the hair
        phi * (float) (0.5f / M_PI) + 0.5f, // position around the hair in the range [0, 1]
        2.0f * GT_HAIR_RADIUS); // thickness of the hair
 
    // compute surface derivatives
    out_hit.tangent_u = make_float3(0.0, 1.0, 0.0);
    out_hit.tangent_v = cross(out_hit.normal, out_hit.tangent_u);
 
    #ifdef ENABLE_DERIVATIVES
        out_hit.texture_coords[0].val = uvw;
        out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
        out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
    #else
        out_hit.texture_coords[0] = uvw;
    #endif
 
    return true;
}
 
__device__ inline bool intersect_geometry(
    Ray_state &ray_state,
    const Kernel_params &params,
    Ray_hit_info& out_hit)
{
    switch (Geometry_type(params.geometry))
    {
        case GT_SPHERE:
            if (!intersect_sphere(ray_state, params, out_hit))
                return false;
            break;
        case GT_HAIR:
            if (!intersect_hair(ray_state, params, out_hit))
                return false;
            break;
        default:
            return false;
    }
 
    #ifndef ENABLE_DERIVATIVES
    ray_state.pos = out_hit.position;
    #else
    ray_state.pos = out_hit.position.val;
    if (params.use_derivatives && ray_state.intersection == 0)
    {
#ifdef USE_FAKE_DERIVATIVES
        out_hit.position.dx = make_float3(1.0f, 0.0f, 0.0f);
        out_hit.position.dy = make_float3(0.0f, 1.0f, 0.0f);
        out_hit.texture_coords[0].dx = make_float3(1.0f, 0.0f, 0.0f);
        out_hit.texture_coords[0].dy = make_float3(0.0f, 1.0f, 0.0f);
#else
        // compute ray differential for one-pixel offset rays
        // ("Physically Based Rendering", 3rd edition, chapter 10.1.1)
        const float d = dot(out_hit.normal, ray_state.pos);
        const float tx = (d - dot(out_hit.normal, ray_state.pos_rx)) / dot(out_hit.normal, ray_state.dir_rx);
        const float ty = (d - dot(out_hit.normal, ray_state.pos_ry)) / dot(out_hit.normal, ray_state.dir_ry);
        ray_state.pos_rx += ray_state.dir_rx * tx;
        ray_state.pos_ry += ray_state.dir_ry * ty;
 
        out_hit.position.dx = ray_state.pos_rx - ray_state.pos;
        out_hit.position.dy = ray_state.pos_ry - ray_state.pos;
 
        float4 A;
        float2 B_x, B_y;
        if (fabsf(out_hit.normal.x) > fabsf(out_hit.normal.y) && fabsf(out_hit.normal.x) > fabsf(out_hit.normal.z))
        {
            B_x = make_float2(
                ray_state.pos_rx.y - ray_state.pos.y,
                ray_state.pos_rx.z - ray_state.pos.z);
            B_y = make_float2(
                ray_state.pos_ry.y - ray_state.pos.y,
                ray_state.pos_ry.z - ray_state.pos.z);
            A = make_float4(
                out_hit.tangent_u.y, out_hit.tangent_u.z, out_hit.tangent_v.y, out_hit.tangent_v.z);
        }
        else if (fabsf(out_hit.normal.y) > fabsf(out_hit.normal.z))
        {
            B_x = make_float2(
                ray_state.pos_rx.x - ray_state.pos.x,
                ray_state.pos_rx.z - ray_state.pos.z);
            B_y = make_float2(
                ray_state.pos_ry.x - ray_state.pos.x,
                ray_state.pos_ry.z - ray_state.pos.z);
            A = make_float4(
                out_hit.tangent_u.x, out_hit.tangent_u.z, out_hit.tangent_v.x, out_hit.tangent_v.z);
        }
        else
        {
            B_x = make_float2(
                ray_state.pos_rx.x - ray_state.pos.x,
                ray_state.pos_rx.y - ray_state.pos.y);
            B_y = make_float2(
                ray_state.pos_ry.x - ray_state.pos.x,
                ray_state.pos_ry.y - ray_state.pos.y);
            A = make_float4(
                out_hit.tangent_u.x, out_hit.tangent_u.y, out_hit.tangent_v.x, out_hit.tangent_v.y);
        }
 
        const float det = A.x * A.w - A.y * A.z;
        if (fabsf(det) > 1e-10f)
        {
            const float inv_det = 1.0f / det;
 
            out_hit.texture_coords[0].dx.x = inv_det * (A.w * B_x.x - A.z * B_x.y);
            out_hit.texture_coords[0].dx.y = inv_det * (A.x * B_x.y - A.y * B_x.x);
 
            out_hit.texture_coords[0].dy.x = inv_det * (A.w * B_y.x - A.z * B_y.y);
            out_hit.texture_coords[0].dy.y = inv_det * (A.x * B_y.y - A.y * B_y.x);
        }
#endif
    }
    #endif
 
    out_hit.tangent_u = normalize(out_hit.tangent_u);
    out_hit.tangent_v = normalize(out_hit.tangent_v);
    return true;
}
 
 
__device__ bool cull_point_light(
    const Kernel_params &params,
    const float3 &light_position,
    const float3 &light_direction /*to light*/,
    const float3 &normal)
{
    switch (params.geometry)
    {
        case GT_SPHERE:
        {
            // same as default, but allow lights inside the sphere
            const float inside = (squared_length(light_position) < GT_SPHERE_RADIUS) ? -1.f : 1.f;
            return (dot(light_direction, normal) * inside) <= 0.0f;
        }
        case GT_HAIR:
            // ignore light sources within the volume
            return (light_position.x * light_position.x +
                    light_position.z * light_position.z) < GT_SPHERE_RADIUS;
        default:
            // ignore light from behind
            return dot(light_direction, normal) <= 0.0f;
    }
}
 
__device__ bool cull_env_light(
    const Kernel_params &params,
    const float3 &light_direction /*to light*/,
    const float3 &normal)
{
    switch (params.geometry)
    {
        case GT_HAIR:
            // allow light from behind
            return false;
 
        case GT_SPHERE:
        default:
            // ignore light from behind
            return dot(light_direction, normal) <= 0.0f;
    }
}
 
 
__device__ void continue_ray(
    Ray_state& ray_state,
    const Ray_hit_info &hit_indo,
    unsigned int event_type,
    const Kernel_params &params)
{
    switch (params.geometry)
    {
 
    case GT_HAIR:
        if (event_type == BSDF_EVENT_GLOSSY_TRANSMISSION)
        {
            // conservative
            ray_state.pos += ray_state.dir * 2.0f * GT_HAIR_RADIUS;
            ray_state.inside = false;
        }
        break;
 
    default:
        return;
    }
}
 
//-------------------------------------------------------------------------------------------------
 
// events that are define a transition between states, along with tag IDs
enum Transition_type
{
    TRANSITION_CAMERA = 0,
    TRANSITION_LIGHT,
    TRANSITION_EMISSION,
    TRANSITION_SCATTER_DR,
    TRANSITION_SCATTER_DT,
    TRANSITION_SCATTER_GR,
    TRANSITION_SCATTER_GT,
    TRANSITION_SCATTER_SR,
    TRANSITION_SCATTER_ST,
 
    TRANSITION_COUNT,
};
 
// go to the next state, given the current state and a transition token.
__device__ inline uint32_t lpe_transition(
    uint32_t current_state,
    Transition_type event,
    uint32_t global_tag_id,
    const Kernel_params &params)
{
    if(current_state == static_cast<uint32_t>(-1))
        return static_cast<uint32_t>(-1);
 
    return params.lpe_state_table[
        current_state * params.lpe_num_transitions +
        static_cast<uint32_t>(TRANSITION_COUNT) * global_tag_id +
        static_cast<uint32_t>(event)];
}
 
// add direct contribution, e.g., for emission, direct light hits
__device__ inline void accumulate_contribution(
    Transition_type light_event,
    uint32_t light_global_tag_id,
    const float3& contrib,
    Ray_state &ray_state,
    const Kernel_params &params)
{
    // check if there is a valid transition to that light source
    uint32_t next_state = lpe_transition(
        ray_state.lpe_current_state, light_event, light_global_tag_id, params);
    if (next_state == static_cast<uint32_t>(-1)) return;
 
    // add contribution the when the reached state is a final state for the selected LPE
    // here we only have one LPE buffer, but more can be added easily by checking different LPEs
    if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
        ray_state.contribution += contrib;
}
 
// add contribution for next event estimations
__device__ inline void accumulate_next_event_contribution(
    Transition_type scatter_event, uint32_t material_global_tag_id,
    Transition_type light_event, uint32_t light_global_tag_id,
    const float3& contrib,
    Ray_state &ray_state,
    const Kernel_params &params)
{
    // transition following the scatter event
    uint32_t next_state = lpe_transition(
        ray_state.lpe_current_state, scatter_event, material_global_tag_id, params);
    if (next_state == static_cast<uint32_t>(-1)) return;
 
    // check if there is a valid transition to the light source
    next_state = lpe_transition(
        next_state, light_event, light_global_tag_id, params);
    if (next_state == static_cast<uint32_t>(-1)) return;
 
    // add contribution the when the reached state is a final state for the selected LPE
    // here we only have one LPE buffer, but more can be added easily by checking different LPEs
    if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
        ray_state.contribution += contrib;
}
 
// evaluate if certain shading point (pos) is visible to a light direction by casting a shadow ray.
__device__ inline bool trace_shadow(
    const float3 &pos,
    const float3 &to_light,
    uint32_t &seed,
    Ray_state &ray_state,
    const Kernel_params &params)
{
    Ray_state ray_shadow = ray_state;
    ray_shadow.pos = pos;
    ray_shadow.dir = normalize(to_light);
 
    Ray_hit_info shadow_hit;
    bool in_shadow = intersect_geometry(ray_shadow, params, shadow_hit);
    if (in_shadow)
    {
        float4 texture_results[16];
 
        // material of the current object
        Df_cuda_material material = params.material_buffer[params.current_material];
 
        Mdl_function_index func_idx;
        func_idx = get_mdl_function_index(material.init);
        if (!is_valid(func_idx))
            return false;
 
        // create state
        Mdl_state state = {
            shadow_hit.normal,
            shadow_hit.normal,
            shadow_hit.position,
            0.0f,
            shadow_hit.texture_coords,
            &shadow_hit.tangent_u,
            &shadow_hit.tangent_v,
            texture_results,
            params.tc_data[func_idx.x].ro_data_segment,
            identity,
            identity,
            0,
            1.0f
        };
 
        // access textures and other resource data
        // expect that the target code index is the same for all functions of a material
        Mdl_resource_handler mdl_resources;
        mdl_resources.set_target_code_index(params, func_idx);    // init resource handler
 
        const char* arg_block = get_arg_block(params, func_idx);  // get material parameters
 
        // initialize the state
        as_init(func_idx)(&state, &mdl_resources.data, arg_block);
 
        // handle cutouts be treating the opacity as chance to hit the surface
        // if we don't hit it, the ray will continue with the same direction
        func_idx = get_mdl_function_index(material.cutout_opacity);
        if (is_valid(func_idx))
        {
            float opacity = as_expression_typed<float>(func_idx)(&state, &mdl_resources.data, arg_block);
            const float x_anyhit = rnd(seed);
            in_shadow = (x_anyhit <= opacity);
        }
    }
 
    return in_shadow;
}
 
//-------------------------------------------------------------------------------------------------
 
__device__ inline bool trace_scene(
    uint32_t &seed,
    Ray_state &ray_state,
    const Kernel_params &params)
{
    // stop at invalid states
    if (ray_state.lpe_current_state == static_cast<uint32_t>(-1))
        return false;
 
    // intersect with geometry
    Ray_hit_info hit;
    if (!intersect_geometry(ray_state, params, hit)) {
        if (ray_state.intersection == 0 && params.mdl_test_type != MDL_TEST_NO_ENV) {
            // primary ray miss, add environment contribution
            const float2 uv = environment_coords(ray_state.dir, params);
            const float4 texval = tex2D<float4>(params.env_tex, uv.x, uv.y);
 
            // add contribution, if `CL` is a valid path
            accumulate_contribution(
                TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
                make_float3(texval.x, texval.y, texval.z) * params.env_intensity,
                ray_state, params);
        }
        return false;
    }
 
    float4 texture_results[16];
 
    // material of the current object
    Df_cuda_material material = params.material_buffer[params.current_material];
 
    Mdl_function_index func_idx;
    func_idx = get_mdl_function_index(material.init);
    if (!is_valid(func_idx))
        return false;
 
    // create state
    Mdl_state state = {
        hit.normal,
        hit.normal,
        hit.position,
        0.0f,
        hit.texture_coords,
        &hit.tangent_u,
        &hit.tangent_v,
        texture_results,
        params.tc_data[func_idx.x].ro_data_segment,
        identity,
        identity,
        0,
        1.0f
    };
 
    // access textures and other resource data
    // expect that the target code index is the same for all functions of a material
    Mdl_resource_handler mdl_resources;
    mdl_resources.set_target_code_index(params, func_idx);    // init resource handler
 
    const char* arg_block = get_arg_block(params, func_idx);  // get material parameters
 
    // initialize the state
    as_init(func_idx)(&state, &mdl_resources.data, arg_block);
 
    // handle cutouts be treating the opacity as chance to hit the surface
    // if we don't hit it, the ray will continue with the same direction
    func_idx = get_mdl_function_index(material.cutout_opacity);
    if (is_valid(func_idx))
    {
        float opacity = as_expression_typed<float>(func_idx)(&state, &mdl_resources.data, arg_block);
        const float x_anyhit = rnd(seed);
        if (x_anyhit > opacity)
        {
            // decrease to see the environment though front and back face cutouts
            ray_state.intersection--;
 
            // change the side
            ray_state.inside_cutout = !ray_state.inside_cutout;
 
            // avoid self-intersections
            ray_state.pos += hit.normal * (ray_state.inside_cutout ? -0.001f : 0.001f);
            return true;
        }
    }
 
    // for evaluating parts of the BSDF individually, e.g. for implementing LPEs
    // the MDL SDK provides several options to pass out the BSDF, EDF, and auxiliary data
    #if DF_HANDLE_SLOTS == DF_HSM_POINTER
        // application provided memory
        // the data structs will get only a pointer to a buffer, along with size and offset
        const unsigned df_eval_slots = 4;       // number of handles (parts) that can be evaluated
                                                // at once. 4 is an arbitrary choice. However, it
                                                // has to match eval_data.handle_count and
                                                // aux_data.handle_count)
 
        float3 result_buffer_0[df_eval_slots];  // used for bsdf_diffuse, albedo_diffue, and edf
        float3 result_buffer_1[df_eval_slots];  // used for bsdf_glossy, albedo_glossy
        float3 result_buffer_2[df_eval_slots];  // used for normal
        float3 result_buffer_3[df_eval_slots];  // used for roughness
    #elif DF_HANDLE_SLOTS == DF_HSM_NONE
        // handles are ignored, all parts of the BSDF are returned at once without loops (fastest)
        const unsigned df_eval_slots = 1;
    #else
        // eval_data and auxiliary_data have a fixed size array to pass the data. Only an offset
        // is required if there are more handles (parts) than slots.
        const unsigned df_eval_slots = DF_HANDLE_SLOTS;
    #endif
 
    // apply volume attenuation after first bounce
    // (assuming uniform absorption coefficient and ignoring scattering coefficient)
    if (ray_state.intersection > 0)
    {
        func_idx = get_mdl_function_index(material.volume_absorption);
        if (is_valid(func_idx))
        {
            float3 abs_coeff = as_expression_typed<float3>(func_idx)(&state, &mdl_resources.data, arg_block);
 
            ray_state.weight.x *= abs_coeff.x > 0.0f ? expf(-abs_coeff.x * hit.distance) : 1.0f;
            ray_state.weight.y *= abs_coeff.y > 0.0f ? expf(-abs_coeff.y * hit.distance) : 1.0f;
            ray_state.weight.z *= abs_coeff.z > 0.0f ? expf(-abs_coeff.z * hit.distance) : 1.0f;
        }
    }
 
    // for thin_walled materials there is no 'inside'
    bool thin_walled = false;
    Mdl_function_index thin_walled_func_idx = get_mdl_function_index(material.thin_walled);
    if (is_valid(thin_walled_func_idx))
        thin_walled = as_expression_typed<bool>(thin_walled_func_idx)(
            &state, &mdl_resources.data, arg_block);
 
    // add emission
    func_idx = get_mdl_function_index((thin_walled && ray_state.inside_cutout) ? material.backface_edf : material.edf);
    if (is_valid(func_idx))
    {
        // evaluate intensity expression
        float3 emission_intensity = make_float3(0.0, 0.0, 0.0);
        Mdl_function_index intensity_func_idx = get_mdl_function_index(
            (thin_walled && ray_state.inside_cutout) ? material.backface_emission_intensity : material.emission_intensity);
        if (is_valid(intensity_func_idx))
        {
            emission_intensity = as_expression_typed<float3>(intensity_func_idx)(
                &state, &mdl_resources.data, arg_block);
        }
 
        // evaluate EDF
        Edf_evaluate_data<(Df_handle_slot_mode) DF_HANDLE_SLOTS> eval_data;
        eval_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
 
        #if DF_HANDLE_SLOTS == DF_HSM_POINTER
            eval_data.edf = result_buffer_0;
            eval_data.handle_count = df_eval_slots;
        #endif
 
        // outer loop in case the are more material tags than slots in the evaluate struct
        unsigned offset = 0;
        unsigned edf_mtag_to_gtag_map_size = (thin_walled && ray_state.inside_cutout) ?
            material.backface_edf_mtag_to_gtag_map_size : material.edf_mtag_to_gtag_map_size;
        unsigned* edf_mtag_to_gtag_map = (thin_walled && ray_state.inside_cutout) ?
            &material.backface_edf_mtag_to_gtag_map[0] : &material.edf_mtag_to_gtag_map[0];
 
        #if DF_HANDLE_SLOTS != DF_HSM_NONE
        for (; offset < edf_mtag_to_gtag_map_size; offset += df_eval_slots)
        {
            eval_data.handle_offset = offset;
        #endif
 
            // evaluate the materials EDF
            as_edf_evaluate(func_idx)(&eval_data, &state, &mdl_resources.data, arg_block);
 
            // iterate over all lobes (tags that appear in the df)
            for (unsigned lobe = 0; (lobe < df_eval_slots) &&
                ((offset + lobe) < edf_mtag_to_gtag_map_size); ++lobe)
            {
                // add emission contribution
                accumulate_contribution(
                    TRANSITION_EMISSION, edf_mtag_to_gtag_map[offset + lobe],
                    #if DF_HANDLE_SLOTS == DF_HSM_NONE
                        eval_data.edf * emission_intensity,
                    #else
                        eval_data.edf[lobe] * emission_intensity,
                    #endif
                    ray_state, params);
            }
 
        #if DF_HANDLE_SLOTS != DF_HSM_NONE
        }
        #endif
    }
 
    func_idx = get_mdl_function_index(
        (thin_walled && ray_state.inside_cutout) ? material.backface_bsdf : material.bsdf);
 
    unsigned bsdf_mtag_to_gtag_map_size = (thin_walled && ray_state.inside_cutout) ?
        material.backface_bsdf_mtag_to_gtag_map_size : material.bsdf_mtag_to_gtag_map_size;
    unsigned* bsdf_mtag_to_gtag_map = (thin_walled && ray_state.inside_cutout) ?
        &material.backface_bsdf_mtag_to_gtag_map[0] : &material.bsdf_mtag_to_gtag_map[0];
 
    if (is_valid(func_idx))
    {
        // reuse memory for function data
        union
        {
            Bsdf_sample_data                                            sample_data;
            Bsdf_evaluate_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS>    eval_data;
            Bsdf_pdf_data                                               pdf_data;
            Bsdf_auxiliary_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS>   aux_data;
        };
 
        // initialize shared fields
        if (ray_state.inside && !thin_walled)
        {
            sample_data.ior1.x = BSDF_USE_MATERIAL_IOR;
            sample_data.ior2 = make_float3(1.0f, 1.0f, 1.0f);
        }
        else
        {
            sample_data.ior1 = make_float3(1.0f, 1.0f, 1.0f);
            sample_data.ior2.x = BSDF_USE_MATERIAL_IOR;
        }
        sample_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
 
        // if requested, fill auxiliary buffers
        if (params.enable_auxiliary_output && ray_state.intersection == 0)
        {
            #if DF_HANDLE_SLOTS == DF_HSM_POINTER
                aux_data.albedo_diffuse = result_buffer_0;
                aux_data.albedo_glossy = result_buffer_1;
                aux_data.normal = result_buffer_2;
                aux_data.roughness = result_buffer_3;
                aux_data.handle_count = df_eval_slots;
            #endif
            aux_data.flags = params.bsdf_data_flags;
 
            // outer loop in case there are more material tags than slots in the evaluate struct
            unsigned offset = 0;
            #if DF_HANDLE_SLOTS != DF_HSM_NONE
            for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
            {
                aux_data.handle_offset = offset;
            #endif
 
                // evaluate the materials auxiliary
                as_bsdf_auxiliary(func_idx)(&aux_data, &state, &mdl_resources.data, arg_block);
 
                // iterate over all lobes (tags that appear in the df)
                for (unsigned lobe = 0; (lobe < df_eval_slots) &&
                    ((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
                {
                    // to keep it simpler, the individual albedo and normals are averaged
                    // however, the parts can also be used separately, e.g. for LPEs
                    #if DF_HANDLE_SLOTS == DF_HSM_NONE
                        ray_state.aux->albedo += aux_data.albedo_diffuse + aux_data.albedo_glossy;
                        ray_state.aux->normal += aux_data.normal;
                    #else
                        ray_state.aux->albedo += aux_data.albedo_diffuse[lobe] +
                                                 aux_data.albedo_glossy[lobe];
                        ray_state.aux->normal += aux_data.normal[lobe];
                    #endif
                    ray_state.aux->num++;
                }
 
            #if DF_HANDLE_SLOTS != DF_HSM_NONE
            }
            #endif
        }
 
        // compute direct lighting for point light
        Transition_type transition_glossy, transition_diffuse;
        if (params.light_intensity > 0.0f)
        {
            float3 to_light = params.light_pos - ray_state.pos;
            bool culled_light = cull_point_light(params, params.light_pos, to_light, ray_state.inside_cutout ? -hit.normal : hit.normal);
 
            // cast a shadow ray when inside a cutout
            if (ray_state.inside_cutout && !culled_light)
            {
                const float3 pos = ray_state.pos - hit.normal * 0.001f;
                culled_light = trace_shadow(pos, to_light, seed, ray_state, params);
            }
 
            if(!culled_light)
            {
                const float inv_squared_dist = 1.0f / squared_length(to_light);
                const float3 f = params.light_color * params.light_intensity *
                                 inv_squared_dist * (float) (0.25 / M_PI);
 
                eval_data.k2 = to_light * sqrtf(inv_squared_dist);
                #if DF_HANDLE_SLOTS == DF_HSM_POINTER
                    eval_data.bsdf_diffuse = result_buffer_0;
                    eval_data.bsdf_glossy = result_buffer_1;
                    eval_data.handle_count = df_eval_slots;
                #endif
                eval_data.flags = params.bsdf_data_flags;
 
                // outer loop in case the are more material tags than slots in the evaluate struct
                unsigned offset = 0;
                #if DF_HANDLE_SLOTS != DF_HSM_NONE
                for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
                {
                    eval_data.handle_offset = offset;
                #endif
 
                    // evaluate the materials BSDF
                    as_bsdf_evaluate(func_idx)(
                        &eval_data, &state, &mdl_resources.data, arg_block);
 
                    // we know if we reflect or transmit
                    if (dot(to_light, ray_state.inside_cutout ? -hit.normal : hit.normal) > 0.0f) {
                        transition_glossy = TRANSITION_SCATTER_GR;
                        transition_diffuse = TRANSITION_SCATTER_DR;
                    } else {
                        transition_glossy = TRANSITION_SCATTER_GT;
                        transition_diffuse = TRANSITION_SCATTER_DT;
                    }
 
                    // sample weight
                    const float3 w = ray_state.weight * f;
 
                    // iterate over all lobes (tags that appear in the df)
                    for (unsigned lobe = 0; (lobe < df_eval_slots) &&
                         ((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
                    {
                        // get the `global tag` of the lobe
                        unsigned material_lobe_gtag = bsdf_mtag_to_gtag_map[offset + lobe];
 
                        // add diffuse contribution
                        accumulate_next_event_contribution(
                            transition_diffuse, material_lobe_gtag,
                            TRANSITION_LIGHT, params.point_light_gtag, // light group
                            #if DF_HANDLE_SLOTS == DF_HSM_NONE
                                eval_data.bsdf_diffuse * w,
                            #else
                                eval_data.bsdf_diffuse[lobe] * w,
                            #endif
                            ray_state, params);
 
                        // add glossy contribution
                        accumulate_next_event_contribution(
                            transition_glossy, material_lobe_gtag,
                            TRANSITION_LIGHT, params.point_light_gtag, // light group
                            #if DF_HANDLE_SLOTS == DF_HSM_NONE
                                eval_data.bsdf_glossy * w,
                            #else
                                eval_data.bsdf_glossy[lobe] * w,
                            #endif
                            ray_state, params);
                    }
 
                #if DF_HANDLE_SLOTS != DF_HSM_NONE
                }
                #endif
            }
        }
 
        // importance sample environment light
        if (params.mdl_test_type != MDL_TEST_SAMPLE && params.mdl_test_type != MDL_TEST_NO_ENV)
        {
            const float xi0 = rnd(seed);
            const float xi1 = rnd(seed);
            const float xi2 = rnd(seed);
 
            float3 light_dir;
            float pdf;
            const float3 f = environment_sample(light_dir, pdf, make_float3(xi0, xi1, xi2), params);
            bool culled_env_light = pdf < 0.0f || cull_env_light(params, light_dir, ray_state.inside_cutout ? -hit.normal : hit.normal);
 
            // cast a shadow ray when inside a cutout
            if(ray_state.inside_cutout && !culled_env_light)
            {
                const float3 pos = ray_state.pos - hit.normal * 0.001f;
                culled_env_light = trace_shadow(pos, light_dir, seed, ray_state, params);
            }
 
            if (!culled_env_light)
            {
                eval_data.k2 = light_dir;
 
                #if DF_HANDLE_SLOTS == DF_HSM_POINTER
                    eval_data.bsdf_diffuse = result_buffer_0;
                    eval_data.bsdf_glossy = result_buffer_1;
                    eval_data.handle_count = df_eval_slots;
                #endif
                eval_data.flags = params.bsdf_data_flags;
 
                // outer loop in case the are more material tags than slots in the evaluate struct
                unsigned offset = 0;
                #if DF_HANDLE_SLOTS != DF_HSM_NONE
                for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
                {
                    eval_data.handle_offset = offset;
                #endif
 
                    // evaluate the materials BSDF
                    as_bsdf_evaluate(func_idx)(
                        &eval_data, &state, &mdl_resources.data, arg_block);
 
                    const float mis_weight =
                        (params.mdl_test_type == MDL_TEST_EVAL) ? 1.0f : pdf / (pdf + eval_data.pdf);
 
                    // we know if we reflect or transmit
                    if (dot(light_dir, ray_state.inside_cutout ? -hit.normal : hit.normal) > 0.0f) {
                        transition_glossy = TRANSITION_SCATTER_GR;
                        transition_diffuse = TRANSITION_SCATTER_DR;
                    } else {
                        transition_glossy = TRANSITION_SCATTER_GT;
                        transition_diffuse = TRANSITION_SCATTER_DT;
                    }
 
                    // sample weight
                    const float3 w = ray_state.weight * f * mis_weight;
 
                    // iterate over all lobes (tags that appear in the df)
                    for (unsigned lobe = 0; (lobe < df_eval_slots) &&
                        ((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
                    {
                        // get the `global tag` of the lobe
                        unsigned material_lobe_gtag = bsdf_mtag_to_gtag_map[offset + lobe];
 
                        // add diffuse contribution
                        accumulate_next_event_contribution(
                            transition_diffuse, material_lobe_gtag,
                            TRANSITION_LIGHT, params.env_gtag, // light group 'env'
                            #if DF_HANDLE_SLOTS == DF_HSM_NONE
                                eval_data.bsdf_diffuse * w,
                            #else
                                eval_data.bsdf_diffuse[lobe] * w,
                            #endif
                            ray_state, params);
 
                        // add glossy contribution
                        accumulate_next_event_contribution(
                            transition_glossy, material_lobe_gtag,
                            TRANSITION_LIGHT, params.env_gtag, // light group 'env'
                            #if DF_HANDLE_SLOTS == DF_HSM_NONE
                                eval_data.bsdf_glossy * w,
                            #else
                                eval_data.bsdf_glossy[lobe] * w,
                            #endif
                            ray_state, params);
                    }
 
                #if DF_HANDLE_SLOTS != DF_HSM_NONE
                }
                #endif
            }
        }
 
        // importance sample BSDF
        {
            sample_data.xi.x = rnd(seed);
            sample_data.xi.y = rnd(seed);
            sample_data.xi.z = rnd(seed);
            sample_data.xi.w = rnd(seed);
            sample_data.flags = params.bsdf_data_flags;
 
            // sample the materials BSDF
            as_bsdf_sample(func_idx)(&sample_data, &state, &mdl_resources.data, arg_block);
 
            if (sample_data.event_type == BSDF_EVENT_ABSORB)
                return false;
 
            ray_state.dir = sample_data.k2;
            ray_state.weight *= sample_data.bsdf_over_pdf;
 
 
            const bool transmission = (sample_data.event_type & BSDF_EVENT_TRANSMISSION) != 0;
            if (transmission)
                ray_state.inside = !ray_state.inside;
 
            const bool is_specular = (sample_data.event_type & BSDF_EVENT_SPECULAR) != 0;
 
            Transition_type next;
            if (is_specular)
                next = transmission ? TRANSITION_SCATTER_ST : TRANSITION_SCATTER_SR;
            else if ((sample_data.event_type & BSDF_EVENT_DIFFUSE) != 0)
                next = transmission ? TRANSITION_SCATTER_DT : TRANSITION_SCATTER_DR;
            else
                next = transmission ? TRANSITION_SCATTER_GT : TRANSITION_SCATTER_GR;
 
            // move ray to the next sampled state
            ray_state.lpe_current_state = lpe_transition(
                ray_state.lpe_current_state, next,
                #if DF_HANDLE_SLOTS == DF_HSM_NONE
                    // ill-defined case: the LPE machine expects tags but the renderer ignores them
                    // -> the resulting image of LPEs with tags is undefined in this case
                    params.default_gtag,
                #else
                    bsdf_mtag_to_gtag_map[sample_data.handle], // sampled lobe
                #endif
                params);
 
            // depending on the geometry, the ray might be displaced before continuing
            continue_ray(ray_state, hit, sample_data.event_type, params);
 
            if (ray_state.inside || ray_state.inside_cutout)
            {
                // avoid self-intersections
                ray_state.pos -= hit.normal * 0.001f;
 
                return true; // continue bouncing in sphere
            }
            else if (params.mdl_test_type != MDL_TEST_NO_ENV &&
                params.mdl_test_type != MDL_TEST_EVAL)
            {
                // leaving sphere, add contribution from environment hit
 
                float pdf;
                const float3 f = environment_eval(pdf, sample_data.k2, params);
 
                float bsdf_pdf;
                if (params.mdl_test_type == MDL_TEST_MIS_PDF)
                {
                    const float3 k2 = sample_data.k2;
                    pdf_data.k2 = k2;
                    pdf_data.flags = params.bsdf_data_flags;
 
                    // get pdf corresponding to the materials BSDF
                    as_bsdf_pdf(func_idx)(&pdf_data, &state, &mdl_resources.data, arg_block);
 
                    bsdf_pdf = pdf_data.pdf;
                }
                else
                    bsdf_pdf = sample_data.pdf;
 
                if (is_specular || bsdf_pdf > 0.0f)
                {
                    const float mis_weight = is_specular ||
                        (params.mdl_test_type == MDL_TEST_SAMPLE) ? 1.0f :
                            bsdf_pdf / (pdf + bsdf_pdf);
 
                    float3 specular_contrib = ray_state.weight * f * mis_weight;
                    accumulate_contribution(
                        TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
                        specular_contrib,
                        ray_state, params);
                }
            }
        }
    }
 
    return false;
}
 
struct render_result
{
    float3 beauty;
    auxiliary_data aux;
};
 
__device__ inline render_result render_scene(
    uint32_t &seed,
    const Kernel_params &params,
    const unsigned x,
    const unsigned y)
{
    const float inv_res_x = 1.0f / (float)params.resolution.x;
    const float inv_res_y = 1.0f / (float)params.resolution.y;
 
    const float dx = params.disable_aa ? 0.5f : rnd(seed);
    const float dy = params.disable_aa ? 0.5f : rnd(seed);
 
    const float2 screen_pos = make_float2(
        ((float)x + dx) * inv_res_x,
        ((float)y + dy) * inv_res_y);
 
    const float r    = (2.0f * screen_pos.x               - 1.0f);
    const float r_rx = (2.0f * (screen_pos.x + inv_res_x) - 1.0f);
    const float u    = (2.0f * screen_pos.y               - 1.0f);
    const float u_ry = (2.0f * (screen_pos.y + inv_res_y) - 1.0f);
    const float aspect = (float)params.resolution.y / (float)params.resolution.x;
 
    render_result res;
    clear(res.aux);
 
    Ray_state ray_state;
    ray_state.contribution = make_float3(0.0f, 0.0f, 0.0f);
    ray_state.weight = make_float3(1.0f, 1.0f, 1.0f);
    ray_state.pos = ray_state.pos_rx = ray_state.pos_ry = params.cam_pos;
    ray_state.dir = normalize(
        params.cam_dir * params.cam_focal + params.cam_right * r    + params.cam_up * aspect * u);
    ray_state.dir_rx = normalize(
        params.cam_dir * params.cam_focal + params.cam_right * r_rx + params.cam_up * aspect * u);
    ray_state.dir_ry = normalize(
        params.cam_dir * params.cam_focal + params.cam_right * r    + params.cam_up * aspect * u_ry);
    ray_state.inside = false;
    ray_state.inside_cutout = false;
    ray_state.lpe_current_state = 1; // already at the camera so state 0 to 1 is free as long as
                                     // there is only one camera
    ray_state.aux = &res.aux;
    const unsigned int max_inters = params.max_path_length - 1;
    for (ray_state.intersection = 0; ray_state.intersection < max_inters; ++ray_state.intersection)
    {
        if (!trace_scene(seed, ray_state, params))
            break;
    }
 
    res.beauty =
        isfinite(ray_state.contribution.x) &&
        isfinite(ray_state.contribution.y) &&
        isfinite(ray_state.contribution.z) ? ray_state.contribution : make_float3(0.0f, 0.0f, 0.0f);
    normalize(res.aux);
    return res;
}
 
 
// quantize + gamma
__device__ inline unsigned int float3_to_rgba8(float3 val)
{
    const unsigned int r = (unsigned int) (255.0f * powf(__saturatef(val.x), 1.0f / 2.2f));
    const unsigned int g = (unsigned int) (255.0f * powf(__saturatef(val.y), 1.0f / 2.2f));
    const unsigned int b = (unsigned int) (255.0f * powf(__saturatef(val.z), 1.0f / 2.2f));
    return 0xff000000 | (b << 16) | (g << 8) | r;
}
 
// exposure + simple Reinhard tonemapper + gamma
__device__ inline unsigned int display(float3 val, const float tonemap_scale)
{
    val *= tonemap_scale;
    const float burn_out = 0.1f;
    val.x *= (1.0f + val.x * burn_out) / (1.0f + val.x);
    val.y *= (1.0f + val.y * burn_out) / (1.0f + val.y);
    val.z *= (1.0f + val.z * burn_out) / (1.0f + val.z);
    return float3_to_rgba8(val);
}
 
 
// CUDA kernel rendering simple geometry with IBL
extern "C" __global__ void render_scene_kernel(
    const Kernel_params kernel_params)
{
    const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= kernel_params.resolution.x || y >= kernel_params.resolution.y)
        return;
 
    const unsigned int idx = y * kernel_params.resolution.x + x;
    uint32_t seed = tea<4>(idx, kernel_params.iteration_start);
 
    render_result res;
    float3 beauty = make_float3(0.0f, 0.0f, 0.0f);
    auxiliary_data aux;
    clear(aux);
    for (unsigned int s = 0; s < kernel_params.iteration_num; ++s)
    {
        res = render_scene(
            seed,
            kernel_params,
            x, y);
 
        beauty += res.beauty;
        aux += res.aux;
 
    }
    beauty *= 1.0f / (float)kernel_params.iteration_num;
    normalize(aux);
 
    // accumulate
    if (kernel_params.iteration_start == 0) {
        kernel_params.accum_buffer[idx] = beauty;
 
        if (kernel_params.enable_auxiliary_output) {
            kernel_params.albedo_buffer[idx] = aux.albedo;
            kernel_params.normal_buffer[idx] = aux.normal;
        }
    } else {
        float iteration_weight = (float) kernel_params.iteration_num /
            (float) (kernel_params.iteration_start + kernel_params.iteration_num);
 
        float3 buffer_val = kernel_params.accum_buffer[idx] +
            (beauty - kernel_params.accum_buffer[idx]) * iteration_weight;
 
        kernel_params.accum_buffer[idx] =
            (isinf(buffer_val.x) || isnan(buffer_val.y) || isinf(buffer_val.z) ||
             isnan(buffer_val.x) || isinf(buffer_val.y) || isnan(buffer_val.z))
                ? make_float3(0.0f, 0.0f, 1.0e+30f)
                : buffer_val;
 
        if (kernel_params.enable_auxiliary_output) {
 
            // albedo
            kernel_params.albedo_buffer[idx] = kernel_params.albedo_buffer[idx] +
                (aux.albedo - kernel_params.albedo_buffer[idx]) * iteration_weight;
 
            // normal, check for zero length first
            float3 weighted_normal = kernel_params.normal_buffer[idx] +
                (aux.normal - kernel_params.normal_buffer[idx]) * iteration_weight;
            if (dot(weighted_normal, weighted_normal) > 0.0f)
                weighted_normal = normalize(weighted_normal);
            kernel_params.normal_buffer[idx] = weighted_normal;
        }
    }
 
    // update display buffer
    if (kernel_params.display_buffer)
    {
        switch (kernel_params.enable_auxiliary_output ? kernel_params.display_buffer_index : 0)
        {
        case 1: /* albedo */
            kernel_params.display_buffer[idx] = float3_to_rgba8(kernel_params.albedo_buffer[idx]);
            break;
 
        case 2: /* normal */
        {
            float3 display_normal = kernel_params.normal_buffer[idx];
            if (dot(display_normal, display_normal) > 0) {
                display_normal.x = display_normal.x * 0.5f + 0.5f;
                display_normal.y = display_normal.y * 0.5f + 0.5f;
                display_normal.z = display_normal.z * 0.5f + 0.5f;
            }
            kernel_params.display_buffer[idx] = float3_to_rgba8(display_normal);
            break;
        }
        default: /* beauty */
            kernel_params.display_buffer[idx] =
                display(kernel_params.accum_buffer[idx], kernel_params.exposure_scale);
            break;
        }
    }
}

[Previous] [Up] [Next]