This example shows how to enable and use automatic derivatives for texture filtering.
New Topics
Detailed Description
- Automatic derivatives
When rendering images with high-frequency content, like a checker board texture, you usually have to shoot an increased number of rays to avoid aliasing effects. By providing information to the texture lookup functions about the area covered by a ray, the integral over this area can be calculated (or approximated) to get good texture sampling with just a single ray.
In the MDL SDK, the renderer can provide the derivatives of texture coordinates with respect to x and y in screen space, when calling generated code for MDL expressions or distribution functions. The generated code then calculates the derivatives of expressions provided to 2D texture lookup functions as texture coordinate parameters using "Automatic Differentiation" (see Dan Piponi, "Automatic Differentiation, C++ Templates, and Photogrammetry", Journal of Graphics, GPU, and Game Tools (JGT), Vol 9, No 4, pp. 41-55 (2004)). The renderer-provided texture runtime can finally use texture filtering techniques like elliptically weighted average (EWA) to determine the texture result.
Automatic derivatives are supported by the CUDA, LLVM IR and native backends of the MDL SDK. To enable this feature, you need to:
- set the backend option
"texture_runtime_with_derivs"
to "on"
via the IMdl_backend::set_option() method
- replace Shading_state_material by Shading_state_material_with_derivs and provide the derivatives of the texture coordinates
- provide a texture runtime supporting
"tex_lookup_deriv_float4_2d"
and "tex_lookup_deriv_float3_2d"
(for the native backend, there is a simple isotropic implementation using mipmaps, if you use the default "on"
value for the "use_builtin_resource_handler"
option)
In the "example_df_cuda"
example, the backend option is set in the constructor of the Material_compiler
class in example_cuda_shared.h
, when the "-d"
option is provided via command line:
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
In this mode, the example will use a variant of example_df_cuda.cu
compiled with a "ENABLE_DERIVATIVES"
define, which will enable calculating the texture coordinate derivatives on the sphere. It does so by intersecting two additional rays, offset by one-pixel in x and y direction, with the plane given by the primary intersection point and the corresponding normal. Using the surface derivatives with respect to U and V and the new intersection points, it determines the derivatives of the texture coordinates with respect to screen-space x and y. See Matt Pharr et al., "Physically Based Rendering", 3rd edition (2016), chapter 10.1.1 for details.
Instead of the usual Shading_state_material, the example then fills in the derivative variant Shading_state_material_with_derivs where the texture coordinates are tct_deriv_float3 values consisting of val
, the texture coordinate, dx
and dy
, the derivative of the texture coordinate with respect to screen-space x and screen-space y, respectively.
The types could be written as:
};
};
tct_traits<true>::tct_derivable_float3 tct_deriv_float3
A float3 with derivatives.
Definition: target_code_types.h:132
mi::Float32_3_struct tct_float3
A float3.
Definition: target_code_types.h:77
struct Shading_state_material_impl<true> Shading_state_material_with_derivs
The MDL material state structure with derivatives for the texture coordinates.
Definition: target_code_types.h:303
mi::Float32_4_struct tct_float4
A float4.
Definition: target_code_types.h:80
mi::Sint32 tct_int
An int.
Definition: target_code_types.h:83
float tct_float
A float.
Definition: target_code_types.h:71
tct_float3 normal
The result of state::normal().
Definition: target_code_types.h:217
char const * ro_data_segment
A pointer to a read-only data segment.
Definition: target_code_types.h:271
tct_int object_id
The result of state::object_id().
Definition: target_code_types.h:291
traits::tct_derivable_float3 position
The result of state::position().
Definition: target_code_types.h:225
tct_float4 const * world_to_object
A 4x4 transformation matrix in row-major order transforming from world to object coordinates.
Definition: target_code_types.h:278
tct_float3 const * tangent_v
An array containing the results of state::texture_tangent_v(i).
Definition: target_code_types.h:248
tct_float4 const * object_to_world
A 4x4 transformation matrix in row-major order transforming from object to world coordinates.
Definition: target_code_types.h:285
traits::tct_derivable_float3 const * text_coords
An array containing the results of state::texture_coordinate(i).
Definition: target_code_types.h:234
tct_float animation_time
The result of state::animation_time().
Definition: target_code_types.h:229
tct_float3 geom_normal
The result of state::geometry_normal().
Definition: target_code_types.h:221
tct_float3 const * tangent_u
An array containing the results of state::texture_tangent_u(i).
Definition: target_code_types.h:241
tct_float4 * text_results
The texture results lookup table.
Definition: target_code_types.h:256
- Note
- For the native backend, you have to cast the Shading_state_material_with_derivs object to a Shading_state_material reference when calling the
"execute_*"
methods of mi::neuraylib::ITarget_code, as these methods have not been duplicated for the derivative variants.
Additionally, a different vtable "tex_deriv_vtable"
defined in texture_support_cuda.h
is provided for the texture handler, which contains derivative variants for the 2D texture lookup functions: The functions "tex_lookup_deriv_float4_2d"
and "tex_lookup_deriv_float3_2d"
expect a "tct_deriv_float2 const *"
as "coord"
parameter:
};
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const crop_u[2],
float const crop_v[2]);
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const crop_u[2],
float const crop_v[2]);
tct_traits<true>::tct_derivable_float2 tct_deriv_float2
A float2 with derivatives.
Definition: target_code_types.h:129
mi::Float32_2_struct tct_float2
A float2.
Definition: target_code_types.h:74
Tex_wrap_mode
The texture wrap modes as defined by tex::wrap_mode in the MDL specification.
Definition: target_code_types.h:309
The example texture runtime uses the CUDA tex2DGrad()
functions to perform anisotropic hardware filtering. The derivatives provided to the texture lookup handlers can directly be used for those functions. The mipmaps are generated by Material_gpu_context::prepare_texture()
in example_cuda_shared.h
via the IImage_api::create_mipmaps() function.
Example Source
To compile the source code, you require CUDA, GLFW, and GLEW. For detailed instructions, please refer to the Getting Started section.
Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cpp
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
#include <list>
#include <map>
#include <memory>
#define _USE_MATH_DEFINES
#include "example_df_cuda.h"
#include "lpe.h"
#define OPENGL_INTEROP
#include "example_cuda_shared.h"
#include <imgui.h>
#include <imgui_impl_glfw.h>
#include <imgui_impl_opengl3.h>
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#define GL_DISPLAY_CUDA
#include "utils/gl_display.h"
#include "utils/profiling.h"
using namespace mi::examples::profiling;
#define terminate() \
do { \
glfwTerminate(); \
exit_failure(); \
} while (0)
#define WINDOW_TITLE "MDL SDK DF CUDA Example"
inline float length(
const float3 &d)
{
return sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
}
inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f /
length(d);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
inline float3
operator/(
const float3& d,
float s)
{
const float inv_s = 1.0f / s;
return make_float3(d.x * inv_s, d.y * inv_s, d.z * inv_s);
}
static GLFWwindow *init_opengl(std::string& version_string, int res_x, int res_y)
{
check_success(glfwInit());
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
version_string = "#version 330 core";
GLFWwindow *window = glfwCreateWindow(
res_x, res_y, WINDOW_TITLE, nullptr, nullptr);
if (!window) {
std::cerr << "Error creating OpenGL window!" << std::endl;
terminate();
}
glfwMakeContextCurrent(window);
GLenum res = glewInit();
if (res != GLEW_OK) {
std::cerr << "GLEW error: " << glewGetErrorString(res) << std::endl;
terminate();
}
glfwSwapInterval(0);
check_success(glGetError() == GL_NO_ERROR);
return window;
}
struct Window_context
{
bool mouse_event, key_event;
bool save_image;
int zoom;
int mouse_button;
int mouse_button_action;
int mouse_wheel_delta;
bool moving;
double move_start_x, move_start_y;
double move_dx, move_dy;
int material_index_delta;
bool save_result;
bool exposure_event;
float exposure;
};
static std::string to_string(Display_buffer_options option)
{
switch (option)
{
case DISPLAY_BUFFER_LPE: return "Selected LPE";
case DISPLAY_BUFFER_ALBEDO: return "Albedo";
case DISPLAY_BUFFER_NORMAL: return "Normal";
default: return "";
}
}
{
switch (option)
{
case mi::neuraylib::DF_FLAGS_ALLOW_REFLECT: return "Reflect only";
case mi::neuraylib::DF_FLAGS_ALLOW_TRANSMIT: return "Transmit only";
case mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT: return "Reflect+Transmit";
default: return "";
}
}
static void handle_scroll(GLFWwindow *window, double xoffset, double yoffset)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (yoffset > 0.0) {
ctx->mouse_wheel_delta = 1; ctx->mouse_event = true;
} else if (yoffset < 0.0) {
ctx->mouse_wheel_delta = -1; ctx->mouse_event = true;
}
ImGui_ImplGlfw_ScrollCallback(window, xoffset, yoffset);
}
static void handle_key(GLFWwindow *window, int key, int scancode, int action, int mods)
{
if (action == GLFW_PRESS) {
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
switch (key) {
case GLFW_KEY_ESCAPE:
glfwSetWindowShouldClose(window, GLFW_TRUE);
break;
case GLFW_KEY_DOWN:
case GLFW_KEY_RIGHT:
case GLFW_KEY_PAGE_DOWN:
ctx->material_index_delta = 1;
ctx->key_event = true;
break;
case GLFW_KEY_UP:
case GLFW_KEY_LEFT:
case GLFW_KEY_PAGE_UP:
ctx->material_index_delta = -1;
ctx->key_event = true;
break;
case GLFW_KEY_ENTER:
ctx->save_result = true;
break;
case GLFW_KEY_KP_SUBTRACT:
ctx->exposure--;
ctx->exposure_event = true;
break;
case GLFW_KEY_KP_ADD:
ctx->exposure++;
ctx->exposure_event = true;
break;
default:
break;
}
}
ImGui_ImplGlfw_KeyCallback(window, key, scancode, action, mods);
}
static void handle_mouse_button(GLFWwindow *window, int button, int action, int mods)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
ctx->mouse_button = button + 1;
ctx->mouse_button_action = action;
ImGui_ImplGlfw_MouseButtonCallback(window, button, action, mods);
}
static void handle_mouse_pos(GLFWwindow *window, double xpos, double ypos)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->moving)
{
ctx->move_dx += xpos - ctx->move_start_x;
ctx->move_dy += ypos - ctx->move_start_y;
ctx->move_start_x = xpos;
ctx->move_start_y = ypos;
ctx->mouse_event = true;
}
}
static void resize_buffers(CUdeviceptr *buffer_cuda, int width, int height)
{
if (*buffer_cuda)
check_cuda_success(cuMemFree(*buffer_cuda));
if (width == 0 || height == 0)
*buffer_cuda = 0;
else
check_cuda_success(cuMemAlloc(buffer_cuda, width * height * sizeof(float3)));
}
static float build_alias_map(
const float *data,
const unsigned int size,
Env_accel *accel)
{
float sum = 0.0f;
for (unsigned int i = 0; i < size; ++i)
sum += data[i];
for (unsigned int i = 0; i < size; ++i)
accel[i].q = (static_cast<float>(size) * data[i] / sum);
unsigned int *partition_table = static_cast<unsigned int *>(
malloc(size * sizeof(unsigned int)));
unsigned int s = 0u, large = size;
for (unsigned int i = 0; i < size; ++i)
partition_table[(accel[i].q < 1.0f) ? (s++) : (--large)] = accel[i].alias = i;
for (s = 0; s < large && large < size; ++s)
{
const unsigned int j = partition_table[s], k = partition_table[large];
accel[j].alias = k;
accel[k].q += accel[j].q - 1.0f;
large = (accel[k].q < 1.0f) ? (large + 1u) : large;
}
free(partition_table);
return sum;
}
static void create_environment(
cudaTextureObject_t *env_tex,
cudaArray_t *env_tex_data,
CUdeviceptr *env_accel,
uint2 *res,
const char *envmap_name)
{
check_success(image->reset_file(envmap_name) == 0);
res->x = rx;
res->y = ry;
char const *image_type = image->get_type(0, 0);
if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0)
canvas = image_api->convert(canvas.get(), "Color");
const cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
check_cuda_success(cudaMallocArray(env_tex_data, &channel_desc, rx, ry));
const float *pixels = static_cast<const float *>(tile->get_data());
check_cuda_success(cudaMemcpy2DToArray(
*env_tex_data, 0, 0, pixels,
rx * sizeof(float4), rx * sizeof(float4), ry, cudaMemcpyHostToDevice));
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = *env_tex_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeWrap;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeWrap;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
check_cuda_success(cudaCreateTextureObject(env_tex, &res_desc, &tex_desc, nullptr));
Env_accel *env_accel_host = static_cast<Env_accel *>(malloc(rx * ry * sizeof(Env_accel)));
float *importance_data = static_cast<float *>(malloc(rx * ry * sizeof(float)));
float cos_theta0 = 1.0f;
const float step_phi = float(2.0 * M_PI) / float(rx);
const float step_theta = float(M_PI) / float(ry);
for (unsigned int y = 0; y < ry; ++y)
{
const float theta1 = float(y + 1) * step_theta;
const float cos_theta1 =
std::cos(theta1);
const float area = (cos_theta0 - cos_theta1) * step_phi;
cos_theta0 = cos_theta1;
for (unsigned int x = 0; x < rx; ++x) {
const unsigned int idx = y * rx + x;
const unsigned int idx4 = idx * 4;
importance_data[idx] =
area * std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2]));
}
}
const float inv_env_integral = 1.0f / build_alias_map(importance_data, rx * ry, env_accel_host);
free(importance_data);
for (unsigned int i = 0; i < rx * ry; ++i) {
const unsigned int idx4 = i * 4;
env_accel_host[i].pdf =
std::max(pixels[idx4], std::max(pixels[idx4 + 1], pixels[idx4 + 2])) * inv_env_integral;
}
*env_accel = gpu_mem_dup(env_accel_host, rx * ry * sizeof(Env_accel));
free(env_accel_host);
}
static void upload_lpe_state_machine(
Kernel_params& kernel_params,
LPE_state_machine& lpe_state_machine)
{
uint32_t num_trans = lpe_state_machine.get_transition_count();
uint32_t num_states = lpe_state_machine.get_state_count();
kernel_params.lpe_num_transitions = num_trans;
kernel_params.lpe_num_states = num_states;
if (kernel_params.lpe_state_table)
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
if (kernel_params.lpe_final_mask)
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
CUdeviceptr state_table = 0;
check_cuda_success(cuMemAlloc(&state_table, num_states * num_trans * sizeof(uint32_t)));
check_cuda_success(cuMemcpyHtoD(state_table, lpe_state_machine.get_state_table().data(),
num_states * num_trans * sizeof(uint32_t)));
kernel_params.lpe_state_table = reinterpret_cast<uint32_t*>(state_table);
CUdeviceptr final_mask = 0;
check_cuda_success(cuMemAlloc(&final_mask, num_states * sizeof(uint32_t)));
check_cuda_success(cuMemcpyHtoD(final_mask, lpe_state_machine.get_final_state_masks().data(),
num_states * sizeof(uint32_t)));
kernel_params.lpe_final_mask = reinterpret_cast<uint32_t*>(final_mask);
kernel_params.default_gtag = lpe_state_machine.handle_to_global_tag("");
kernel_params.point_light_gtag = lpe_state_machine.handle_to_global_tag("point_light");
kernel_params.env_gtag = lpe_state_machine.handle_to_global_tag("env");
}
static void save_result(
const CUdeviceptr cuda_buffer,
const unsigned int width,
const unsigned int height,
const std::string &filename,
{
image_api->create_canvas("Rgb_fp", width, height));
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(data, cuda_buffer, width * height * sizeof(float3)));
option_force_default_gamma->set_value(true);
export_options->insert("force_default_gamma", option_force_default_gamma.get());
mdl_impexp_api->export_canvas(filename.c_str(), canvas.get(), export_options.get());
}
struct Options {
int cuda_device;
float gui_scale;
bool opengl;
bool use_class_compilation;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter;
#endif
bool no_aa;
bool enable_derivatives;
bool fold_ternary_on_df;
bool enable_auxiliary_output;
bool enable_pdf;
bool use_adapt_normal;
unsigned int res_x, res_y;
unsigned int iterations;
unsigned int samples_per_iteration;
unsigned int mdl_test_type;
unsigned int max_path_length;
float fov;
float exposure;
float3 cam_pos;
float3 light_pos;
float3 light_intensity;
bool enable_bsdf_flags;
std::string hdrfile;
float hdr_rot;
std::string outputfile;
std::vector<std::string> material_names;
Options()
: cuda_device(0)
, gui_scale(1.0f)
, opengl(true)
, use_class_compilation(true)
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
, use_df_interpreter(false)
#endif
, no_aa(false)
, enable_derivatives(false)
, fold_ternary_on_df(false)
, enable_auxiliary_output(true)
, enable_pdf(true)
, use_adapt_normal(false)
, res_x(1024)
, res_y(1024)
, iterations(4096)
, samples_per_iteration(8)
, mdl_test_type(MDL_TEST_MIS)
, max_path_length(4)
, fov(96.0f)
, exposure(0.0f)
, cam_pos(make_float3(0, 0, 3))
, light_pos(make_float3(10, 0, 5))
, light_intensity(make_float3(0, 0, 0))
, enable_bsdf_flags(false)
, allowed_scatter_mode(
mi::neuraylib:
:DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT)
, hdrfile("nvidia/sdk_examples/resources/environment.hdr")
, hdr_rot(0.0f)
, outputfile("output.exr")
, material_names()
{}
};
struct Enum_value {
std::string name;
int value;
Enum_value(const std::string &name, int value)
: name(name), value(value)
{
}
};
struct Enum_type_info {
std::vector<Enum_value> values;
void add(const std::string &name, int value) {
values.push_back(Enum_value(name, value));
}
};
class Param_info
{
public:
enum Param_kind
{
PK_UNKNOWN,
PK_FLOAT,
PK_FLOAT2,
PK_FLOAT3,
PK_COLOR,
PK_ARRAY,
PK_BOOL,
PK_INT,
PK_ENUM,
PK_STRING,
PK_TEXTURE,
PK_LIGHT_PROFILE,
PK_BSDF_MEASUREMENT
};
Param_info(
char const *name,
char const *display_name,
char const *group_name,
Param_kind kind,
Param_kind array_elem_kind,
char *data_ptr,
const Enum_type_info *enum_info = nullptr)
: m_index(index)
, m_name(name)
, m_display_name(display_name)
, m_group_name(group_name)
, m_kind(kind)
, m_array_elem_kind(array_elem_kind)
, m_array_size(array_size)
, m_array_pitch(array_pitch)
, m_data_ptr(data_ptr)
, m_range_min(-100), m_range_max(100)
, m_enum_info(enum_info)
{
}
template<typename T>
T &data() { return *reinterpret_cast<T *>(m_data_ptr); }
template<typename T>
const T &data() const { return *reinterpret_cast<const T *>(m_data_ptr); }
const char * &display_name() { return m_display_name; }
const char *display_name() const { return m_display_name; }
const char * &group_name() { return m_group_name; }
const char *group_name() const { return m_group_name; }
Param_kind kind() const { return m_kind; }
Param_kind array_elem_kind() const { return m_array_elem_kind; }
mi::Size array_size()
const {
return m_array_size; }
mi::Size array_pitch()
const {
return m_array_pitch; }
float &range_min() { return m_range_min; }
float range_min() const { return m_range_min; }
float &range_max() { return m_range_max; }
float range_max() const { return m_range_max; }
const Enum_type_info *enum_info() const { return m_enum_info; }
private:
char const *m_name;
char const *m_display_name;
char const *m_group_name;
Param_kind m_kind;
Param_kind m_array_elem_kind;
char *m_data_ptr;
float m_range_min, m_range_max;
const Enum_type_info *m_enum_info;
};
class Material_info
{
public:
Material_info(char const *name)
: m_name(name)
{}
void add_sorted_by_group(
const Param_info &
info) {
bool group_found = false;
if (
info.group_name() !=
nullptr) {
for (std::list<Param_info>::iterator it = params().begin(); it != params().end(); ++it)
{
const bool same_group =
it->group_name() !=
nullptr && strcmp(it->group_name(),
info.group_name()) == 0;
if (group_found && !same_group) {
m_params.insert(it,
info);
return;
}
if (same_group)
group_found = true;
}
}
m_params.push_back(
info);
}
void add_enum_type(const std::string name, std::shared_ptr<Enum_type_info> enum_info) {
enum_types[name] = enum_info;
}
const Enum_type_info *get_enum_type(const std::string name) {
Enum_type_map::const_iterator it = enum_types.find(name);
if (it != enum_types.end())
return it->second.get();
return nullptr;
}
char const *name() const { return m_name; }
std::list<Param_info> ¶ms() { return m_params; }
private:
char const *m_name;
std::list<Param_info> m_params;
typedef std::map<std::string, std::shared_ptr<Enum_type_info> > Enum_type_map;
Enum_type_map enum_types;
};
class Resource_table
{
typedef std::map<std::string, unsigned> Resource_id_map;
public:
enum Kind {
RESOURCE_TEXTURE,
RESOURCE_LIGHT_PROFILE,
RESOURCE_BSDF_MEASUREMENT
};
Resource_table(
Kind kind)
: m_max_len(0u)
{
read_resources(target_code, transaction, kind);
}
size_t get_max_length() const { return m_max_len; }
std::vector<std::string> const &get_urls() const { return m_urls; }
private:
void read_resources(
Kind kind)
{
m_urls.push_back("<unset>");
switch (kind) {
case RESOURCE_TEXTURE:
for (
mi::Size i = 1, n = target_code->get_texture_count(); i < n; ++i) {
const char *s = target_code->get_texture(i);
char const *url = nullptr;
if (char const *img = tex->get_image()) {
url = image->get_filename(0, 0);
}
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_LIGHT_PROFILE:
for (
mi::Size i = 1, n = target_code->get_light_profile_count(); i < n; ++i) {
const char *s = target_code->get_light_profile(i);
char const *url = lp->get_filename();
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_BSDF_MEASUREMENT:
for (
mi::Size i = 1, n = target_code->get_bsdf_measurement_count(); i < n; ++i) {
const char *s = target_code->get_bsdf_measurement(i);
char const *url = bm->get_filename();
if (url == nullptr)
url = s;
size_t l = strlen(url);
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
}
}
private:
Resource_id_map m_resource_map;
std::vector<std::string> m_urls;
size_t m_max_len;
};
class String_constant_table
{
typedef std::map<std::string, unsigned> String_map;
public:
{
get_all_strings(target_code);
}
unsigned get_id_for_string(const char *name) {
String_map::const_iterator it(m_string_constants_map.find(name));
if (it != m_string_constants_map.end())
return it->second;
unsigned n_id = unsigned(m_string_constants_map.size() + 1);
m_string_constants_map[name] = n_id;
m_strings.reserve((n_id + 63) & ~63);
m_strings.push_back(name);
size_t l = strlen(name);
if (l > m_max_len)
m_max_len = l;
return n_id;
}
size_t get_max_length() const { return m_max_len; }
const char *get_string(unsigned id) {
if (id == 0 || id - 1 >= m_strings.size())
return nullptr;
return m_strings[id - 1].c_str();
}
private:
void get_all_strings(
{
m_max_len = 0;
m_strings.reserve(target_code->get_string_constant_count());
for (
mi::Size i = 1, n = target_code->get_string_constant_count(); i < n; ++i) {
const char *s = target_code->get_string_constant(i);
size_t l = strlen(s);
if (l > m_max_len)
m_max_len = l;
m_string_constants_map[s] = (unsigned)i;
m_strings.push_back(s);
}
}
private:
String_map m_string_constants_map;
std::vector<std::string> m_strings;
size_t m_max_len;
};
static void update_camera(
Kernel_params &kernel_params,
double phi,
double theta,
float base_dist,
int zoom)
{
kernel_params.cam_dir.x = float(-
sin(phi) *
sin(theta));
kernel_params.cam_dir.y = float(-
cos(theta));
kernel_params.cam_dir.z = float(-
cos(phi) *
sin(theta));
kernel_params.cam_right.x = float(
cos(phi));
kernel_params.cam_right.y = 0.0f;
kernel_params.cam_right.z = float(-
sin(phi));
kernel_params.cam_up.x = float(-
sin(phi) *
cos(theta));
kernel_params.cam_up.y = float(
sin(theta));
kernel_params.cam_up.z = float(-
cos(phi) *
cos(theta));
const float dist = float(base_dist *
pow(0.95,
double(zoom)));
kernel_params.cam_pos.x = -kernel_params.cam_dir.x * dist;
kernel_params.cam_pos.y = -kernel_params.cam_dir.y * dist;
kernel_params.cam_pos.z = -kernel_params.cam_dir.z * dist;
}
static bool handle_resource(
Param_info ¶m,
Resource_table const &res_table)
{
bool changed = false;
std::vector<std::string> const &urls = res_table.get_urls();
int id = param.data<int>();
std::string cur_url = urls[id];
if (ImGui::BeginCombo(param.display_name(), cur_url.c_str())) {
for (size_t i = 0, n = urls.size(); i < n; ++i) {
const std::string &name = urls[i];
bool is_selected = (cur_url == name);
if (ImGui::Selectable(name.c_str(), is_selected)) {
param.data<int>() = int(i);
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
return changed;
}
static void launch_subframe(
mi::examples::mdl::GL_display* gl_display,
int width,
int height,
CUfunction cuda_function,
Kernel_params *kernel_params)
{
if (gl_display) {
kernel_params->display_buffer =
reinterpret_cast<unsigned int*>(gl_display->map(0));
}
dim3 threads_per_block(16, 16);
dim3 num_blocks((width + 15) / 16, (height + 15) / 16);
void* params[] = { kernel_params };
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, params, nullptr));
kernel_params->iteration_start += kernel_params->iteration_num;
if (gl_display) {
gl_display->unmap(0);
}
check_cuda_success(cuStreamSynchronize(0));
}
static void render_scene(
const Options &options,
Material_compiler::Material_definition_list const &material_defs,
Material_compiler::Compiled_material_list const &compiled_materials,
std::vector<size_t> const &arg_block_indices,
std::vector<Df_cuda_material> const &material_bundle,
LPE_state_machine &lpe_state_machine)
{
Window_context window_context;
memset(&window_context, 0, sizeof(Window_context));
mi::examples::mdl::GL_display *gl_display = nullptr;
GLFWwindow *window = nullptr;
int width = -1;
int height = -1;
if (options.opengl) {
std::string version_string;
window = init_opengl(version_string, int(options.res_x), int(options.res_y));
glfwSetWindowUserPointer(window, &window_context);
glfwSetKeyCallback(window, handle_key);
glfwSetScrollCallback(window, handle_scroll);
glfwSetCursorPosCallback(window, handle_mouse_pos);
glfwSetMouseButtonCallback(window, handle_mouse_button);
glfwSetCharCallback(window, ImGui_ImplGlfw_CharCallback);
IMGUI_CHECKVERSION();
ImGui::CreateContext();
ImGui_ImplGlfw_InitForOpenGL(window, false);
ImGui_ImplOpenGL3_Init(version_string.c_str());
ImGui::GetIO().IniFilename = nullptr;
ImGui::StyleColorsDark();
ImGui::GetStyle().Alpha = 0.7f;
ImGui::GetStyle().ScaleAllSizes(options.gui_scale);
gl_display = new mi::examples::mdl::GL_display(int(options.res_x), int(options.res_y));
}
CUcontext cuda_context = init_cuda(options.cuda_device, options.opengl);
CUdeviceptr accum_buffer = 0;
CUdeviceptr aux_albedo_buffer = 0;
CUdeviceptr aux_normal_buffer = 0;
if (!options.opengl) {
width = options.res_x;
height = options.res_y;
check_cuda_success(cuMemAlloc(&accum_buffer, width * height * sizeof(float3)));
check_cuda_success(cuMemAlloc(&aux_albedo_buffer, width * height * sizeof(float3)));
check_cuda_success(cuMemAlloc(&aux_normal_buffer, width * height * sizeof(float3)));
}
Kernel_params kernel_params;
memset(&kernel_params, 0, sizeof(Kernel_params));
kernel_params.cam_focal = 1.0f / tanf(options.fov / 2 * float(2 * M_PI / 360));
kernel_params.light_pos = options.light_pos;
kernel_params.light_intensity = fmaxf(
options.light_intensity.x, fmaxf(options.light_intensity.y, options.light_intensity.z));
kernel_params.light_color = kernel_params.light_intensity > 0.0f
? options.light_intensity / kernel_params.light_intensity
: make_float3(1.0f, 0.9f, 0.5f);
kernel_params.env_intensity = 1.0f;
kernel_params.iteration_start = 0;
kernel_params.iteration_num = options.samples_per_iteration;
kernel_params.mdl_test_type = options.mdl_test_type;
kernel_params.max_path_length = options.max_path_length;
kernel_params.exposure_scale = powf(2.0f, options.exposure);
kernel_params.disable_aa = options.no_aa;
kernel_params.bsdf_data_flags = options.allowed_scatter_mode;
kernel_params.use_derivatives = options.enable_derivatives;
kernel_params.enable_auxiliary_output = options.enable_auxiliary_output;
kernel_params.display_buffer_index = 0;
kernel_params.lpe_ouput_expression = 0;
kernel_params.lpe_state_table = nullptr;
kernel_params.lpe_final_mask = nullptr;
kernel_params.current_material = 0;
kernel_params.geometry = material_bundle[kernel_params.current_material].contains_hair_bsdf ?
GT_HAIR : GT_SPHERE;
float base_dist =
length(options.cam_pos);
double theta, phi;
{
const float3 inv_dir = normalize(options.cam_pos);
phi =
atan2(inv_dir.x, inv_dir.z);
}
update_camera(kernel_params, phi, theta, base_dist, window_context.zoom);
std::vector<mi::base::Handle<const mi::neuraylib::ITarget_code> > target_codes;
target_codes.push_back(target_code);
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_df_cuda_derivatives.ptx" : "example_df_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(mi::examples::io::get_executable_folder() + "/" + ptx_name).c_str(),
"render_scene_kernel",
&cuda_function);
CUdeviceptr material_buffer = 0;
check_cuda_success(cuMemAlloc(&material_buffer,
material_bundle.size() * sizeof(Df_cuda_material)));
check_cuda_success(cuMemcpyHtoD(material_buffer, material_bundle.data(),
material_bundle.size() * sizeof(Df_cuda_material)));
kernel_params.material_buffer = reinterpret_cast<Df_cuda_material*>(material_buffer);
CUdeviceptr env_accel;
cudaArray_t env_tex_data;
create_environment(
&kernel_params.env_tex, &env_tex_data, &env_accel, &kernel_params.env_size, transaction,
image_api, options.hdrfile.c_str());
kernel_params.env_accel = reinterpret_cast<Env_accel *>(env_accel);
kernel_params.env_rotation = options.hdr_rot / 180.0f * float(M_PI);
upload_lpe_state_machine(kernel_params, lpe_state_machine);
std::string next_filename_base;
std::string filename_base, filename_ext;
size_t dot_pos = options.outputfile.rfind('.');
if (dot_pos == std::string::npos) {
filename_base = options.outputfile;
} else {
filename_base = options.outputfile.substr(0, dot_pos);
filename_ext = options.outputfile.substr(dot_pos);
}
if (options.material_names.size() > 1)
next_filename_base = filename_base + "-0";
else
next_filename_base = filename_base;
{
Material_gpu_context material_gpu_context(options.enable_derivatives);
if (!material_gpu_context.prepare_target_code_data(
transaction.
get(), image_api.
get(), target_code.
get(), arg_block_indices))
terminate();
kernel_params.tc_data = reinterpret_cast<Target_code_data *>(
material_gpu_context.get_device_target_code_data_list());
kernel_params.arg_block_list = reinterpret_cast<char const **>(
material_gpu_context.get_device_target_argument_block_list());
String_constant_table constant_table(target_code);
Resource_table texture_table(target_code, transaction, Resource_table::RESOURCE_TEXTURE);
Resource_table lp_table(target_code, transaction, Resource_table::RESOURCE_LIGHT_PROFILE);
Resource_table bm_table(
target_code, transaction, Resource_table::RESOURCE_BSDF_MEASUREMENT);
std::vector<Material_info> mat_infos;
for (size_t i = 0, num_mats = compiled_materials.size(); i < num_mats; ++i) {
size_t arg_block_index = material_gpu_context.get_bsdf_argument_block_index(i);
material_gpu_context.get_argument_block_layout(arg_block_index));
material_gpu_context.get_argument_block(arg_block_index));
char *arg_block_data = arg_block != nullptr ? arg_block->get_data() : nullptr;
if (name == nullptr) continue;
Param_info::Param_kind param_kind = Param_info::PK_UNKNOWN;
Param_info::Param_kind param_array_elem_kind = Param_info::PK_UNKNOWN;
const Enum_type_info *enum_type = nullptr;
switch (kind) {
param_kind = Param_info::PK_FLOAT;
break;
param_kind = Param_info::PK_COLOR;
break;
param_kind = Param_info::PK_BOOL;
break;
param_kind = Param_info::PK_INT;
break;
{
val->get_type());
val_type->get_element_type());
switch (val_type->get_size()) {
case 2: param_kind = Param_info::PK_FLOAT2; break;
case 3: param_kind = Param_info::PK_FLOAT3; break;
default: assert(false && "Vector Size invalid or unhandled.");
}
}
}
break;
{
val->get_type());
val_type->get_element_type());
switch (elem_type->get_kind()) {
param_array_elem_kind = Param_info::PK_FLOAT;
break;
param_array_elem_kind = Param_info::PK_COLOR;
break;
param_array_elem_kind = Param_info::PK_BOOL;
break;
param_array_elem_kind = Param_info::PK_INT;
break;
{
elem_type.get_interface<
val_type->get_element_type());
switch (val_type->get_size()) {
case 2:
param_array_elem_kind = Param_info::PK_FLOAT2;
break;
case 3:
param_array_elem_kind = Param_info::PK_FLOAT3;
break;
default:
assert(false && "Vector Size invalid or unhandled.");
}
}
}
break;
default:
assert(false && "Array element type invalid or unhandled.");
}
if (param_array_elem_kind != Param_info::PK_UNKNOWN) {
param_kind = Param_info::PK_ARRAY;
param_array_size = val_type->get_size();
if (param_array_size > 1) {
layout->get_nested_state(j));
layout->get_nested_state(1, array_state));
mi::Size start_offset = layout->get_layout(
kind, param_size, array_state);
mi::Size next_offset = layout->get_layout(
kind, param_size, next_elem_state);
param_array_pitch = next_offset - start_offset;
}
}
}
break;
{
val->get_type());
const Enum_type_info *
info = mat_info.get_enum_type(val_type->get_symbol());
std::shared_ptr<Enum_type_info> p(new Enum_type_info());
for (
mi::Size i = 0, n = val_type->get_size(); i < n; ++i) {
p->add(val_type->get_value_name(i), val_type->get_value_code(i));
}
mat_info.add_enum_type(val_type->get_symbol(), p);
}
param_kind = Param_info::PK_ENUM;
}
break;
param_kind = Param_info::PK_STRING;
break;
param_kind = Param_info::PK_TEXTURE;
break;
param_kind = Param_info::PK_LIGHT_PROFILE;
break;
param_kind = Param_info::PK_BSDF_MEASUREMENT;
break;
default:
continue;
}
mi::Size offset = layout->get_layout(kind2, param_size, state);
check_success(kind == kind2);
Param_info param_info(
j,
name,
name,
nullptr,
param_kind,
param_array_elem_kind,
param_array_size,
param_array_pitch,
arg_block_data + offset,
enum_type);
anno_list->get_annotation_block(name));
if (anno_block) {
annos.get_annotation_index("::anno::soft_range(float,float)");
anno_index = annos.get_annotation_index("::anno::hard_range(float,float)");
}
annos.get_annotation_param_value(anno_index, 0, param_info.range_min());
annos.get_annotation_param_value(anno_index, 1, param_info.range_max());
}
anno_index = annos.get_annotation_index("::anno::display_name(string)");
annos.get_annotation_param_value(anno_index, 0, param_info.display_name());
}
anno_index = annos.get_annotation_index("::anno::in_group(string)");
annos.get_annotation_param_value(anno_index, 0, param_info.group_name());
}
}
mat_info.add_sorted_by_group(param_info);
}
mat_infos.push_back(mat_info);
}
if (!options.opengl)
{
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.accum_buffer = reinterpret_cast<float3*>(accum_buffer);
kernel_params.albedo_buffer = reinterpret_cast<float3*>(aux_albedo_buffer);
kernel_params.normal_buffer = reinterpret_cast<float3*>(aux_normal_buffer);
while (kernel_params.current_material < material_bundle.size()) {
{
Timing timing("rendering");
while (kernel_params.iteration_start < options.iterations) {
launch_subframe(gl_display, width, height, cuda_function, &kernel_params);
}
}
save_result(
accum_buffer, width, height,
next_filename_base + filename_ext,
factory, image_api, mdl_impexp_api);
save_result(
aux_albedo_buffer, width, height,
next_filename_base + "_albedo" + filename_ext,
factory, image_api, mdl_impexp_api);
save_result(
aux_normal_buffer, width, height,
next_filename_base + "_normal" + filename_ext,
factory, image_api, mdl_impexp_api);
if (kernel_params.current_material + 1 >= material_bundle.size())
break;
if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
kernel_params.geometry = GT_SPHERE;
else
kernel_params.geometry = GT_HAIR;
kernel_params.iteration_start = 0;
++kernel_params.current_material;
next_filename_base =
filename_base + "-" + to_string(kernel_params.current_material);
}
} else {
std::chrono::duration<double> state_update_time( 0.0 );
std::chrono::duration<double> render_time( 0.0 );
std::chrono::duration<double> display_time( 0.0 );
char stats_text[128];
int last_update_frames = -1;
auto last_update_time = std::chrono::steady_clock::now();
const std::chrono::duration<double> update_min_interval( 0.5 );
while (true)
{
std::chrono::time_point<std::chrono::steady_clock> t0 =
std::chrono::steady_clock::now();
if (glfwWindowShouldClose(window))
break;
glfwPollEvents();
int nwidth, nheight;
glfwGetFramebufferSize(window, &nwidth, &nheight);
if (nwidth != width || nheight != height)
{
width = nwidth;
height = nheight;
gl_display->resize(width, height);
resize_buffers(
&accum_buffer, width, height);
kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
resize_buffers(&aux_albedo_buffer, width, height);
kernel_params.albedo_buffer = reinterpret_cast<float3 *>(aux_albedo_buffer);
resize_buffers(&aux_normal_buffer, width, height);
kernel_params.normal_buffer = reinterpret_cast<float3 *>(aux_normal_buffer);
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.iteration_start = 0;
}
if (width == 0 || height == 0) {
glfwWaitEvents();
continue;
}
ImGui_ImplOpenGL3_NewFrame();
ImGui_ImplGlfw_NewFrame();
ImGui::NewFrame();
ImGui::SetNextWindowPos(ImVec2(10, 100), ImGuiCond_FirstUseEver);
ImGui::SetNextWindowSize(
ImVec2(360 * options.gui_scale, 600 * options.gui_scale),
ImGuiCond_FirstUseEver);
ImGui::Begin("Settings");
ImGui::SetWindowFontScale(options.gui_scale);
ImGui::PushItemWidth(-200 * options.gui_scale);
if (options.use_class_compilation)
ImGui::Text("CTRL + Click to manually enter numbers");
else
ImGui::Text("Parameter editing requires class compilation.");
if (kernel_params.enable_auxiliary_output)
{
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Display options");
ImGui::Separator();
std::string current_lpe_name = lpe_state_machine.get_expression_name(
kernel_params.lpe_ouput_expression);
if (ImGui::BeginCombo("LPE", current_lpe_name.c_str()))
{
for (uint32_t i = 0; i < lpe_state_machine.get_expression_count(); ++i)
{
const char* name = lpe_state_machine.get_expression_name(i);
bool is_selected = (i == kernel_params.lpe_ouput_expression);
if (ImGui::Selectable(name, is_selected))
{
kernel_params.lpe_ouput_expression = i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
std::string current_display_buffer =
to_string((Display_buffer_options) kernel_params.display_buffer_index);
if (ImGui::BeginCombo("buffer", current_display_buffer.c_str()))
{
for (unsigned i = 0; i < (unsigned) DISPLAY_BUFFER_COUNT; ++i)
{
const std::string &name = to_string((Display_buffer_options) i);
bool is_selected = (current_display_buffer == name);
if (ImGui::Selectable(name.c_str(), is_selected))
{
kernel_params.display_buffer_index = i;
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
if (options.enable_bsdf_flags)
{
std::string current_allow_mode = to_string(kernel_params.bsdf_data_flags);
if (ImGui::BeginCombo("BSDF flags", current_allow_mode.c_str()))
{
for (unsigned i = 0;
i <= (unsigned) mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT; ++i)
{
bool is_selected = (current_allow_mode == name);
if (ImGui::Selectable(name.c_str(), is_selected))
{
kernel_params.iteration_start = 0;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Light parameters");
ImGui::Separator();
if (ImGui::ColorEdit3("Point Light Color", &kernel_params.light_color.x))
kernel_params.iteration_start = 0;
if (ImGui::SliderFloat("Point Light Intensity",
&kernel_params.light_intensity, 0.0f, 50000.0f))
kernel_params.iteration_start = 0;
if (ImGui::SliderFloat("Environment Intensity Scale",
&kernel_params.env_intensity, 0.0f, 10.0f))
kernel_params.iteration_start = 0;
float env_rot_degree = kernel_params.env_rotation / float(M_PI) * 180.0f;
if (ImGui::SliderFloat("Environment Rotation",
&env_rot_degree, 0.0f, 360.0f))
{
env_rot_degree -= floorf(env_rot_degree / 360.0f) * 360.f;
kernel_params.env_rotation = fmodf(env_rot_degree, 360.0f) / 180.0f * float(M_PI);
kernel_params.iteration_start = 0;
}
ImGui::Dummy(ImVec2(0.0f, 3.0f));
ImGui::Text("Material parameters");
ImGui::Separator();
Material_info &mat_info = mat_infos[
material_bundle[kernel_params.current_material].compiled_material_index];
ImGui::Text("%s", mat_info.name());
bool changed = false;
const char *group_name = nullptr;
int id = 0;
for (std::list<Param_info>::iterator it = mat_info.params().begin(),
end = mat_info.params().end(); it != end; ++it, ++id)
{
Param_info ¶m = *it;
ImGui::PushID(id);
if ((!param.group_name() != !group_name) ||
(param.group_name() &&
(!group_name || strcmp(group_name, param.group_name()) != 0)))
{
ImGui::Separator();
if (param.group_name() != nullptr)
ImGui::Text("%s", param.group_name());
group_name = param.group_name();
}
switch (param.kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
param.display_name(),
¶m.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
param.display_name(),
¶m.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
param.display_name(),
¶m.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
param.display_name(),
¶m.data<float>());
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
¶m.data<bool>());
break;
case Param_info::PK_INT:
changed |= ImGui::SliderInt(
param.display_name(),
¶m.data<int>(),
int(param.range_min()),
int(param.range_max()));
break;
case Param_info::PK_ARRAY:
{
ImGui::Text("%s", param.display_name());
ImGui::Indent(16.0f * options.gui_scale);
char *ptr = ¶m.data<char>();
for (
mi::Size i = 0, n = param.array_size(); i < n; ++i) {
std::string idx_str = to_string(i);
switch (param.array_elem_kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
idx_str.c_str(),
reinterpret_cast<float *>(ptr),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
idx_str.c_str(),
reinterpret_cast<float *>(ptr));
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
reinterpret_cast<bool *>(ptr));
break;
case Param_info::PK_INT:
changed |= ImGui::SliderInt(
param.display_name(),
reinterpret_cast<int *>(ptr),
int(param.range_min()),
int(param.range_max()));
break;
default:
assert(false && "Array element type invalid or unhandled.");
}
ptr += param.array_pitch();
}
ImGui::Unindent(16.0f * options.gui_scale);
}
break;
case Param_info::PK_ENUM:
{
int value = param.data<int>();
std::string curr_value;
const Enum_type_info *
info = param.enum_info();
for (
size_t i = 0, n =
info->values.size(); i < n; ++i) {
if (
info->values[i].value == value) {
curr_value =
info->values[i].name;
break;
}
}
if (ImGui::BeginCombo(param.display_name(), curr_value.c_str())) {
for (
size_t i = 0, n =
info->values.size(); i < n; ++i) {
const std::string &name =
info->values[i].name;
bool is_selected = (curr_value == name);
if (ImGui::Selectable(
info->values[i].name.c_str(), is_selected)) {
param.data<
int>() =
info->values[i].value;
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
break;
case Param_info::PK_STRING:
{
std::vector<char> buf;
size_t max_len = constant_table.get_max_length();
max_len = max_len > 63 ? max_len + 1 : 64;
buf.resize(max_len);
unsigned curr_index = param.data<unsigned>();
const char *opt = constant_table.get_string(curr_index);
strcpy(buf.data(), opt != nullptr ? opt : "");
if (ImGui::InputText(
param.display_name(),
buf.data(), buf.size(),
ImGuiInputTextFlags_EnterReturnsTrue))
{
unsigned id = constant_table.get_id_for_string(buf.data());
param.data<unsigned>() = id;
changed = true;
}
}
break;
case Param_info::PK_TEXTURE:
changed |= handle_resource(param, texture_table);
break;
case Param_info::PK_LIGHT_PROFILE:
changed |= handle_resource(param, lp_table);
break;
case Param_info::PK_BSDF_MEASUREMENT:
changed |= handle_resource(param, bm_table);
break;
case Param_info::PK_UNKNOWN:
break;
}
ImGui::PopID();
}
if (options.enable_derivatives) {
ImGui::Separator();
bool b = kernel_params.use_derivatives != 0;
if (ImGui::Checkbox("Use derivatives", &b)) {
kernel_params.iteration_start = 0;
kernel_params.use_derivatives = b;
}
}
ImGui::PopItemWidth();
ImGui::End();
if (changed) {
material_gpu_context.update_device_argument_block(
material_bundle[kernel_params.current_material].argument_block_index);
kernel_params.iteration_start = 0;
}
Window_context *ctx =
static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->save_result && !ImGui::GetIO().WantCaptureKeyboard) {
save_result(
accum_buffer,
width, height,
options.outputfile,
factory, image_api, mdl_impexp_api);
save_result(
aux_albedo_buffer,
width, height,
filename_base + "_albedo" + filename_ext,
factory, image_api, mdl_impexp_api);
save_result(
aux_normal_buffer,
width, height,
filename_base + "_normal" + filename_ext,
factory, image_api, mdl_impexp_api);
}
if (ctx->exposure_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.exposure_scale = powf(2.0f, ctx->exposure);
}
if (ctx->key_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.iteration_start = 0;
const unsigned num_materials = unsigned(material_bundle.size());
kernel_params.current_material = (kernel_params.current_material +
ctx->material_index_delta + num_materials) % num_materials;
ctx->material_index_delta = 0;
if (material_bundle[kernel_params.current_material].contains_hair_bsdf == 0)
kernel_params.geometry = GT_SPHERE;
else
kernel_params.geometry = GT_HAIR;
}
if (ctx->mouse_button - 1 == GLFW_MOUSE_BUTTON_LEFT) {
if (ctx->mouse_button_action == GLFW_PRESS &&
!ImGui::GetIO().WantCaptureMouse) {
ctx->moving = true;
glfwGetCursorPos(window, &ctx->move_start_x, &ctx->move_start_y);
}
else
ctx->moving = false;
}
if (ctx->mouse_wheel_delta && !ImGui::GetIO().WantCaptureMouse) {
ctx->zoom += ctx->mouse_wheel_delta;
}
if (ctx->mouse_event && !ImGui::GetIO().WantCaptureMouse) {
kernel_params.iteration_start = 0;
phi -= ctx->move_dx * 0.001 * M_PI;
theta -= ctx->move_dy * 0.001 * M_PI;
theta = std::max(theta, 0.00 * M_PI);
theta = std::min(theta, 1.00 * M_PI);
ctx->move_dx = ctx->move_dy = 0.0;
update_camera(kernel_params, phi, theta, base_dist, ctx->zoom);
}
ctx->save_result = false;
ctx->key_event = false;
ctx->mouse_event = false;
ctx->exposure_event = false;
ctx->mouse_wheel_delta = 0;
ctx->mouse_button = 0;
auto t1 = std::chrono::steady_clock::now();
state_update_time += t1 - t0;
t0 = t1;
launch_subframe(gl_display, width, height, cuda_function, &kernel_params);
t1 = std::chrono::steady_clock::now();
render_time += t1 - t0;
t0 = t1;
gl_display->update_display();
t1 = std::chrono::steady_clock::now();
display_time += t1 - t0;
ImGui::SetNextWindowPos(ImVec2(10, 10));
ImGui::Begin("##notitle", nullptr,
ImGuiWindowFlags_NoDecoration |
ImGuiWindowFlags_AlwaysAutoResize |
ImGuiWindowFlags_NoSavedSettings |
ImGuiWindowFlags_NoFocusOnAppearing |
ImGuiWindowFlags_NoNav);
++last_update_frames;
if (t1 - last_update_time > update_min_interval || last_update_frames == 0) {
typedef std::chrono::duration<double, std::milli> durationMs;
snprintf(stats_text, sizeof(stats_text),
"%5.1f fps\n\n"
"state update: %8.1f ms\n"
"render: %8.1f ms\n"
"display: %8.1f ms\n",
last_update_frames / std::chrono::duration<double>(
t1 - last_update_time).count(),
(durationMs(state_update_time) / last_update_frames).count(),
(durationMs(render_time) / last_update_frames).count(),
(durationMs(display_time) / last_update_frames).count());
last_update_time = t1;
last_update_frames = 0;
state_update_time = render_time = display_time =
std::chrono::duration<double>::zero();
}
ImGui::TextUnformatted(stats_text);
ImGui::End();
ImGui::Render();
ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
glfwSwapBuffers(window);
}
}
}
check_cuda_success(cudaDestroyTextureObject(kernel_params.env_tex));
check_cuda_success(cudaFreeArray(env_tex_data));
check_cuda_success(cuMemFree(env_accel));
check_cuda_success(cuMemFree(accum_buffer));
check_cuda_success(cuMemFree(aux_albedo_buffer));
check_cuda_success(cuMemFree(aux_normal_buffer));
check_cuda_success(cuMemFree(material_buffer));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_state_table)));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(kernel_params.lpe_final_mask)));
check_cuda_success(cuModuleUnload(cuda_module));
uninit_cuda(cuda_context);
if (options.opengl) {
delete gl_display;
gl_display = nullptr;
ImGui_ImplOpenGL3_Shutdown();
ImGui_ImplGlfw_Shutdown();
ImGui::DestroyContext();
glfwDestroyWindow(window);
glfwTerminate();
}
}
bool starts_with(std::string const &str, std::string const &prefix)
{
return str.size() >= prefix.size() && str.compare(0, prefix.size(), prefix) == 0;
}
Df_cuda_material create_cuda_material(
size_t target_code_index,
size_t compiled_material_index,
std::vector<mi::neuraylib::Target_function_description> const& descs,
bool use_hair_bsdf)
{
Df_cuda_material mat;
mat.compiled_material_index = static_cast<unsigned int>(compiled_material_index);
mat.argument_block_index = static_cast<unsigned int>(descs[0].argument_block_index);
mat.init.x = static_cast<unsigned int>(target_code_index);
mat.init.y = static_cast<unsigned int>(descs[0].function_index);
if (!use_hair_bsdf)
{
bool has_constant_thin_walled = false;
bool is_thin_walled = false;
{
has_constant_thin_walled = true;
is_thin_walled = thin_walled_bool->get_value();
}
bool need_backface_bsdf = false;
bool need_backface_edf = false;
bool need_backface_emission_intensity = false;
if (!has_constant_thin_walled || is_thin_walled)
{
need_backface_bsdf =
need_backface_edf =
need_backface_emission_intensity =
{
scattering_expr_constant->get_value());
emission_expr_constant->get_value());
{
need_backface_bsdf = false;
need_backface_edf = false;
need_backface_emission_intensity = false;
}
}
}
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[1].function_index);
mat.edf.x = static_cast<unsigned int>(target_code_index);
mat.edf.y = static_cast<unsigned int>(descs[2].function_index);
mat.emission_intensity.x = static_cast<unsigned int>(target_code_index);
mat.emission_intensity.y = static_cast<unsigned int>(descs[3].function_index);
mat.backface_bsdf.x = static_cast<unsigned int>(target_code_index);
mat.backface_bsdf.y = static_cast<unsigned int>(
need_backface_bsdf ? descs[8].function_index : descs[1].function_index);
mat.backface_edf.x = static_cast<unsigned int>(target_code_index);
mat.backface_edf.y = static_cast<unsigned int>(
need_backface_edf ? descs[9].function_index : descs[2].function_index);
mat.backface_emission_intensity.x = static_cast<unsigned int>(target_code_index);
mat.backface_emission_intensity.y = static_cast<unsigned int>(
need_backface_emission_intensity ? descs[10].function_index : descs[3].function_index);
mat.volume_absorption.x = static_cast<unsigned int>(target_code_index);
mat.volume_absorption.y = static_cast<unsigned int>(descs[4].function_index);
mat.thin_walled.x = static_cast<unsigned int>(target_code_index);
mat.thin_walled.y = static_cast<unsigned int>(descs[5].function_index);
mat.cutout_opacity.x = static_cast<unsigned int>(target_code_index);
mat.cutout_opacity.y = static_cast<unsigned int>(descs[6].function_index);
}
else
{
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[7].function_index);
mat.contains_hair_bsdf = 1;
}
memset(mat.bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.backface_bsdf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
memset(mat.backface_edf_mtag_to_gtag_map, 0, MAX_DF_HANDLES * sizeof(unsigned int));
return mat;
}
void create_cuda_material_handles(
Df_cuda_material& mat,
LPE_state_machine& lpe_state_machine)
{
mat.bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
for (
mi::Size i = 0; i < mat.bsdf_mtag_to_gtag_map_size; ++i)
mat.bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
mat.edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
for (
mi::Size i = 0; i < mat.edf_mtag_to_gtag_map_size; ++i)
mat.edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
mat.backface_bsdf_mtag_to_gtag_map_size = static_cast<unsigned int>(
for (
mi::Size i = 0; i < mat.backface_bsdf_mtag_to_gtag_map_size; ++i)
mat.backface_bsdf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
mat.backface_edf_mtag_to_gtag_map_size = static_cast<unsigned int>(
for (
mi::Size i = 0; i < mat.backface_edf_mtag_to_gtag_map_size; ++i)
mat.backface_edf_mtag_to_gtag_map[i] = lpe_state_machine.handle_to_global_tag(
}
{
for (
mi::Size i = 0, n = body_args->get_size(); i < n; ++i)
{
const char* name = body_args->get_name(i);
if (strcmp(name, "hair") == 0)
{
body_args->get_expression(i));
return true;
hair_exp_const->get_value());
}
}
return true;
}
static void usage(const char *name)
{
std::cout
<< "usage: " << name << " [options] [<material_name1|full_mdle_path1> ...]\n"
<< "-h|--help print this text and exit\n"
<< "-v|--version print the MDL SDK version string and exit\n"
<< "--device <id> run on CUDA device <id> (default: 0)\n"
<< "--nogl don't open interactive display\n"
<< "--nocc don't use class-compilation\n"
<< "--noaux don't generate code for albedo and normal buffers\n"
<< "--nopdf don't generate code for pdf\n"
<< " (for simplicity implies --noaux)\n"
<< "--an use adapt normal function\n"
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
<< "--interpreter use the DF interpreter (if possible)\n"
#endif
<< "--gui_scale <factor> GUI scaling factor (default: 1.0)\n"
<< "--res <res_x> <res_y> resolution (default: 1024x1024)\n"
<< "--hdr <filename> HDR environment map "
"(default: nvidia/sdk_examples/resources/environment.hdr)\n"
<< "--hdr_rot <degrees> rotation of the environment in degrees (default: 0.0)\n"
<< "-o <outputfile> image file to write result to (default: output.exr).\n"
<< " With multiple materials \"-<material index>\" will be\n"
<< " added in front of the extension\n"
<< "--spp <num> samples per pixel, only active for --nogl (default: 4096)\n"
<< "--spi <num> samples per render call (default: 8)\n"
<< "-t <type> 0: eval, 1: sample, 2: mis, 3: mis + pdf, 4: no env\n"
<< " (default: 2)\n"
<< "-e <exposure> exposure for interactive display (default: 0.0)\n"
<< "-f <fov> the camera field of view in degree (default: 96.0)\n"
<< "--cam <x> <y> <z> set the camera position (default 0 0 3).\n"
<< " The camera will always look towards (0, 0, 0).\n"
<< "-l <x> <y> <z> <r> <g> <b> add an isotropic point light with given coordinates and "
"intensity (flux)\n"
<< "-p|--mdl_path <path> MDL search path, can occur multiple times.\n"
<< "--max_path_length <num> maximum path length, default 4 (up to one total internal\n"
<< " reflection), clamped to 2..100\n"
<< "--noaa disable pixel oversampling\n"
<< "-d enable use of derivatives\n"
<< "--fold_ternary_on_df fold all ternary operators on *df types (default: false)\n"
<< "--allowed_scatter_mode <m> limits the allowed scatter mode to \"none\", \"reflect\", "
<< "\"transmit\" or \"reflect_and_transmit\" (default: restriction disabled)\n"
<< "\n"
<< "Note: material names can end with an '*' as a wildcard\n"
<< " and alternatively, full MDLE file paths can be passed as material name\n";
exit(EXIT_FAILURE);
}
int MAIN_UTF8(int argc, char* argv[])
{
Options options;
mi::examples::mdl::Configure_options configure_options;
bool print_version_and_exit = false;
for (int i = 1; i < argc; ++i) {
const char *opt = argv[i];
if (opt[0] == '-') {
if (strcmp(opt, "--device") == 0 && i < argc - 1) {
options.cuda_device = atoi(argv[++i]);
} else if (strcmp(opt, "--nogl") == 0) {
options.opengl = false;
} else if (strcmp(opt, "--nocc") == 0) {
options.use_class_compilation = false;
} else if (strcmp(opt, "--noaux") == 0) {
options.enable_auxiliary_output = false;
} else if (strcmp(opt, "--nopdf") == 0) {
options.enable_pdf = false;
options.enable_auxiliary_output = false;
} else if (strcmp(opt, "--an") == 0) {
options.use_adapt_normal = false;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
} else if (strcmp(opt, "--interpreter") == 0) {
options.use_df_interpreter = true;
#endif
} else if (strcmp(opt, "--gui_scale") == 0 && i < argc - 1) {
options.gui_scale = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "--res") == 0 && i < argc - 2) {
options.res_x = std::max(atoi(argv[++i]), 1);
options.res_y = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--hdr") == 0 && i < argc - 1) {
options.hdrfile = argv[++i];
} else if (strcmp(opt, "--hdr_rot") == 0 && i < argc - 1) {
options.hdr_rot = static_cast<float>(atof(argv[++i]));
options.hdr_rot -= floorf(options.hdr_rot / 360.0f) * 360.f;
options.hdr_rot = fmodf(options.hdr_rot, 360.0f);
} else if (strcmp(opt, "-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
} else if (strcmp(opt, "--spp") == 0 && i < argc - 1) {
options.iterations = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "--spi") == 0 && i < argc - 1) {
options.samples_per_iteration = std::max(atoi(argv[++i]), 1);
} else if (strcmp(opt, "-t") == 0 && i < argc - 1) {
const int type = atoi(argv[++i]);
if (type < 0 || type >= MDL_TEST_COUNT) {
std::cout << "Invalid type for \"-t\" option!" << std::endl;
usage(argv[0]);
}
options.mdl_test_type = type;
} else if (strcmp(opt, "-e") == 0 && i < argc - 1) {
options.exposure = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-f") == 0 && i < argc - 1) {
options.fov = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "--cam") == 0 && i < argc - 3) {
options.cam_pos.x = static_cast<float>(atof(argv[++i]));
options.cam_pos.y = static_cast<float>(atof(argv[++i]));
options.cam_pos.z = static_cast<float>(atof(argv[++i]));
} else if (strcmp(opt, "-l") == 0 && i < argc - 6) {
options.light_pos.x = static_cast<float>(atof(argv[++i]));
options.light_pos.y = static_cast<float>(atof(argv[++i]));
options.light_pos.z = static_cast<float>(atof(argv[++i]));
options.light_intensity.x = static_cast<float>(atof(argv[++i]));
options.light_intensity.y = static_cast<float>(atof(argv[++i]));
options.light_intensity.z = static_cast<float>(atof(argv[++i]));
} else if ((strcmp(opt, "-p") == 0 || strcmp(opt, "--mdl_path") == 0) && i < argc - 1) {
configure_options.additional_mdl_paths.push_back(argv[++i]);
} else if (strcmp(opt, "--max_path_length") == 0 && i < argc - 1) {
options.max_path_length = std::min(std::max(atoi(argv[++i]), 2), 100);
} else if (strcmp(opt, "--noaa") == 0) {
options.no_aa = true;
} else if (strcmp(opt, "-d") == 0) {
options.enable_derivatives = true;
} else if (strcmp(opt, "--fold_ternary_on_df") == 0) {
options.fold_ternary_on_df = true;
} else if (strcmp(opt, "--allowed_scatter_mode") == 0 && i < argc - 1) {
options.enable_bsdf_flags = true;
char const *mode = argv[++i];
if (strcmp(mode, "none") == 0) {
} else if (strcmp(mode, "reflect") == 0) {
options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_ALLOW_REFLECT;
} else if (strcmp(mode, "transmit") == 0) {
options.allowed_scatter_mode = mi::neuraylib::DF_FLAGS_ALLOW_TRANSMIT;
} else if (strcmp(mode, "reflect_and_transmit") == 0) {
options.allowed_scatter_mode =
mi::neuraylib::DF_FLAGS_ALLOW_REFLECT_AND_TRANSMIT;
} else {
std::cout << "Unknown allowed_scatter_mode: \"" << mode << "\"" << std::endl;
usage(argv[0]);
}
} else if (strcmp(opt, "-v") == 0 || strcmp(opt, "--version") == 0) {
print_version_and_exit = true;
} else {
if (strcmp(opt, "-h") != 0 && strcmp(opt, "--help") != 0)
std::cout << "Unknown option: \"" << opt << "\"" << std::endl;
usage(argv[0]);
}
}
else
options.material_names.push_back(std::string(opt));
}
if (options.mdl_test_type == 3 && !options.enable_pdf)
exit_failure("Cannot use \"mis + pdf\" test type when pdf is disabled.");
if (options.material_names.empty())
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_df");
if (!neuray.is_valid_interface())
exit_failure("Failed to load the SDK.");
if (print_version_and_exit) {
fprintf(stdout, "%s\n", version->get_string());
version = nullptr;
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
if (!mi::examples::mdl::configure(neuray.get(), configure_options))
exit_failure("Failed to initialize the SDK.");
if (ret != 0)
exit_failure("Failed to initialize the SDK. Result code: %d", ret);
LPE_state_machine lpe_state_machine;
lpe_state_machine.handle_to_global_tag("point_light");
lpe_state_machine.handle_to_global_tag("env");
lpe_state_machine.handle_to_global_tag("sphere");
lpe_state_machine.add_expression("Beauty", LPE::create_common(LPE::Common::Beauty));
lpe_state_machine.add_expression("Diffuse", LPE::create_common(LPE::Common::Diffuse));
lpe_state_machine.add_expression("Glossy", LPE::create_common(LPE::Common::Glossy));
lpe_state_machine.add_expression("Specular", LPE::create_common(LPE::Common::Specular));
lpe_state_machine.add_expression("SSS", LPE::create_common(LPE::Common::SSS));
lpe_state_machine.add_expression("Transmission", LPE::create_common(LPE::Common::Transmission));
lpe_state_machine.add_expression("Beauty-Env", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::light("env") }));
lpe_state_machine.add_expression("Beauty-PointLight", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::light("point_light") }));
lpe_state_machine.add_expression("Beauty-Emission", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter()),
LPE::emission() }));
lpe_state_machine.add_expression("Beauty-Base", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("base")),
LPE::light()}));
lpe_state_machine.add_expression("Beauty-Coat", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("coat")),
LPE::light()}));
lpe_state_machine.add_expression("Beauty-^Coat", LPE::sequence({
LPE::camera(),
LPE::zero_or_more(LPE::any_scatter("coat", false)),
LPE::any_light()}));
{
{
Material_compiler mc(
mdl_backend_api.get(),
mdl_factory.get(),
16,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
options.use_df_interpreter,
#endif
options.enable_derivatives,
options.fold_ternary_on_df,
options.enable_auxiliary_output,
options.enable_pdf,
options.use_adapt_normal,
options.enable_bsdf_flags,
"pointer",
"value");
std::vector<Df_cuda_material> material_bundle;
std::vector<mi::neuraylib::Target_function_description> descs;
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
descs.push_back(
Timing timing_compile("Compile MDL to PTX");
std::vector<std::string> used_material_names;
for (size_t i = 0; i < options.material_names.size(); ++i) {
std::string& opt_material_name = options.material_names[i];
std::string module_qualified_name, material_simple_name;
if (!mi::examples::mdl::parse_cmd_argument_material_name(
opt_material_name, module_qualified_name, material_simple_name, true))
exit_failure("Provided material name '%s' is invalid.",
opt_material_name.c_str());
if (!mi::examples::strings::ends_with(module_qualified_name, ".mdle") &&
opt_material_name.size() > 1 &&
opt_material_name.back() == '*') {
std::string pattern = opt_material_name.substr(0, opt_material_name.size() - 1);
if (!starts_with(pattern, "::"))
pattern = "::" + pattern;
std::string module_db_name = mc.load_module(module_qualified_name);
for (
mi::Size j = 0, n = loaded_module->get_material_count(); j < n; ++j) {
const char* material_db_name = loaded_module->get_material(j);
material_db_name));
std::string material_qualified_name = mat_def->get_mdl_name();
if (!mi::examples::strings::starts_with(material_qualified_name, pattern))
continue;
std::cout << "Adding material \"" << material_qualified_name << "\"" << std::endl;
check_success(mc.add_material(
module_qualified_name, material_db_name,
descs.data(), descs.size(),
options.use_class_compilation));
mc.get_compiled_materials().back());
material_bundle.push_back(create_cuda_material(
compiled_material.get(),
0, material_bundle.size(), descs,
contains_hair_bsdf(compiled_material.get())));
used_material_names.push_back(material_qualified_name);
}
} else {
std::string module_db_name = mc.load_module(module_qualified_name);
if (!module)
exit_failure("Failed to access the loaded module.");
std::string material_db_name
= module_db_name + "::" + material_simple_name;
material_db_name = mi::examples::mdl::add_missing_material_signature(
module.get(), material_db_name);
if (material_db_name.empty())
exit_failure("Failed to find the material %s in the module %s.",
material_simple_name.c_str(), module_qualified_name.c_str());
std::cout << "Adding material \"" << material_db_name << "\"" << std::endl;
check_success(mc.add_material(
module_qualified_name, material_db_name,
descs.data(), descs.size(),
options.use_class_compilation));
mc.get_compiled_materials().back());
material_bundle.push_back(create_cuda_material(
compiled_material.get(),
0, material_bundle.size(), descs,
contains_hair_bsdf(compiled_material.get())));
used_material_names.push_back(material_db_name);
}
}
options.material_names = used_material_names;
mc.generate_cuda_ptx());
timing_compile.stop();
for (auto& mat : material_bundle)
create_cuda_material_handles(mat, target_code.get(), lpe_state_machine);
lpe_state_machine.build();
render_scene(
options,
transaction,
factory,
image_api,
mdl_impexp_api,
target_code,
mc.get_material_defs(),
mc.get_compiled_materials(),
mc.get_argument_block_indices(),
material_bundle,
lpe_state_machine);
}
}
if (neuray->shutdown() != 0)
exit_failure("Failed to shutdown the SDK.");
neuray = nullptr;
if (!mi::examples::mdl::unload())
exit_failure("Failed to unload the SDK.");
exit_success();
}
COMMANDLINE_TO_UTF8
This interface represents bool.
Definition: inumber.h:122
This interface represents maps, i.e., a key-value based data structure.
Definition: imap.h:41
A wrapper around the interfaces for MDL annotations.
Definition: annotation_wrapper.h:37
A scene element that stores measured BSDF data.
Definition: ibsdf_measurement.h:39
This interface represents a compiled material.
Definition: icompiled_material.h:97
virtual const IExpression * lookup_sub_expression(const char *path) const =0
Looks up a sub-expression of the compiled material.
virtual const IValue * get_argument(Size index) const =0
Returns the value of an argument.
virtual Size get_parameter_count() const =0
Returns the number of parameters used by this compiled material.
virtual const IExpression_direct_call * get_body() const =0
Returns the body (or material root) of the compiled material.
virtual base::Uuid get_slot_hash(Material_slot slot) const =0
Returns the hash of a particular material slot.
virtual const char * get_parameter_name(Size index) const =0
Returns the name of a parameter.
This interface is used to interact with the distributed database.
Definition: idatabase.h:289
A constant expression.
Definition: iexpression.h:96
@ EK_CONSTANT
A constant expression. See mi::neuraylib::IExpression_constant.
Definition: iexpression.h:55
This API component allows the creation, assignment, and cloning of instances of types.
Definition: ifactory.h:35
This interface represents a function definition.
Definition: ifunction_definition.h:44
virtual const IAnnotation_list * get_parameter_annotations() const =0
Returns the annotations of all parameters.
virtual const char * get_mdl_name() const =0
Returns the MDL name of the function definition.
This interface provides various utilities related to canvases and buffers.
Definition: iimage_api.h:72
This interface represents a pixel image file.
Definition: iimage.h:66
This interface represents light profiles.
Definition: ilightprofile.h:73
This interface can be used to obtain the MDL backends.
Definition: imdl_backend_api.h:56
Factory for various MDL interfaces and functions.
Definition: imdl_factory.h:53
API component for MDL related import and export operations.
Definition: imdl_impexp_api.h:43
This interface represents an MDL module.
Definition: imodule.h:634
Represents target code of an MDL backend.
Definition: imdl_backend.h:783
Textures add image processing options to images.
Definition: itexture.h:68
virtual const base::IInterface * access(const char *name)=0
Retrieves an element from the database.
virtual base::IInterface * create(const char *type_name, Uint32 argc=0, const base::IInterface *argv[]=0)=0
Creates an object of the type type_name.
virtual Sint32 commit()=0
Commits the transaction.
The type of kind vector.
Definition: itype.h:399
@ TK_BOOL
The boolean type. See mi::neuraylib::IType_bool.
Definition: itype.h:158
@ TK_COLOR
The color type. See mi::neuraylib::IType_color.
Definition: itype.h:174
@ TK_VECTOR
A vector type. See mi::neuraylib::IType_vector.
Definition: itype.h:170
@ TK_INT
The integer type. See mi::neuraylib::IType_int.
Definition: itype.h:160
@ TK_FLOAT
The float type. See mi::neuraylib::IType_float.
Definition: itype.h:164
A value of type array.
Definition: ivalue.h:403
A value of type boolean.
Definition: ivalue.h:106
A value of type enum.
Definition: ivalue.h:144
A value of type vector.
Definition: ivalue.h:309
Kind
The possible kinds of values.
Definition: ivalue.h:36
@ VK_TEXTURE
A texture value. See mi::neuraylib::IValue_texture.
Definition: ivalue.h:62
@ VK_LIGHT_PROFILE
A light_profile value. See mi::neuraylib::IValue_light_profile.
Definition: ivalue.h:64
@ VK_ARRAY
An array value. See mi::neuraylib::IValue_array.
Definition: ivalue.h:56
@ VK_FLOAT
A float value. See mi::neuraylib::IValue_float.
Definition: ivalue.h:44
@ VK_BSDF_MEASUREMENT
A bsdf_measurement value. See mi::neuraylib::IValue_bsdf_measurement.
Definition: ivalue.h:66
@ VK_ENUM
An enum value. See mi::neuraylib::IValue_enum.
Definition: ivalue.h:42
@ VK_INT
An integer value. See mi::neuraylib::IValue_int.
Definition: ivalue.h:40
@ VK_VECTOR
A vector value. See mi::neuraylib::IValue_vector.
Definition: ivalue.h:50
@ VK_INVALID_DF
An invalid distribution function value. See mi::neuraylib::IValue_invalid_df.
Definition: ivalue.h:60
@ VK_BOOL
A boolean value. See mi::neuraylib::IValue_bool.
Definition: ivalue.h:38
@ VK_STRING
A string value. See mi::neuraylib::IValue_string.
Definition: ivalue.h:48
@ VK_COLOR
A color value. See mi::neuraylib::IValue_color.
Definition: ivalue.h:54
Abstract interface for accessing version information.
Definition: iversion.h:19
Interface * get() const
Access to the interface. Returns 0 for an invalid interface.
Definition: handle.h:294
std::basic_ostream<C, T> & info(std::basic_ostream<C, T> &ostream)
Manipulator for mi::base::Log_stream.
Definition: ilogger.h:580
unsigned int Uint32
32-bit unsigned integer.
Definition: types.h:49
Uint64 Size
Unsigned integral type that is large enough to hold the size of all types.
Definition: types.h:112
signed int Sint32
32-bit signed integer.
Definition: types.h:46
Float32 length(Float32 a)
Returns the Euclidean norm of the scalar a (its absolute value).
Definition: function.h:1107
Bbox<T, DIM> operator/(const Bbox<T, DIM> &bbox, T divisor)
Returns a bounding box that is a version of bbox divided by divisor, i.e., bbox.max and bbox....
Definition: bbox.h:518
Color acos(const Color &c)
Returns a color with the elementwise arc cosine of the color c.
Definition: color.h:477
Color atan2(const Color &c, const Color &d)
Returns a color with the elementwise arc tangent of the color c / d.
Definition: color.h:509
Color sin(const Color &c)
Returns a color with the elementwise sine of the color c.
Definition: color.h:761
Color cos(const Color &c)
Returns a color with the elementwise cosine of the color c.
Definition: color.h:558
Color pow(const Color &a, const Color &b)
Returns the color a elementwise to the power of b.
Definition: color.h:719
virtual const char * get_callable_function_df_handle(Size func_index, Size handle_index) const =0
Get the name of a distribution function handle referenced by a callable function.
Df_flags
Flags controlling the calculation of DF results.
Definition: target_code_types.h:761
virtual Size get_callable_function_df_handle_count(Size func_index) const =0
Get the number of distribution function handles referenced by a callable function.
@ DF_FLAGS_NONE
allows nothing -> black
Definition: target_code_types.h:762
@ SLOT_BACKFACE_EMISSION_INTENSITY
Slot "backface.emission.intensity".
Definition: icompiled_material.h:34
@ SLOT_BACKFACE_SCATTERING
Slot "backface.scattering".
Definition: icompiled_material.h:32
@ SLOT_SURFACE_EMISSION_INTENSITY
Slot "surface.emission.intensity".
Definition: icompiled_material.h:30
@ SLOT_SURFACE_SCATTERING
Slot "surface.scattering".
Definition: icompiled_material.h:28
@ SLOT_BACKFACE_EMISSION_EDF_EMISSION
Slot "backface.emission.emission".
Definition: icompiled_material.h:33
@ SLOT_SURFACE_EMISSION_EDF_EMISSION
Slot "surface.emission.emission".
Definition: icompiled_material.h:29
Common namespace for APIs of NVIDIA Advanced Rendering Center GmbH.
Definition: example_derivatives.dox:5
Description of target function.
Definition: imdl_backend.h:1764
Structure representing the state during traversal of the nested layout.
Definition: imdl_backend.h:693
Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.h
#ifndef EXAMPLE_DF_CUDA_H
#define EXAMPLE_DF_CUDA_H
#include <cstdint>
#include <vector_types.h>
#include <texture_types.h>
struct Target_code_data;
enum Mdl_test_type {
MDL_TEST_EVAL = 0,
MDL_TEST_SAMPLE = 1,
MDL_TEST_MIS = 2,
MDL_TEST_MIS_PDF = 3,
MDL_TEST_NO_ENV = 4,
MDL_TEST_COUNT
};
const unsigned MAX_DF_HANDLES = 8;
struct Env_accel {
unsigned int alias;
float q;
float pdf;
};
namespace
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
inline uint2 make_invalid()
{
uint2 index_pair;
index_pair.x = ~0;
index_pair.y = ~0;
return index_pair;
}
}
struct Df_cuda_material
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
Df_cuda_material()
: compiled_material_index(0)
, argument_block_index(~0)
, init(make_invalid())
, bsdf(make_invalid())
, edf(make_invalid())
, emission_intensity(make_invalid())
, backface_bsdf(make_invalid())
, backface_edf(make_invalid())
, backface_emission_intensity(make_invalid())
, volume_absorption(make_invalid())
, thin_walled(make_invalid())
, cutout_opacity(make_invalid())
, contains_hair_bsdf(0)
{
}
unsigned int compiled_material_index;
unsigned int argument_block_index;
uint2 init;
uint2 bsdf;
uint2 edf;
uint2 emission_intensity;
uint2 backface_bsdf;
uint2 backface_edf;
uint2 backface_emission_intensity;
uint2 volume_absorption;
uint2 thin_walled;
uint2 cutout_opacity;
unsigned int bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int bsdf_mtag_to_gtag_map_size;
unsigned int edf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int edf_mtag_to_gtag_map_size;
unsigned int backface_bsdf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int backface_bsdf_mtag_to_gtag_map_size;
unsigned int backface_edf_mtag_to_gtag_map[MAX_DF_HANDLES];
unsigned int backface_edf_mtag_to_gtag_map_size;
unsigned int contains_hair_bsdf;
};
enum Geometry_type
{
GT_SPHERE = 0,
GT_HAIR = 1,
};
struct Kernel_params {
uint2 resolution;
float exposure_scale;
unsigned int *display_buffer;
float3 *accum_buffer;
float3 *albedo_buffer;
float3 *normal_buffer;
bool enable_auxiliary_output;
unsigned display_buffer_index;
unsigned int iteration_start;
unsigned int iteration_num;
unsigned int mdl_test_type;
unsigned int max_path_length;
unsigned int use_derivatives;
unsigned int disable_aa;
float3 cam_pos;
float3 cam_dir;
float3 cam_right;
float3 cam_up;
float cam_focal;
unsigned int geometry;
uint2 env_size;
cudaTextureObject_t env_tex;
Env_accel *env_accel;
float env_intensity;
uint32_t env_gtag;
float env_rotation;
float3 light_pos;
float3 light_color;
float light_intensity;
uint32_t point_light_gtag;
Target_code_data *tc_data;
char const **arg_block_list;
unsigned int current_material;
Df_cuda_material *material_buffer;
uint32_t lpe_num_states;
uint32_t lpe_num_transitions;
uint32_t *lpe_state_table;
uint32_t *lpe_final_mask;
uint32_t default_gtag;
uint32_t lpe_ouput_expression;
};
enum Display_buffer_options
{
DISPLAY_BUFFER_LPE = 0,
DISPLAY_BUFFER_ALBEDO,
DISPLAY_BUFFER_NORMAL,
DISPLAY_BUFFER_COUNT
};
#endif
Types required for execution of generated native and CUDA code.
Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
struct Texture
{
explicit Texture()
: filtered_object(0)
, unfiltered_object(0)
, size(make_uint3(0, 0, 0))
, inv_size(make_float3(0.0f, 0.0f, 0.0f))
{}
explicit Texture(
cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, inv_angular_resolution(make_float2(0.0f, 0.0f))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution;
float2 inv_angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Texture_handler : Texture_handler_base {
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
#if defined(__CUDACC__)
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
\
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const crop_u[2],
float const crop_v[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const [2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
float const crop_u[2],
float const crop_v[2],
float const crop_w[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
float const crop_u[2],
float const crop_v[2],
float const crop_w[2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const [2],
float )
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
extern "C" __device__ void tex_resolution_3d(
int result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float )
{
Texture_handler const* self = static_cast<Texture_handler const*>(self_base);
if (texture_idx == 0 || texture_idx - 1 >= self->num_textures) {
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Texture const& tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
result[2] = tex.size.z;
}
extern "C" __device__ bool tex_texture_isvalid(
Texture_handler_base const *self_base,
unsigned texture_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return texture_idx != 0 && texture_idx - 1 < self->num_textures;
}
extern "C" __device__ void tex_frame(
int result[2],
Texture_handler_base const *,
unsigned )
{
result[0] = 0;
result[1] = 0;
}
extern "C" __device__ float df_light_profile_power(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.total_power;
}
extern "C" __device__ float df_light_profile_maximum(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
return lp.candela_multiplier;
}
extern "C" __device__ bool df_light_profile_isvalid(
Texture_handler_base const *self_base,
unsigned light_profile_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return light_profile_idx != 0 && light_profile_idx - 1 < self->num_lightprofiles;
}
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x * lp.inv_angular_resolution.x;
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
float v = phi * lp.theta_phi_inv_delta.y * lp.inv_angular_resolution.y;
u += 0.5f * lp.inv_angular_resolution.x;
v += 0.5f * lp.inv_angular_resolution.y;
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
extern "C" __device__ void df_light_profile_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const xi[3])
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
uint2 res = lp.angular_resolution;
if (res.x <= 2 || res.y <= 2)
return;
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data;
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0);
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1)
+ (idx_theta * (res.y - 1));
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1);
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned light_profile_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (light_profile_idx == 0 || light_profile_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[light_profile_idx - 1];
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
if (idx_theta < 0 || idx_theta > res.x - 2 || idx_phi < 0 || idx_phi > res.y - 2)
return 0.0f;
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1)
+ (idx_theta * (res.y - 1));
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ bool df_bsdf_measurement_isvalid(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
return bsdf_measurement_idx != 0 && bsdf_measurement_idx - 1 < self->num_mbsdfs;
}
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
result[0] = bm.angular_resolution[part_idx].x;
result[1] = bm.angular_resolution[part_idx].y;
result[2] = bm.num_channels[part_idx];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= float(M_ONE_OVER_PI);
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
{
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
{
store_result3(result, 0.0f);
return;
}
if (bm.num_channels[part_idx] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_idx], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_out[2],
float const xi[3],
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return;
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return;
uint2 res = bm.angular_resolution[part_idx];
const float* sample_data = bm.sample_data[part_idx];
if (res.x < 1 || res.y < 1)
return;
unsigned idx_theta_out = unsigned(theta_phi_out[0] * float(M_ONE_OVER_PI * 2.0f) * float(res.x));
idx_theta_out = min(idx_theta_out, res.x - 1);
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_out * res.x;
unsigned idx_theta_in = sample_cdf(cdf_theta, res.x, xi0);
float prob_theta = cdf_theta[idx_theta_in];
if (idx_theta_in > 0)
{
const float tmp = cdf_theta[idx_theta_in - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_out * res.x + idx_theta_in) * res.y;
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi = sample_cdf(cdf_phi, res.y, xi1);
float prob_phi = cdf_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 inv_res = bm.inv_angular_resolution[part_idx];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_in) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1];
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return 0.0f;
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return 0.0f;
const float* sample_data = bm.sample_data[part_idx];
uint2 res = bm.angular_resolution[part_idx];
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
unsigned idx_theta_in = unsigned(uvw.z * float(res.x));
unsigned idx_theta_out = unsigned(uvw.y * float(res.x));
unsigned idx_phi = unsigned(uvw.x * float(res.y));
idx_theta_in = min(idx_theta_in, res.x - 1);
idx_theta_out = min(idx_theta_out, res.x - 1);
idx_phi = min(idx_phi, res.y - 1);
const float* cdf_theta = sample_data + idx_theta_out * res.x;
float prob_theta = cdf_theta[idx_theta_in];
if (idx_theta_in > 0)
{
const float tmp = cdf_theta[idx_theta_in - 1];
prob_theta -= tmp;
}
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_out * res.x + idx_theta_in) * res.y;
float prob_phi = cdf_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_phi[idx_phi - 1];
prob_phi -= tmp;
}
float2 inv_res = bm.inv_angular_resolution[part_idx];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_in) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_in + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2],
Texture_handler const *self,
unsigned bsdf_measurement_idx,
float const theta_phi[2],
{
const Mbsdf& bm = self->mbsdfs[bsdf_measurement_idx - 1];
const unsigned part_idx = static_cast<unsigned>(part);
if (part_idx > 1 || bm.has_data[part_idx] == 0)
return;
const uint2 res = bm.angular_resolution[part_idx];
if (res.x < 1)
return;
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta = min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_idx][idx_theta];
result[1] = bm.max_albedo[part_idx];
}
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4],
Texture_handler_base const *self_base,
unsigned bsdf_measurement_idx,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (bsdf_measurement_idx == 0 || bsdf_measurement_idx - 1 >= self->num_mbsdfs)
return;
df_bsdf_measurement_albedo(
&result[0],
self,
bsdf_measurement_idx,
theta_phi,
df_bsdf_measurement_albedo(
&result[2],
self,
bsdf_measurement_idx,
theta_phi,
}
#ifndef TEX_SUPPORT_NO_DUMMY_ADAPTNORMAL
extern "C" __device__ void adapt_normal(
float result[3],
Texture_handler_base const *self_base,
float const normal[3])
{
result[0] = normal[0];
result[1] = normal[1];
result[2] = normal[2];
}
#endif
#ifndef TEX_SUPPORT_NO_DUMMY_SCENEDATA
extern "C" __device__ bool scene_data_isvalid(
Texture_handler_base const *self_base,
unsigned scene_data_id)
{
return false;
}
extern "C" __device__ void scene_data_lookup_float4(
float result[4],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[4],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
extern "C" __device__ void scene_data_lookup_float3(
float result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
extern "C" __device__ void scene_data_lookup_color(
float result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[3],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
extern "C" __device__ void scene_data_lookup_float2(
float result[2],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[2],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
}
extern "C" __device__ float scene_data_lookup_float(
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value,
bool uniform_lookup)
{
return default_value;
}
extern "C" __device__ void scene_data_lookup_int4(
int result[4],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[4],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
result[3] = default_value[3];
}
extern "C" __device__ void scene_data_lookup_int3(
int result[3],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[3],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
result[2] = default_value[2];
}
extern "C" __device__ void scene_data_lookup_int2(
int result[2],
Texture_handler_base const *self_base,
unsigned scene_data_id,
int const default_value[2],
bool uniform_lookup)
{
result[0] = default_value[0];
result[1] = default_value[1];
}
extern "C" __device__ int scene_data_lookup_int(
Texture_handler_base const *self_base,
unsigned scene_data_id,
int default_value,
bool uniform_lookup)
{
return default_value;
}
extern "C" __device__ void scene_data_lookup_float4x4(
float result[16],
Texture_handler_base const *self_base,
unsigned scene_data_id,
float const default_value[16],
bool uniform_lookup)
{
for (int i = 0; i < 16; ++i)
result[i] = default_value[i];
}
extern "C" __device__ void scene_data_lookup_deriv_float4(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_float3(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_color(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_float2(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
extern "C" __device__ void scene_data_lookup_deriv_float(
Texture_handler_base const *self_base,
unsigned scene_data_id,
bool uniform_lookup)
{
*result = *default_value;
}
#endif
#ifndef TEX_SUPPORT_NO_VTABLES
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
tex_frame,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
adapt_normal,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_float4x4,
};
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
tex_resolution_3d,
tex_texture_isvalid,
tex_frame,
df_light_profile_power,
df_light_profile_maximum,
df_light_profile_isvalid,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_isvalid,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos,
adapt_normal,
scene_data_isvalid,
scene_data_lookup_float,
scene_data_lookup_float2,
scene_data_lookup_float3,
scene_data_lookup_float4,
scene_data_lookup_int,
scene_data_lookup_int2,
scene_data_lookup_int3,
scene_data_lookup_int4,
scene_data_lookup_color,
scene_data_lookup_float4x4,
scene_data_lookup_deriv_float,
scene_data_lookup_deriv_float2,
scene_data_lookup_deriv_float3,
scene_data_lookup_deriv_float4,
scene_data_lookup_deriv_color,
};
#endif
#endif
#endif
Mbsdf_part
MBSDFs can consist of two parts, which can be selected using this enumeration.
Definition: target_code_types.h:328
tct_deriv< float[4]> tct_deriv_arr_float_4
A float[4] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:144
tct_traits<true>::tct_derivable_float tct_deriv_float
A float with derivatives.
Definition: target_code_types.h:126
struct Shading_state_material_impl<false> Shading_state_material
The MDL material state structure.
Definition: target_code_types.h:300
tct_deriv< float[2]> tct_deriv_arr_float_2
A float[2] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:138
tct_deriv< float[3]> tct_deriv_arr_float_3
A float[3] with derivatives (needed to avoid problems with wrong alignment).
Definition: target_code_types.h:141
@ MBSDF_DATA_TRANSMISSION
the bidirectional transmission distribution function (BTDF)
Definition: target_code_types.h:333
@ MBSDF_DATA_REFLECTION
the bidirectional reflection distribution function (BRDF)
Definition: target_code_types.h:330
The MDL material state structure inside the MDL SDK is a representation of the renderer state as defi...
Definition: target_code_types.h:210
The texture handler structure that is passed to the texturing functions.
Definition: target_code_types.h:712
The texture handler structure that is passed to the texturing functions with derivative support.
Definition: target_code_types.h:721
The runtime for bitmap texture access for the generated target code can optionally be implemented in ...
Definition: target_code_types.h:344
A template struct with derivatives.
Definition: target_code_types.h:97
Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include "example_shared.h"
#include "compiled_material_traverser_base.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
#include "utils/profiling.h"
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, inv_angular_resolution(make_float2(
1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y)))
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution;
float2 inv_angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures;
CUdeviceptr textures;
size_t num_mbsdfs;
CUdeviceptr mbsdfs;
size_t num_lightprofiles;
CUdeviceptr lightprofiles;
CUdeviceptr ro_data_segment;
};
template <typename T>
std::string to_string(T val)
{
std::ostringstream stream;
stream << val;
return stream.str();
}
class Handle_collector : public Compiled_material_traverser_base
{
public:
explicit Handle_collector(
: Compiled_material_traverser_base()
{
traverse(material, transaction);
}
const std::vector<std::string>& get_handles() const { return m_handles; }
private:
const Compiled_material_traverser_base::Traversal_element& element,
void* context) override
{
if (!element.expression ||
return;
>());
expr_dcall->get_arguments());
expr_dcall->get_definition()));
get_semantic();
if (semantic < mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_FIRST
|| semantic > mi::neuraylib::IFunction_definition::DS_INTRINSIC_DF_LAST)
return;
expr_dcall->get_arguments());
mi::Size arg_count = arguments->get_size();
const char* name = arguments->get_name(arg_count - 1);
if (strcmp(name, "handle") != 0)
return;
arguments->get_expression(arg_count - 1));
return;
return;
std::string handle_value = handle->get_value() ? std::string(handle->get_value()) : "";
if (std::find(m_handles.begin(), m_handles.end(), handle_value) == m_handles.end())
m_handles.push_back(handle_value);
}
std::vector<std::string> m_handles;
};
#ifdef ENABLE_DEPRECATED_UTILIY_FUNCTIONS
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
#else
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
cudaDeviceReset(); \
exit_failure( "Error in file %s, line %u: \"%s\".\n", __FILE__, __LINE__, #expr); \
} \
} while (false)
#endif
CUcontext init_cuda(
int ordinal
#ifdef OPENGL_INTEROP
, const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
{
check_cuda_success(cuDeviceGet(&cu_device, ordinal));
}
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
T &r = *it;
deleter(r);
}
}
C
const &
operator*()
const {
return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
template<typename T>
CUdeviceptr gpu_mem_dup(std::vector<T> const &data)
{
return gpu_mem_dup(&data[0], data.size() * sizeof(T));
}
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
m_target_argument_block_list->push_back(0);
}
bool prepare_target_code_data(
std::vector<size_t> const &arg_block_indices);
CUdeviceptr get_device_target_code_data_list();
CUdeviceptr get_device_target_argument_block_list();
CUdeviceptr get_device_target_argument_block(size_t i)
{
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
void update_device_argument_block(size_t i);
private:
bool prepare_texture(
std::vector<Texture> &textures);
bool prepare_mbsdf(
std::vector<Mbsdf> &mbsdfs);
bool prepare_lightprofile(
std::vector<Lightprofile> &lightprofiles);
bool m_enable_derivatives;
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
Resource_container<Target_code_data> m_target_code_data_list;
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
Resource_container<CUdeviceptr> m_target_argument_block_list;
std::vector<mi::base::Handle<mi::neuraylib::ITarget_argument_block> > m_own_arg_blocks;
std::vector<size_t> m_bsdf_arg_block_indices;
std::vector<mi::base::Handle<mi::neuraylib::ITarget_value_layout const> > m_arg_block_layouts;
Resource_container<Texture> m_all_textures;
Resource_container<Mbsdf> m_all_mbsdfs;
Resource_container<Lightprofile> m_all_lightprofiles;
Resource_container<cudaArray_t> m_all_texture_arrays;
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
{
check_cuda_success(cudaMemcpy2DToArray(
device_array, 0, 0, data,
cudaMemcpyHostToDevice));
}
bool Material_gpu_context::prepare_texture(
std::vector<Texture> &textures)
{
char const *image_type = image->get_type(0, 0);
if (image->is_uvtile() || image->is_animated()) {
std::cerr << "The example does not support uvtile and/or animated textures!" << std::endl;
return false;
}
if (texture->get_effective_gamma(0, 0) != 1.0f) {
image_api->
convert(canvas.get(),
"Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma(0, 0));
canvas = gamma_canvas;
} else if (strcmp(image_type, "Color") != 0 && strcmp(image_type, "Float32<4>") != 0) {
canvas = image_api->
convert(canvas.get(),
"Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
tex_layers != 6) {
std::cerr << "Invalid number of layers (" << tex_layers
<< "), cubemaps must have 6 layers!" << std::endl;
return false;
}
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
cudaArrayCubemap : 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0, sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
for (
mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(©_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
for (
mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.
get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
cudaTextureAddressMode addr_mode =
? cudaAddressModeClamp
: cudaAddressModeWrap;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f;
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
cudaTextureObject_t tex_obj_unfilt = 0;
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
textures.push_back(Texture(
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.back());
return true;
}
namespace
{
{
switch (part)
{
break;
break;
}
if (!dataset)
return true;
uint2 res;
res.x = dataset->get_resolution_theta();
res.y = dataset->get_resolution_phi();
mbsdf_cuda_representation.Add(part, res, num_channels);
const unsigned int cdf_theta_size = res.x * res.x;
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x];
float* sample_data_theta = sample_data;
float* sample_data_phi = sample_data + cdf_theta_size;
const float s_theta = (float) (M_PI * 0.5) / float(res.x);
const float s_phi = (float) (M_PI) / float(res.y);
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value = fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+ fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>()
: cudaCreateChannelDesc<float>());
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0, sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data),
res.y * lookup_channels * sizeof(float),
res.y,
res.x);
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(©_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
std::vector<Mbsdf> &mbsdfs)
{
Mbsdf mbsdf_cuda;
return false;
return false;
mbsdfs.push_back(mbsdf_cuda);
m_all_mbsdfs->push_back(mbsdfs.back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
std::vector<Lightprofile> &lightprofiles)
{
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
const float* data = lprof_nr->get_data();
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpy2DToArray(
device_lightprofile_data, 0, 0, data,
res.x * sizeof(float), res.x * sizeof(float), res.y, cudaMemcpyHostToDevice));
cudaResourceDesc res_desc;
memset(&res_desc, 0, sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0, sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
lightprofiles.push_back(lprof);
m_all_lightprofiles->push_back(lightprofiles.back());
return true;
}
bool Material_gpu_context::prepare_target_code_data(
std::vector<size_t> const &arg_block_indices)
{
check_success(m_device_target_code_data_list.get() == 0);
CUdeviceptr device_ro_data = 0;
device_ro_data = gpu_mem_dup(
}
CUdeviceptr device_textures = 0;
if (num_textures > 1) {
std::vector<Texture> textures;
for (
mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
device_textures = gpu_mem_dup(textures);
}
CUdeviceptr device_mbsdfs = 0;
if (num_mbsdfs > 1) {
std::vector<Mbsdf> mbsdfs;
for (
mi::Size i = 1; i < num_mbsdfs; ++i) {
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
CUdeviceptr device_lightprofiles = 0;
if (num_lightprofiles > 1) {
std::vector<Lightprofile> lightprofiles;
for (
mi::Size i = 1; i < num_lightprofiles; ++i) {
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_arg_block_layouts.push_back(
}
for (size_t arg_block_index : arg_block_indices) {
m_bsdf_arg_block_indices.push_back(arg_block_index);
}
return true;
}
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
class Material_compiler {
public:
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
bool enable_pdf,
bool use_adapt_normal,
bool enable_bsdf_flags,
const std::string& df_handle_mode,
const std::string& lambda_return_mode);
std::string load_module(const std::string& mdl_module_name);
bool add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* fname,
bool class_compilation=false);
bool add_material_df(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* base_fname,
bool class_compilation=false);
bool add_material(
const std::string& qualified_module_name,
const std::string& material_db_name,
bool class_compilation);
typedef std::vector<mi::base::Handle<mi::neuraylib::IFunction_definition const> >
Material_definition_list;
Material_definition_list const &get_material_defs()
{
return m_material_defs;
}
typedef std::vector<mi::base::Handle<mi::neuraylib::ICompiled_material const> >
Compiled_material_list;
Compiled_material_list const &get_compiled_materials()
{
return m_compiled_materials;
}
std::vector<size_t> const &get_argument_block_indices() const {
return m_arg_block_indexes;
}
const std::vector<std::string>& get_handles() const {
return m_handles;
}
private:
const std::string& qualified_module_name,
const std::string& material_db_name);
bool class_compilation);
private:
Material_definition_list m_material_defs;
Compiled_material_list m_compiled_materials;
std::vector<size_t> m_arg_block_indexes;
std::vector<std::string> m_handles;
};
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives,
bool fold_ternary_on_df,
bool enable_auxiliary,
bool enable_pdf,
bool use_adapt_normal,
bool enable_bsdf_flags,
const std::string& df_handle_mode,
const std::string& lambda_return_mode)
, m_be_cuda_ptx(mdl_backend_api->get_backend(
mi::neuraylib::IMdl_backend_api:
:MB_CUDA_PTX))
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
if (enable_derivatives) {
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
if (enable_auxiliary) {
check_success(m_be_cuda_ptx->set_option("enable_auxiliary", "on") == 0);
}
if (!enable_pdf) {
check_success(m_be_cuda_ptx->set_option("enable_pdf", "off") == 0);
}
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
check_success(m_be_cuda_ptx->set_option("df_handle_slot_mode", df_handle_mode.c_str()) == 0);
check_success(m_be_cuda_ptx->set_option("lambda_return_mode", lambda_return_mode.c_str()) == 0);
if (use_adapt_normal) {
check_success(m_be_cuda_ptx->set_option("use_renderer_adapt_normal", "on") == 0);
}
if (enable_bsdf_flags) {
check_success(m_be_cuda_ptx->set_option("libbsdf_flags_in_bsdf_data", "on") == 0);
}
m_context->set_option("experimental", true);
m_context->set_option("fold_ternary_on_df", fold_ternary_on_df);
}
std::string Material_compiler::load_module(const std::string& mdl_module_name)
{
m_mdl_impexp_api->load_module(m_transaction.get(), mdl_module_name.c_str(), m_context.get());
if (!print_messages(m_context.get()))
exit_failure("Failed to load module: %s", mdl_module_name.c_str());
m_mdl_factory->get_db_module_name(mdl_module_name.c_str()));
return db_module_name->get_c_str();
}
const std::string& qualified_module_name,
const std::string& material_db_name)
{
material_db_name.c_str()));
if (!material_definition) {
print_message(
(
"Material '" +
material_db_name +
"' does not exist in '" +
qualified_module_name + "'").c_str());
return nullptr;
}
m_material_defs.push_back(material_definition);
material_definition->create_function_call(0, &result));
check_success(result == 0);
material_instance->retain();
return material_instance.get();
}
bool class_compilation)
{
m_mdl_factory->create_type_factory(m_transaction.get()));
m_context->set_option("target_type", standard_material_type.get());
material_instance2->create_compiled_material(flags, m_context.get()));
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
compiled_material->retain();
return compiled_material.get();
}
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
FILE *file = fopen("target_code.ptx", "wt");
if (file)
{
fwrite(code_cuda_ptx->get_code(), code_cuda_ptx->get_code_size(), 1, file);
fclose(file);
}
#endif
return code_cuda_ptx;
}
bool Material_compiler::add_material_subexpr(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* fname,
bool class_compilation)
{
add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
}
bool Material_compiler::add_material_df(
const std::string& qualified_module_name,
const std::string& material_db_name,
const char* path,
const char* base_fname,
bool class_compilation)
{
add_material(qualified_module_name, material_db_name, &desc, 1, class_compilation);
}
bool Material_compiler::add_material(
const std::string& qualified_module_name,
const std::string& material_db_name,
bool class_compilation)
{
if (description_count == 0)
return false;
create_material_instance(qualified_module_name, material_db_name));
if (!material_instance)
return false;
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
m_arg_block_indexes.push_back(function_descriptions[0].argument_block_index);
return print_messages(m_context.get());
}
void print_array_u32(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
str += "1] = { 0 };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
void print_array_func(
std::string &str, std::string const &name, unsigned count, std::string const &content)
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
str += "1] = { dummy_func };\n";
} else {
str += to_string(count) + "] = { " + content + " };\n";
}
}
std::string generate_func_array_ptx(
{
std::string src =
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
src += ".func dummy_func() { ret; }\n";
std::string tc_offsets;
std::string function_names;
std::string tc_indices;
std::string ab_indices;
unsigned f_count = 0;
for (size_t tc_index = 0, num = target_codes.size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
if(!tc_offsets.empty())
tc_offsets += ", ";
tc_offsets += to_string(f_count);
for (size_t func_index = 0, func_count = target_code->get_callable_function_count();
func_index < func_count; ++func_index)
{
if (!tc_indices.empty())
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
tc_indices += to_string(tc_index);
function_names += target_code->get_callable_function(func_index);
mi::Size ab_index = target_code->get_callable_function_argument_block_index(func_index);
ab_indices += to_string(ab_index ==
mi::Size(~0) ? 0 : (ab_index + 1));
f_count++;
src += target_code->get_callable_function_prototype(
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
src += std::string(".visible .const .align 4 .u32 mdl_target_code_count = ")
+ to_string(target_codes.size()) + ";\n";
print_array_u32(
src, std::string("mdl_target_code_offsets"), unsigned(target_codes.size()), tc_offsets);
src += std::string(".visible .const .align 4 .u32 mdl_functions_count = ")
+ to_string(f_count) + ";\n";
print_array_func(src, std::string("mdl_functions"), f_count, function_names);
print_array_u32(src, std::string("mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src, std::string("mdl_target_code_indices"), f_count, tc_indices);
return src;
}
CUmodule build_linked_kernel(
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
FILE *file = fopen("func_array.ptx", "wt");
if (file)
{
fwrite(ptx_func_array_src.c_str(), ptx_func_array_src.size(), 1, file);
fclose(file);
}
#endif
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
mi::examples::profiling::Timing timing("PTX to SASS");
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
for (size_t i = 0, num_target_codes = target_codes.size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.c_str()),
ptx_func_array_src.size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
std::cerr << "PTX linker error:\n" << error_log << std::endl;
check_cuda_success(link_result);
}
timing.stop();
std::cout << "CUDA link completed." << std::endl;
if (info_log[0])
std::cout << "Linker output:\n" << info_log << std::endl;
#ifdef DUMP_PTX
file = fopen("target_code.cubin", "wb");
if (file)
{
fwrite(linked_cubin, linked_cubin_size, 1, file);
fclose(file);
}
#endif
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(®s, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
std::cout << "Kernel uses " << regs << " registers and " << lmem << " lmem and has a size of "
<< linked_cubin_size << " bytes." << std::endl;
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif
This interface represents mutable pointers.
Definition: ipointer.h:43
Example implementation of the abstract interface mi::neuraylib::IBsdf_isotropic_data.
Definition: bsdf_isotropic_data.h:60
virtual const base::IInterface * get_reflection() const =0
Returns the BSDF data for the reflection.
virtual const base::IInterface * get_transmission() const =0
Returns the BSDF data for transmission.
virtual Uint32 get_layers_size() const =0
Returns the number of layers this canvas has.
virtual Uint32 get_resolution_y() const =0
Returns the resolution of the canvas in y direction.
virtual Uint32 get_resolution_x() const =0
Returns the resolution of the canvas in x direction.
Abstract interface for a canvas represented by a rectangular array of tiles.
Definition: icanvas.h:89
virtual const ITile * get_tile(Uint32 layer=0) const =0
Returns the tile for the given layer.
A direct call expression.
Definition: iexpression.h:241
@ EK_DIRECT_CALL
A direct call expression. See mi::neuraylib::IExpression_direct_call.
Definition: iexpression.h:61
This interface represents a function call.
Definition: ifunction_call.h:52
Semantics
All known semantics of functions definitions.
Definition: ifunction_definition.h:54
virtual ITile * convert(const ITile *tile, const char *pixel_type) const =0
Converts a tile to a different pixel type.
virtual void adjust_gamma(ITile *tile, Float32 old_gamma, Float32 new_gamma) const =0
Sets the gamma value of a tile and adjusts the pixel data accordingly.
virtual IArray * create_mipmap(const ICanvas *canvas, Float32 gamma_override=0.0f) const =0
Creates a mipmap from the given canvas.
This interface represents a material instance.
Definition: imaterial_instance.h:34
@ CLASS_COMPILATION
Selects class compilation instead of instance compilation.
Definition: imaterial_instance.h:41
@ DEFAULT_OPTIONS
Default compilation options (e.g., instance compilation).
Definition: imaterial_instance.h:40
@ MSG_COMPILER_DAG
MDL Core DAG generator message.
Definition: imdl_execution_context.h:40
A transaction provides a consistent view on the database.
Definition: itransaction.h:82
@ SID_MATERIAL
The "::material" struct type.
Definition: itype.h:484
A value of type string.
Definition: ivalue.h:221
virtual const IInterface * get_interface(const Uuid &interface_id) const =0
Acquires a const interface from another.
static const Dup_interface DUP_INTERFACE
Symbolic constant to trigger a special constructor in the Handle class.
Definition: handle.h:37
Handle<Interface> make_handle(Interface *iptr)
Returns a handle that holds the interface pointer passed in as argument.
Definition: handle.h:428
@ MESSAGE_SEVERITY_ERROR
An error has occurred.
Definition: enums.h:35
float Float32
32-bit float.
Definition: types.h:51
Bbox<T, DIM> operator*(const Bbox<T, DIM> &bbox, T factor)
Returns a bounding box that is a version of bbox scaled by factor, i.e., bbox.max and bbox....
Definition: bbox.h:502
virtual const char * get_texture(Size index) const =0
Returns the name of a texture resource used by the target code.
const char * path
The path from the material root to the expression that should be translated, e.g.,...
Definition: imdl_backend.h:1779
virtual Size get_bsdf_measurement_count() const =0
Returns the number of bsdf measurement resources used by the target code.
virtual Texture_shape get_texture_shape(Size index) const =0
Returns the texture shape of a given texture resource used by the target code.
virtual Size get_ro_data_segment_count() const =0
Returns the number of constant data initializers.
virtual Size get_ro_data_segment_size(Size index) const =0
Returns the size of the constant data segment at the given index.
virtual const char * get_ro_data_segment_data(Size index) const =0
Returns the data of the constant data segment at the given index.
virtual Size get_texture_count() const =0
Returns the number of texture resources used by the target code.
virtual Size get_argument_block_count() const =0
Returns the number of target argument blocks.
Texture_shape
Definition: imdl_backend.h:811
virtual const ITarget_argument_block * get_argument_block(Size index) const =0
Get a target argument block if available.
const char * base_fname
The base name of the generated functions.
Definition: imdl_backend.h:1785
virtual const ITarget_value_layout * get_argument_block_layout(Size index) const =0
Get a captured arguments block layout if available.
Sint32 return_code
A return code.
Definition: imdl_backend.h:1838
virtual const char * get_light_profile(Size index) const =0
Returns the name of a light profile resource used by the target code.
virtual const char * get_bsdf_measurement(Size index) const =0
Returns the name of a bsdf measurement resource used by the target code.
virtual Size get_light_profile_count() const =0
Returns the number of light profile resources used by the target code.
@ Texture_shape_bsdf_data
Three-dimensional texture representing a BSDF data table.
Definition: imdl_backend.h:817
@ Texture_shape_cube
Cube map texture.
Definition: imdl_backend.h:815
@ Texture_shape_3d
Three-dimensional texture.
Definition: imdl_backend.h:814
@ BSDF_SCALAR
One scalar per grid value.
Definition: ibsdf_isotropic_data.h:24
Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cu
/******************************************************************************
* Copyright 2024 NVIDIA Corporation. All rights reserved.
*****************************************************************************/
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#define _USE_MATH_DEFINES
#include <math.h>
#include "example_df_cuda.h"
#include "texture_support_cuda.h"
// To reuse this sample code for the MDL SDK and MDL Core the corresponding namespaces are used.
// when this CUDA code is used in the context of an SDK sample.
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
// when this CUDA code is used in the context of an Core sample.
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
// for LPE support there different options for the renderer, for CUDA a renderer provided buffer
// can be used to retrieve the contributions of the individual handles (named lobes)
// Note, a real renderer would go for one specific option only
#define DF_HSM_POINTER -2
#define DF_HSM_NONE -1
#define DF_HSM_FIXED_1 1
#define DF_HSM_FIXED_2 2
#define DF_HSM_FIXED_4 4
#define DF_HSM_FIXED_8 8
// this is the one that is used,
// Note, this has to match with code backend option "df_handle_slot_mode"
#define DF_HANDLE_SLOTS DF_HSM_POINTER
// If enabled, math::DX(state::texture_coordinates(0).xy) = float2(1, 0) and
// math::DY(state::texture_coordinates(0).xy) = float2(0, 1) will be used.
// #define USE_FAKE_DERIVATIVES
#ifdef ENABLE_DERIVATIVES
typedef Material_expr_function_with_derivs Mat_expr_func;
typedef Bsdf_init_function_with_derivs Bsdf_init_func;
typedef Bsdf_sample_function_with_derivs Bsdf_sample_func;
typedef Bsdf_evaluate_function_with_derivs Bsdf_evaluate_func;
typedef Bsdf_pdf_function_with_derivs Bsdf_pdf_func;
typedef Bsdf_auxiliary_function_with_derivs Bsdf_auxiliary_func;
typedef Edf_init_function_with_derivs Edf_init_func;
typedef Edf_sample_function_with_derivs Edf_sample_func;
typedef Edf_evaluate_function_with_derivs Edf_evaluate_func;
typedef Edf_pdf_function_with_derivs Edf_pdf_func;
typedef Edf_auxiliary_function_with_derivs Edf_auxiliary_func;
typedef Shading_state_material_with_derivs Mdl_state;
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Material_expr_function Mat_expr_func;
typedef Bsdf_init_function Bsdf_init_func;
typedef Bsdf_sample_function Bsdf_sample_func;
typedef Bsdf_evaluate_function Bsdf_evaluate_func;
typedef Bsdf_pdf_function Bsdf_pdf_func;
typedef Bsdf_auxiliary_function Bsdf_auxiliary_func;
typedef Edf_init_function Edf_init_func;
typedef Edf_sample_function Edf_sample_func;
typedef Edf_evaluate_function Edf_evaluate_func;
typedef Edf_pdf_function Edf_pdf_func;
typedef Edf_auxiliary_function Edf_auxiliary_func;
typedef Shading_state_material Mdl_state;
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
// Custom structure representing the resources used by the generated code of a target code object.
struct Target_code_data
{
size_t num_textures; // number of elements in the textures field
Texture *textures; // a list of Texture objects, if used
size_t num_mbsdfs; // number of elements in the mbsdfs field
Mbsdf *mbsdfs; // a list of mbsdfs objects, if used
size_t num_lightprofiles; // number of elements in the lightprofiles field
Lightprofile *lightprofiles; // a list of lightprofiles objects, if used
char const *ro_data_segment; // the read-only data segment, if used
};
// all function types
union Mdl_function_ptr
{
Mat_expr_func *expression;
Bsdf_init_func *bsdf_init;
Bsdf_sample_func *bsdf_sample;
Bsdf_evaluate_func *bsdf_evaluate;
Bsdf_pdf_func *bsdf_pdf;
Bsdf_auxiliary_func *bsdf_auxiliary;
Edf_init_func *edf_init;
Edf_sample_func *edf_sample;
Edf_evaluate_func *edf_evaluate;
Edf_pdf_func *edf_pdf;
Edf_auxiliary_func *edf_auxiliary;
};
// function index offset depending on the target code
extern __constant__ unsigned int mdl_target_code_offsets[];
// number of generated functions
extern __constant__ unsigned int mdl_functions_count;
// the following arrays are indexed by an mdl_function_index
extern __constant__ Mdl_function_ptr mdl_functions[];
extern __constant__ unsigned int mdl_arg_block_indices[];
// Identity matrix.
// The last row is always implied to be (0, 0, 0, 1).
__constant__ const float4 identity[3] = {
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
// the material provides pairs for each generated function to evaluate
// the functions and arg blocks array are indexed by:
// mdl_target_code_offsets[target_code_index] + function_index
typedef uint3 Mdl_function_index;
__device__ inline Mdl_function_index get_mdl_function_index(const uint2& index_pair)
{
return make_uint3(
index_pair.x, // target_code_index
index_pair.y, // function_index inside target code
mdl_target_code_offsets[index_pair.x] + index_pair.y); // global function index
}
// resource handler for accessing textures and other data
// depends on the target code (link unit)
struct Mdl_resource_handler
{
__device__ Mdl_resource_handler()
{
m_tex_handler.vtable = &TEX_VTABLE; // only required in 'vtable' mode, otherwise NULL
data.shared_data = NULL;
data.texture_handler = reinterpret_cast<Texture_handler_base *>(&m_tex_handler);
}
// reuse the handler with a different target code index
__device__ inline void set_target_code_index(
const Kernel_params& params, const Mdl_function_index& index)
{
m_tex_handler.num_textures = params.tc_data[index.x].num_textures;
m_tex_handler.textures = params.tc_data[index.x].textures;
m_tex_handler.num_mbsdfs = params.tc_data[index.x].num_mbsdfs;
m_tex_handler.mbsdfs = params.tc_data[index.x].mbsdfs;
m_tex_handler.num_lightprofiles = params.tc_data[index.x].num_lightprofiles;
m_tex_handler.lightprofiles = params.tc_data[index.x].lightprofiles;
}
// a pointer to this data is passed to all generated functions
Resource_data data;
private:
Tex_handler m_tex_handler;
};
// checks if the indexed function can be evaluated or not
__device__ inline bool is_valid(const Mdl_function_index& index)
{
return index.y != 0xFFFFFFFFu;
}
// get a pointer to the material parameters which is passed to all generated functions
__device__ inline const char* get_arg_block(
const Kernel_params& params,
const Mdl_function_index& index)
{
return params.arg_block_list[mdl_arg_block_indices[index.z]];
}
// Init function
__device__ inline Bsdf_init_func* as_init(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].bsdf_init;
}
// Expression functions
__device__ inline Mat_expr_func* as_expression(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].expression;
}
// Expression functions
#ifdef ENABLE_DERIVATIVES
template <typename T>
__device__ inline typename Material_function<T>::Type_with_derivs *as_expression_typed(
const Mdl_function_index& index)
{
return reinterpret_cast<typename Material_function<T>::Type_with_derivs *>(
mdl_functions[index.z + 0].expression);
}
#else
template <typename T>
__device__ inline typename Material_function<T>::Type *as_expression_typed(
const Mdl_function_index& index)
{
return reinterpret_cast<typename Material_function<T>::Type *>(
mdl_functions[index.z + 0].expression);
}
#endif
// BSDF functions
__device__ inline Bsdf_sample_func* as_bsdf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].bsdf_sample;
}
__device__ inline Bsdf_evaluate_func* as_bsdf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].bsdf_evaluate;
}
__device__ inline Bsdf_pdf_func* as_bsdf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].bsdf_pdf;
}
__device__ inline Bsdf_auxiliary_func* as_bsdf_auxiliary(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].bsdf_auxiliary;
}
// EDF functions
__device__ inline Edf_sample_func* as_edf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].edf_sample;
}
__device__ inline Edf_evaluate_func* as_edf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].edf_evaluate;
}
__device__ inline Edf_pdf_func* as_edf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].edf_pdf;
}
__device__ inline Edf_auxiliary_func* as_edf_auxiliary(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].edf_auxiliary;
}
// 3d vector math utilities
__device__ inline float3 operator-(const float3& a)
{
return make_float3(-a.x, -a.y, -a.z);
}
__device__ inline float3 operator+(const float3& a, const float3& b)
{
return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
__device__ inline float3 operator-(const float3& a, const float3& b)
{
return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__device__ inline float3 operator*(const float3& a, const float3& b)
{
return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
__device__ inline float3 operator*(const float3& a, const float s)
{
return make_float3(a.x * s, a.y * s, a.z * s);
}
__device__ inline float3 operator/(const float3& a, const float s)
{
return make_float3(a.x / s, a.y / s, a.z / s);
}
__device__ inline void operator+=(float3& a, const float3& b)
{
a.x += b.x; a.y += b.y; a.z += b.z;
}
__device__ inline void operator-=(float3& a, const float3& b)
{
a.x -= b.x; a.y -= b.y; a.z -= b.z;
}
__device__ inline void operator*=(float3& a, const float3& b)
{
a.x *= b.x; a.y *= b.y; a.z *= b.z;
}
__device__ inline void operator*=(float3& a, const float& s)
{
a.x *= s; a.y *= s; a.z *= s;
}
__device__ inline float squared_length(const float3 &d)
{
return d.x * d.x + d.y * d.y + d.z * d.z;
}
__device__ inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f / sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
__device__ inline float dot(const float3 &u, const float3 &v)
{
return u.x * v.x + u.y * v.y + u.z * v.z;
}
__device__ inline float3 cross(const float3 &u, const float3 &v)
{
return make_float3(
u.y * v.z - u.z * v.y,
u.z * v.x - u.x * v.z,
u.x * v.y - u.y * v.x);
}
// Random number generator based on the OptiX SDK
template<uint32_t N>
static __forceinline__ __device__ uint32_t tea(uint32_t v0, uint32_t v1)
{
uint32_t s0 = 0;
for (uint32_t n = 0; n < N; n++)
{
s0 += 0x9e3779b9;
v0 += ((v1 << 4) + 0xa341316c) ^ (v1 + s0) ^ ((v1 >> 5) + 0xc8013ea4);
v1 += ((v0 << 4) + 0xad90777d) ^ (v0 + s0) ^ ((v0 >> 5) + 0x7e95761e);
}
return v0;
}
// Generate random uint32_t in [0, 2^24)
static __forceinline__ __device__ uint32_t lcg(uint32_t& prev)
{
const uint32_t LCG_A = 1664525u;
const uint32_t LCG_C = 1013904223u;
prev = (LCG_A * prev + LCG_C);
return prev & 0x00FFFFFF;
}
// Generate random float in [0, 1)
static __forceinline__ __device__ float rnd(uint32_t& prev)
{
return ((float)lcg(prev) / (float)0x01000000);
}
// direction to environment map texture coordinates
__device__ inline float2 environment_coords(const float3 &dir, const Kernel_params& params)
{
const float u = atan2f(dir.z, dir.x) * (float)(0.5 / M_PI) + 0.5f;
const float v = acosf(fmax(fminf(-dir.y, 1.0f), -1.0f)) * (float)(1.0 / M_PI);
return make_float2(fmodf(u + params.env_rotation * (float)(0.5 * M_1_PI), 1.0f), v);
}
// importance sample the environment
__device__ inline float3 environment_sample(
float3 &dir,
float &pdf,
const float3 &xi,
const Kernel_params ¶ms)
{
// importance sample an envmap pixel using an alias map
const unsigned int size = params.env_size.x * params.env_size.y;
const unsigned int idx = min((unsigned int)(xi.x * (float)size), size - 1);
unsigned int env_idx;
float xi_y = xi.y;
if (xi_y < params.env_accel[idx].q) {
env_idx = idx ;
xi_y /= params.env_accel[idx].q;
} else {
env_idx = params.env_accel[idx].alias;
xi_y = (xi_y - params.env_accel[idx].q) / (1.0f - params.env_accel[idx].q);
}
const unsigned int py = env_idx / params.env_size.x;
const unsigned int px = env_idx % params.env_size.x;
pdf = params.env_accel[env_idx].pdf;
// uniformly sample spherical area of pixel
const float u = (float)(px + xi_y) / (float)params.env_size.x;
const float phi = u * (float)(2.0 * M_PI) - (float)M_PI - params.env_rotation;
float sin_phi, cos_phi;
sincosf(phi > float(-M_PI) ? phi : (phi + (float)(2.0 * M_PI)), &sin_phi, &cos_phi);
const float step_theta = (float)M_PI / (float)params.env_size.y;
const float theta0 = (float)(py) * step_theta;
const float cos_theta = cosf(theta0) * (1.0f - xi.z) + cosf(theta0 + step_theta) * xi.z;
const float theta = acosf(cos_theta);
const float sin_theta = sinf(theta);
dir = make_float3(cos_phi * sin_theta, -cos_theta, sin_phi * sin_theta);
// lookup filtered beauty
const float v = theta * (float)(1.0 / M_PI);
const float4 t = tex2D<float4>(params.env_tex, u, v);
return make_float3(t.x, t.y, t.z) * params.env_intensity / pdf;
}
// evaluate the environment
__device__ inline float3 environment_eval(
float &pdf,
const float3 &dir,
const Kernel_params ¶ms)
{
const float2 uv = environment_coords(dir, params);
const unsigned int x =
min((unsigned int)(uv.x * (float)params.env_size.x), params.env_size.x - 1);
const unsigned int y =
min((unsigned int)(uv.y * (float)params.env_size.y), params.env_size.y - 1);
pdf = params.env_accel[y * params.env_size.x + x].pdf;
const float4 t = tex2D<float4>(params.env_tex, uv.x, uv.y) ;
return make_float3(t.x, t.y, t.z) * params.env_intensity;
}
//-------------------------------------------------------------------------------------------------
struct auxiliary_data
{
float3 albedo;
float3 normal;
int num; // multiple elements can contribute to the aux buffer with equal weight
__device__ inline auxiliary_data& operator+=(const auxiliary_data& b)
{
albedo += b.albedo;
normal += b.normal;
num += b.num;
return *this;
}
};
__device__ inline static void clear(auxiliary_data& data)
{
data.albedo = make_float3(0.0f, 0.0f, 0.0f);
data.normal = make_float3(0.0f, 0.0f, 0.0f);
data.num = 0;
}
__device__ inline void normalize(auxiliary_data& data)
{
data.albedo = data.albedo / fmaxf(1.0f, float(data.num));
if (dot(data.normal, data.normal) > 0.0f)
data.normal = normalize(data.normal);
data.num = min(1, data.num);
}
//-------------------------------------------------------------------------------------------------
struct Ray_state
{
float3 contribution;
float3 weight;
float3 pos, pos_rx, pos_ry;
float3 dir, dir_rx, dir_ry;
bool inside;
bool inside_cutout;
int intersection;
uint32_t lpe_current_state;
auxiliary_data* aux;
};
struct Ray_hit_info
{
float distance;
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 position;
#else
float3 position;
#endif
float3 normal;
float3 tangent_u;
float3 tangent_v;
#ifdef ENABLE_DERIVATIVES
tct_deriv_float3 texture_coords[1];
#else
tct_float3 texture_coords[1];
#endif
};
#define GT_SPHERE_RADIUS 1.0f
__device__ inline bool intersect_sphere(
const Ray_state &ray_state,
const Kernel_params ¶ms,
Ray_hit_info& out_hit)
{
const float r = GT_SPHERE_RADIUS;
const float b = 2.0f * dot(ray_state.dir, ray_state.pos);
const float c = dot(ray_state.pos, ray_state.pos) - r * r;
float tmp = b * b - 4.0f * c;
if (tmp < 0.0f)
return false;
tmp = sqrtf(tmp);
const float t0 = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t1 = c / t0;
const float m = fminf(t0, t1);
out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
if (out_hit.distance < 0.0f)
return false;
// compute geometry state
#ifdef ENABLE_DERIVATIVES
out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
const float3 &posval = out_hit.position.val;
#else
out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
const float3 &posval = out_hit.position;
#endif
out_hit.normal = normalize(posval);
const float phi = atan2f(out_hit.normal.x, out_hit.normal.z);
const float theta = acosf(out_hit.normal.y);
const float3 uvw = make_float3(
(phi * (float) (0.5 / M_PI) + 0.5f) * 2.0f,
1.0f - theta * (float) (1.0 / M_PI),
0.0f);
// compute surface derivatives
float sp, cp;
sincosf(phi, &sp, &cp);
const float st = sinf(theta);
out_hit.tangent_u = make_float3(cp * st, 0.0f, -sp * st) * (float) M_PI * r;
out_hit.tangent_v = make_float3(sp * out_hit.normal.y, -st, cp * out_hit.normal.y) * (float) (-M_PI) * r;
#ifdef ENABLE_DERIVATIVES
out_hit.texture_coords[0].val = uvw;
out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
#else
out_hit.texture_coords[0] = uvw;
#endif
return true;
}
#define GT_HAIR_RADIUS 0.35f
#define GT_HAIR_LENGTH 3.0f
__device__ inline bool intersect_hair(
const Ray_state &ray_state,
const Kernel_params ¶ms,
Ray_hit_info& out_hit)
{
const float r = GT_HAIR_RADIUS;
const float a = ray_state.dir.x * ray_state.dir.x + ray_state.dir.z * ray_state.dir.z;
const float b = 2.0f * (ray_state.dir.x * ray_state.pos.x + ray_state.dir.z * ray_state.pos.z);
const float c = ray_state.pos.x * ray_state.pos.x + ray_state.pos.z * ray_state.pos.z - r * r;
float tmp = b * b - 4.0f * a * c;
if (tmp < 0.0f)
return false;
tmp = sqrtf(tmp);
const float q = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t0 = q / a;
const float t1 = c / q;
const float m = fminf(t0, t1);
out_hit.distance = m > 0.0f ? m : fmaxf(t0, t1);
if (out_hit.distance < 0.0f)
return false;
// compute geometry state
#ifdef ENABLE_DERIVATIVES
out_hit.position.val = ray_state.pos + ray_state.dir * out_hit.distance;
out_hit.position.dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 0.0f, 0.0f);
const float3 &posval = out_hit.position.val;
#else
out_hit.position = ray_state.pos + ray_state.dir * out_hit.distance;
const float3 &posval = out_hit.position;
#endif
out_hit.normal = normalize(make_float3(posval.x, 0.0f, posval.z));
if (fabsf(posval.y) > GT_HAIR_LENGTH * 0.5f)
return false;
const float phi = atan2f(posval.z, posval.x);
const float3 uvw = make_float3(
(posval.y + GT_HAIR_LENGTH * 0.5f) / GT_HAIR_LENGTH, // position along the hair
phi * (float) (0.5f / M_PI) + 0.5f, // position around the hair in the range [0, 1]
2.0f * GT_HAIR_RADIUS); // thickness of the hair
// compute surface derivatives
out_hit.tangent_u = make_float3(0.0, 1.0, 0.0);
out_hit.tangent_v = cross(out_hit.normal, out_hit.tangent_u);
#ifdef ENABLE_DERIVATIVES
out_hit.texture_coords[0].val = uvw;
out_hit.texture_coords[0].dx = make_float3(0.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 0.0f, 0.0f);
#else
out_hit.texture_coords[0] = uvw;
#endif
return true;
}
__device__ inline bool intersect_geometry(
Ray_state &ray_state,
const Kernel_params ¶ms,
Ray_hit_info& out_hit)
{
switch (Geometry_type(params.geometry))
{
case GT_SPHERE:
if (!intersect_sphere(ray_state, params, out_hit))
return false;
break;
case GT_HAIR:
if (!intersect_hair(ray_state, params, out_hit))
return false;
break;
default:
return false;
}
#ifndef ENABLE_DERIVATIVES
ray_state.pos = out_hit.position;
#else
ray_state.pos = out_hit.position.val;
if (params.use_derivatives && ray_state.intersection == 0)
{
#ifdef USE_FAKE_DERIVATIVES
out_hit.position.dx = make_float3(1.0f, 0.0f, 0.0f);
out_hit.position.dy = make_float3(0.0f, 1.0f, 0.0f);
out_hit.texture_coords[0].dx = make_float3(1.0f, 0.0f, 0.0f);
out_hit.texture_coords[0].dy = make_float3(0.0f, 1.0f, 0.0f);
#else
// compute ray differential for one-pixel offset rays
// ("Physically Based Rendering", 3rd edition, chapter 10.1.1)
const float d = dot(out_hit.normal, ray_state.pos);
const float tx = (d - dot(out_hit.normal, ray_state.pos_rx)) / dot(out_hit.normal, ray_state.dir_rx);
const float ty = (d - dot(out_hit.normal, ray_state.pos_ry)) / dot(out_hit.normal, ray_state.dir_ry);
ray_state.pos_rx += ray_state.dir_rx * tx;
ray_state.pos_ry += ray_state.dir_ry * ty;
out_hit.position.dx = ray_state.pos_rx - ray_state.pos;
out_hit.position.dy = ray_state.pos_ry - ray_state.pos;
float4 A;
float2 B_x, B_y;
if (fabsf(out_hit.normal.x) > fabsf(out_hit.normal.y) && fabsf(out_hit.normal.x) > fabsf(out_hit.normal.z))
{
B_x = make_float2(
ray_state.pos_rx.y - ray_state.pos.y,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.y - ray_state.pos.y,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
out_hit.tangent_u.y, out_hit.tangent_u.z, out_hit.tangent_v.y, out_hit.tangent_v.z);
}
else if (fabsf(out_hit.normal.y) > fabsf(out_hit.normal.z))
{
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
out_hit.tangent_u.x, out_hit.tangent_u.z, out_hit.tangent_v.x, out_hit.tangent_v.z);
}
else
{
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.y - ray_state.pos.y);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.y - ray_state.pos.y);
A = make_float4(
out_hit.tangent_u.x, out_hit.tangent_u.y, out_hit.tangent_v.x, out_hit.tangent_v.y);
}
const float det = A.x * A.w - A.y * A.z;
if (fabsf(det) > 1e-10f)
{
const float inv_det = 1.0f / det;
out_hit.texture_coords[0].dx.x = inv_det * (A.w * B_x.x - A.z * B_x.y);
out_hit.texture_coords[0].dx.y = inv_det * (A.x * B_x.y - A.y * B_x.x);
out_hit.texture_coords[0].dy.x = inv_det * (A.w * B_y.x - A.z * B_y.y);
out_hit.texture_coords[0].dy.y = inv_det * (A.x * B_y.y - A.y * B_y.x);
}
#endif
}
#endif
out_hit.tangent_u = normalize(out_hit.tangent_u);
out_hit.tangent_v = normalize(out_hit.tangent_v);
return true;
}
__device__ bool cull_point_light(
const Kernel_params ¶ms,
const float3 &light_position,
const float3 &light_direction /*to light*/,
const float3 &normal)
{
switch (params.geometry)
{
case GT_SPHERE:
{
// same as default, but allow lights inside the sphere
const float inside = (squared_length(light_position) < GT_SPHERE_RADIUS) ? -1.f : 1.f;
return (dot(light_direction, normal) * inside) <= 0.0f;
}
case GT_HAIR:
// ignore light sources within the volume
return (light_position.x * light_position.x +
light_position.z * light_position.z) < GT_SPHERE_RADIUS;
default:
// ignore light from behind
return dot(light_direction, normal) <= 0.0f;
}
}
__device__ bool cull_env_light(
const Kernel_params ¶ms,
const float3 &light_direction /*to light*/,
const float3 &normal)
{
switch (params.geometry)
{
case GT_HAIR:
// allow light from behind
return false;
case GT_SPHERE:
default:
// ignore light from behind
return dot(light_direction, normal) <= 0.0f;
}
}
__device__ void continue_ray(
Ray_state& ray_state,
const Ray_hit_info &hit_indo,
unsigned int event_type,
const Kernel_params ¶ms)
{
switch (params.geometry)
{
case GT_HAIR:
if (event_type == BSDF_EVENT_GLOSSY_TRANSMISSION)
{
// conservative
ray_state.pos += ray_state.dir * 2.0f * GT_HAIR_RADIUS;
ray_state.inside = false;
}
break;
default:
return;
}
}
//-------------------------------------------------------------------------------------------------
// events that are define a transition between states, along with tag IDs
enum Transition_type
{
TRANSITION_CAMERA = 0,
TRANSITION_LIGHT,
TRANSITION_EMISSION,
TRANSITION_SCATTER_DR,
TRANSITION_SCATTER_DT,
TRANSITION_SCATTER_GR,
TRANSITION_SCATTER_GT,
TRANSITION_SCATTER_SR,
TRANSITION_SCATTER_ST,
TRANSITION_COUNT,
};
// go to the next state, given the current state and a transition token.
__device__ inline uint32_t lpe_transition(
uint32_t current_state,
Transition_type event,
uint32_t global_tag_id,
const Kernel_params ¶ms)
{
if(current_state == static_cast<uint32_t>(-1))
return static_cast<uint32_t>(-1);
return params.lpe_state_table[
current_state * params.lpe_num_transitions +
static_cast<uint32_t>(TRANSITION_COUNT) * global_tag_id +
static_cast<uint32_t>(event)];
}
// add direct contribution, e.g., for emission, direct light hits
__device__ inline void accumulate_contribution(
Transition_type light_event,
uint32_t light_global_tag_id,
const float3& contrib,
Ray_state &ray_state,
const Kernel_params ¶ms)
{
// check if there is a valid transition to that light source
uint32_t next_state = lpe_transition(
ray_state.lpe_current_state, light_event, light_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// add contribution the when the reached state is a final state for the selected LPE
// here we only have one LPE buffer, but more can be added easily by checking different LPEs
if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
ray_state.contribution += contrib;
}
// add contribution for next event estimations
__device__ inline void accumulate_next_event_contribution(
Transition_type scatter_event, uint32_t material_global_tag_id,
Transition_type light_event, uint32_t light_global_tag_id,
const float3& contrib,
Ray_state &ray_state,
const Kernel_params ¶ms)
{
// transition following the scatter event
uint32_t next_state = lpe_transition(
ray_state.lpe_current_state, scatter_event, material_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// check if there is a valid transition to the light source
next_state = lpe_transition(
next_state, light_event, light_global_tag_id, params);
if (next_state == static_cast<uint32_t>(-1)) return;
// add contribution the when the reached state is a final state for the selected LPE
// here we only have one LPE buffer, but more can be added easily by checking different LPEs
if ((params.lpe_final_mask[next_state] & (1 << params.lpe_ouput_expression)) != 0)
ray_state.contribution += contrib;
}
// evaluate if certain shading point (pos) is visible to a light direction by casting a shadow ray.
__device__ inline bool trace_shadow(
const float3 &pos,
const float3 &to_light,
uint32_t &seed,
Ray_state &ray_state,
const Kernel_params ¶ms)
{
Ray_state ray_shadow = ray_state;
ray_shadow.pos = pos;
ray_shadow.dir = normalize(to_light);
Ray_hit_info shadow_hit;
bool in_shadow = intersect_geometry(ray_shadow, params, shadow_hit);
if (in_shadow)
{
float4 texture_results[16];
// material of the current object
Df_cuda_material material = params.material_buffer[params.current_material];
Mdl_function_index func_idx;
func_idx = get_mdl_function_index(material.init);
if (!is_valid(func_idx))
return false;
// create state
Mdl_state state = {
shadow_hit.normal,
shadow_hit.normal,
shadow_hit.position,
0.0f,
shadow_hit.texture_coords,
&shadow_hit.tangent_u,
&shadow_hit.tangent_v,
texture_results,
params.tc_data[func_idx.x].ro_data_segment,
identity,
identity,
0,
1.0f
};
// access textures and other resource data
// expect that the target code index is the same for all functions of a material
Mdl_resource_handler mdl_resources;
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
// initialize the state
as_init(func_idx)(&state, &mdl_resources.data, arg_block);
// handle cutouts be treating the opacity as chance to hit the surface
// if we don't hit it, the ray will continue with the same direction
func_idx = get_mdl_function_index(material.cutout_opacity);
if (is_valid(func_idx))
{
float opacity = as_expression_typed<float>(func_idx)(&state, &mdl_resources.data, arg_block);
const float x_anyhit = rnd(seed);
in_shadow = (x_anyhit <= opacity);
}
}
return in_shadow;
}
//-------------------------------------------------------------------------------------------------
__device__ inline bool trace_scene(
uint32_t &seed,
Ray_state &ray_state,
const Kernel_params ¶ms)
{
// stop at invalid states
if (ray_state.lpe_current_state == static_cast<uint32_t>(-1))
return false;
// intersect with geometry
Ray_hit_info hit;
if (!intersect_geometry(ray_state, params, hit)) {
if (ray_state.intersection == 0 && params.mdl_test_type != MDL_TEST_NO_ENV) {
// primary ray miss, add environment contribution
const float2 uv = environment_coords(ray_state.dir, params);
const float4 texval = tex2D<float4>(params.env_tex, uv.x, uv.y);
// add contribution, if `CL` is a valid path
accumulate_contribution(
TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
make_float3(texval.x, texval.y, texval.z) * params.env_intensity,
ray_state, params);
}
return false;
}
float4 texture_results[16];
// material of the current object
Df_cuda_material material = params.material_buffer[params.current_material];
Mdl_function_index func_idx;
func_idx = get_mdl_function_index(material.init);
if (!is_valid(func_idx))
return false;
// create state
Mdl_state state = {
hit.normal,
hit.normal,
hit.position,
0.0f,
hit.texture_coords,
&hit.tangent_u,
&hit.tangent_v,
texture_results,
params.tc_data[func_idx.x].ro_data_segment,
identity,
identity,
0,
1.0f
};
// access textures and other resource data
// expect that the target code index is the same for all functions of a material
Mdl_resource_handler mdl_resources;
mdl_resources.set_target_code_index(params, func_idx); // init resource handler
const char* arg_block = get_arg_block(params, func_idx); // get material parameters
// initialize the state
as_init(func_idx)(&state, &mdl_resources.data, arg_block);
// handle cutouts be treating the opacity as chance to hit the surface
// if we don't hit it, the ray will continue with the same direction
func_idx = get_mdl_function_index(material.cutout_opacity);
if (is_valid(func_idx))
{
float opacity = as_expression_typed<float>(func_idx)(&state, &mdl_resources.data, arg_block);
const float x_anyhit = rnd(seed);
if (x_anyhit > opacity)
{
// decrease to see the environment though front and back face cutouts
ray_state.intersection--;
// change the side
ray_state.inside_cutout = !ray_state.inside_cutout;
// avoid self-intersections
ray_state.pos += hit.normal * (ray_state.inside_cutout ? -0.001f : 0.001f);
return true;
}
}
// for evaluating parts of the BSDF individually, e.g. for implementing LPEs
// the MDL SDK provides several options to pass out the BSDF, EDF, and auxiliary data
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
// application provided memory
// the data structs will get only a pointer to a buffer, along with size and offset
const unsigned df_eval_slots = 4; // number of handles (parts) that can be evaluated
// at once. 4 is an arbitrary choice. However, it
// has to match eval_data.handle_count and
// aux_data.handle_count)
float3 result_buffer_0[df_eval_slots]; // used for bsdf_diffuse, albedo_diffue, and edf
float3 result_buffer_1[df_eval_slots]; // used for bsdf_glossy, albedo_glossy
float3 result_buffer_2[df_eval_slots]; // used for normal
float3 result_buffer_3[df_eval_slots]; // used for roughness
#elif DF_HANDLE_SLOTS == DF_HSM_NONE
// handles are ignored, all parts of the BSDF are returned at once without loops (fastest)
const unsigned df_eval_slots = 1;
#else
// eval_data and auxiliary_data have a fixed size array to pass the data. Only an offset
// is required if there are more handles (parts) than slots.
const unsigned df_eval_slots = DF_HANDLE_SLOTS;
#endif
// apply volume attenuation after first bounce
// (assuming uniform absorption coefficient and ignoring scattering coefficient)
if (ray_state.intersection > 0)
{
func_idx = get_mdl_function_index(material.volume_absorption);
if (is_valid(func_idx))
{
float3 abs_coeff = as_expression_typed<float3>(func_idx)(&state, &mdl_resources.data, arg_block);
ray_state.weight.x *= abs_coeff.x > 0.0f ? expf(-abs_coeff.x * hit.distance) : 1.0f;
ray_state.weight.y *= abs_coeff.y > 0.0f ? expf(-abs_coeff.y * hit.distance) : 1.0f;
ray_state.weight.z *= abs_coeff.z > 0.0f ? expf(-abs_coeff.z * hit.distance) : 1.0f;
}
}
// for thin_walled materials there is no 'inside'
bool thin_walled = false;
Mdl_function_index thin_walled_func_idx = get_mdl_function_index(material.thin_walled);
if (is_valid(thin_walled_func_idx))
thin_walled = as_expression_typed<bool>(thin_walled_func_idx)(
&state, &mdl_resources.data, arg_block);
// add emission
func_idx = get_mdl_function_index((thin_walled && ray_state.inside_cutout) ? material.backface_edf : material.edf);
if (is_valid(func_idx))
{
// evaluate intensity expression
float3 emission_intensity = make_float3(0.0, 0.0, 0.0);
Mdl_function_index intensity_func_idx = get_mdl_function_index(
(thin_walled && ray_state.inside_cutout) ? material.backface_emission_intensity : material.emission_intensity);
if (is_valid(intensity_func_idx))
{
emission_intensity = as_expression_typed<float3>(intensity_func_idx)(
&state, &mdl_resources.data, arg_block);
}
// evaluate EDF
Edf_evaluate_data<(Df_handle_slot_mode) DF_HANDLE_SLOTS> eval_data;
eval_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.edf = result_buffer_0;
eval_data.handle_count = df_eval_slots;
#endif
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
unsigned edf_mtag_to_gtag_map_size = (thin_walled && ray_state.inside_cutout) ?
material.backface_edf_mtag_to_gtag_map_size : material.edf_mtag_to_gtag_map_size;
unsigned* edf_mtag_to_gtag_map = (thin_walled && ray_state.inside_cutout) ?
&material.backface_edf_mtag_to_gtag_map[0] : &material.edf_mtag_to_gtag_map[0];
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < edf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials EDF
as_edf_evaluate(func_idx)(&eval_data, &state, &mdl_resources.data, arg_block);
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < edf_mtag_to_gtag_map_size); ++lobe)
{
// add emission contribution
accumulate_contribution(
TRANSITION_EMISSION, edf_mtag_to_gtag_map[offset + lobe],
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.edf * emission_intensity,
#else
eval_data.edf[lobe] * emission_intensity,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
func_idx = get_mdl_function_index(
(thin_walled && ray_state.inside_cutout) ? material.backface_bsdf : material.bsdf);
unsigned bsdf_mtag_to_gtag_map_size = (thin_walled && ray_state.inside_cutout) ?
material.backface_bsdf_mtag_to_gtag_map_size : material.bsdf_mtag_to_gtag_map_size;
unsigned* bsdf_mtag_to_gtag_map = (thin_walled && ray_state.inside_cutout) ?
&material.backface_bsdf_mtag_to_gtag_map[0] : &material.bsdf_mtag_to_gtag_map[0];
if (is_valid(func_idx))
{
// reuse memory for function data
union
{
Bsdf_sample_data sample_data;
Bsdf_evaluate_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS> eval_data;
Bsdf_pdf_data pdf_data;
Bsdf_auxiliary_data<(Df_handle_slot_mode)DF_HANDLE_SLOTS> aux_data;
};
// initialize shared fields
if (ray_state.inside && !thin_walled)
{
sample_data.ior1.x = BSDF_USE_MATERIAL_IOR;
sample_data.ior2 = make_float3(1.0f, 1.0f, 1.0f);
}
else
{
sample_data.ior1 = make_float3(1.0f, 1.0f, 1.0f);
sample_data.ior2.x = BSDF_USE_MATERIAL_IOR;
}
sample_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
// if requested, fill auxiliary buffers
if (params.enable_auxiliary_output && ray_state.intersection == 0)
{
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
aux_data.albedo_diffuse = result_buffer_0;
aux_data.albedo_glossy = result_buffer_1;
aux_data.normal = result_buffer_2;
aux_data.roughness = result_buffer_3;
aux_data.handle_count = df_eval_slots;
#endif
aux_data.flags = params.bsdf_data_flags;
// outer loop in case there are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
aux_data.handle_offset = offset;
#endif
// evaluate the materials auxiliary
as_bsdf_auxiliary(func_idx)(&aux_data, &state, &mdl_resources.data, arg_block);
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
{
// to keep it simpler, the individual albedo and normals are averaged
// however, the parts can also be used separately, e.g. for LPEs
#if DF_HANDLE_SLOTS == DF_HSM_NONE
ray_state.aux->albedo += aux_data.albedo_diffuse + aux_data.albedo_glossy;
ray_state.aux->normal += aux_data.normal;
#else
ray_state.aux->albedo += aux_data.albedo_diffuse[lobe] +
aux_data.albedo_glossy[lobe];
ray_state.aux->normal += aux_data.normal[lobe];
#endif
ray_state.aux->num++;
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
// compute direct lighting for point light
Transition_type transition_glossy, transition_diffuse;
if (params.light_intensity > 0.0f)
{
float3 to_light = params.light_pos - ray_state.pos;
bool culled_light = cull_point_light(params, params.light_pos, to_light, ray_state.inside_cutout ? -hit.normal : hit.normal);
// cast a shadow ray when inside a cutout
if (ray_state.inside_cutout && !culled_light)
{
const float3 pos = ray_state.pos - hit.normal * 0.001f;
culled_light = trace_shadow(pos, to_light, seed, ray_state, params);
}
if(!culled_light)
{
const float inv_squared_dist = 1.0f / squared_length(to_light);
const float3 f = params.light_color * params.light_intensity *
inv_squared_dist * (float) (0.25 / M_PI);
eval_data.k2 = to_light * sqrtf(inv_squared_dist);
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.bsdf_diffuse = result_buffer_0;
eval_data.bsdf_glossy = result_buffer_1;
eval_data.handle_count = df_eval_slots;
#endif
eval_data.flags = params.bsdf_data_flags;
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials BSDF
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, arg_block);
// we know if we reflect or transmit
if (dot(to_light, ray_state.inside_cutout ? -hit.normal : hit.normal) > 0.0f) {
transition_glossy = TRANSITION_SCATTER_GR;
transition_diffuse = TRANSITION_SCATTER_DR;
} else {
transition_glossy = TRANSITION_SCATTER_GT;
transition_diffuse = TRANSITION_SCATTER_DT;
}
// sample weight
const float3 w = ray_state.weight * f;
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
{
// get the `global tag` of the lobe
unsigned material_lobe_gtag = bsdf_mtag_to_gtag_map[offset + lobe];
// add diffuse contribution
accumulate_next_event_contribution(
transition_diffuse, material_lobe_gtag,
TRANSITION_LIGHT, params.point_light_gtag, // light group
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_diffuse * w,
#else
eval_data.bsdf_diffuse[lobe] * w,
#endif
ray_state, params);
// add glossy contribution
accumulate_next_event_contribution(
transition_glossy, material_lobe_gtag,
TRANSITION_LIGHT, params.point_light_gtag, // light group
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_glossy * w,
#else
eval_data.bsdf_glossy[lobe] * w,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
}
// importance sample environment light
if (params.mdl_test_type != MDL_TEST_SAMPLE && params.mdl_test_type != MDL_TEST_NO_ENV)
{
const float xi0 = rnd(seed);
const float xi1 = rnd(seed);
const float xi2 = rnd(seed);
float3 light_dir;
float pdf;
const float3 f = environment_sample(light_dir, pdf, make_float3(xi0, xi1, xi2), params);
bool culled_env_light = pdf < 0.0f || cull_env_light(params, light_dir, ray_state.inside_cutout ? -hit.normal : hit.normal);
// cast a shadow ray when inside a cutout
if(ray_state.inside_cutout && !culled_env_light)
{
const float3 pos = ray_state.pos - hit.normal * 0.001f;
culled_env_light = trace_shadow(pos, light_dir, seed, ray_state, params);
}
if (!culled_env_light)
{
eval_data.k2 = light_dir;
#if DF_HANDLE_SLOTS == DF_HSM_POINTER
eval_data.bsdf_diffuse = result_buffer_0;
eval_data.bsdf_glossy = result_buffer_1;
eval_data.handle_count = df_eval_slots;
#endif
eval_data.flags = params.bsdf_data_flags;
// outer loop in case the are more material tags than slots in the evaluate struct
unsigned offset = 0;
#if DF_HANDLE_SLOTS != DF_HSM_NONE
for (; offset < bsdf_mtag_to_gtag_map_size; offset += df_eval_slots)
{
eval_data.handle_offset = offset;
#endif
// evaluate the materials BSDF
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, arg_block);
const float mis_weight =
(params.mdl_test_type == MDL_TEST_EVAL) ? 1.0f : pdf / (pdf + eval_data.pdf);
// we know if we reflect or transmit
if (dot(light_dir, ray_state.inside_cutout ? -hit.normal : hit.normal) > 0.0f) {
transition_glossy = TRANSITION_SCATTER_GR;
transition_diffuse = TRANSITION_SCATTER_DR;
} else {
transition_glossy = TRANSITION_SCATTER_GT;
transition_diffuse = TRANSITION_SCATTER_DT;
}
// sample weight
const float3 w = ray_state.weight * f * mis_weight;
// iterate over all lobes (tags that appear in the df)
for (unsigned lobe = 0; (lobe < df_eval_slots) &&
((offset + lobe) < bsdf_mtag_to_gtag_map_size); ++lobe)
{
// get the `global tag` of the lobe
unsigned material_lobe_gtag = bsdf_mtag_to_gtag_map[offset + lobe];
// add diffuse contribution
accumulate_next_event_contribution(
transition_diffuse, material_lobe_gtag,
TRANSITION_LIGHT, params.env_gtag, // light group 'env'
#if DF_HANDLE_SLOTS == DF_HSM_NONE
(eval_data.bsdf - eval_data.bsdf_glossy) * w,
#else
eval_data.bsdf_diffuse[lobe] * w,
#endif
ray_state, params);
// add glossy contribution
accumulate_next_event_contribution(
transition_glossy, material_lobe_gtag,
TRANSITION_LIGHT, params.env_gtag, // light group 'env'
#if DF_HANDLE_SLOTS == DF_HSM_NONE
eval_data.bsdf_glossy * w,
#else
eval_data.bsdf_glossy[lobe] * w,
#endif
ray_state, params);
}
#if DF_HANDLE_SLOTS != DF_HSM_NONE
}
#endif
}
}
// importance sample BSDF
{
sample_data.xi.x = rnd(seed);
sample_data.xi.y = rnd(seed);
sample_data.xi.z = rnd(seed);
sample_data.xi.w = rnd(seed);
sample_data.flags = params.bsdf_data_flags;
// sample the materials BSDF
as_bsdf_sample(func_idx)(&sample_data, &state, &mdl_resources.data, arg_block);
if (sample_data.event_type == BSDF_EVENT_ABSORB)
return false;
ray_state.dir = sample_data.k2;
ray_state.weight *= sample_data.bsdf_over_pdf;
const bool transmission = (sample_data.event_type & BSDF_EVENT_TRANSMISSION) != 0;
if (transmission)
ray_state.inside = !ray_state.inside;
const bool is_specular = (sample_data.event_type & BSDF_EVENT_SPECULAR) != 0;
Transition_type next;
if (is_specular)
next = transmission ? TRANSITION_SCATTER_ST : TRANSITION_SCATTER_SR;
else if ((sample_data.event_type & BSDF_EVENT_DIFFUSE) != 0)
next = transmission ? TRANSITION_SCATTER_DT : TRANSITION_SCATTER_DR;
else
next = transmission ? TRANSITION_SCATTER_GT : TRANSITION_SCATTER_GR;
// move ray to the next sampled state
ray_state.lpe_current_state = lpe_transition(
ray_state.lpe_current_state, next,
#if DF_HANDLE_SLOTS == DF_HSM_NONE
// ill-defined case: the LPE machine expects tags but the renderer ignores them
// -> the resulting image of LPEs with tags is undefined in this case
params.default_gtag,
#else
bsdf_mtag_to_gtag_map[sample_data.handle], // sampled lobe
#endif
params);
// depending on the geometry, the ray might be displaced before continuing
continue_ray(ray_state, hit, sample_data.event_type, params);
if (ray_state.inside || ray_state.inside_cutout)
{
// avoid self-intersections
ray_state.pos -= hit.normal * 0.001f;
return true; // continue bouncing in sphere
}
else if (params.mdl_test_type != MDL_TEST_NO_ENV &&
params.mdl_test_type != MDL_TEST_EVAL)
{
// leaving sphere, add contribution from environment hit
float pdf;
const float3 f = environment_eval(pdf, sample_data.k2, params);
float bsdf_pdf;
if (params.mdl_test_type == MDL_TEST_MIS_PDF)
{
const float3 k2 = sample_data.k2;
pdf_data.k2 = k2;
pdf_data.flags = params.bsdf_data_flags;
// get pdf corresponding to the materials BSDF
as_bsdf_pdf(func_idx)(&pdf_data, &state, &mdl_resources.data, arg_block);
bsdf_pdf = pdf_data.pdf;
}
else
bsdf_pdf = sample_data.pdf;
if (is_specular || bsdf_pdf > 0.0f)
{
const float mis_weight = is_specular ||
(params.mdl_test_type == MDL_TEST_SAMPLE) ? 1.0f :
bsdf_pdf / (pdf + bsdf_pdf);
float3 specular_contrib = ray_state.weight * f * mis_weight;
accumulate_contribution(
TRANSITION_LIGHT, params.env_gtag /* light group 'env' */,
specular_contrib,
ray_state, params);
}
}
}
}
return false;
}
struct render_result
{
float3 beauty;
auxiliary_data aux;
};
__device__ inline render_result render_scene(
uint32_t &seed,
const Kernel_params ¶ms,
const unsigned x,
const unsigned y)
{
const float inv_res_x = 1.0f / (float)params.resolution.x;
const float inv_res_y = 1.0f / (float)params.resolution.y;
const float dx = params.disable_aa ? 0.5f : rnd(seed);
const float dy = params.disable_aa ? 0.5f : rnd(seed);
const float2 screen_pos = make_float2(
((float)x + dx) * inv_res_x,
((float)y + dy) * inv_res_y);
const float r = (2.0f * screen_pos.x - 1.0f);
const float r_rx = (2.0f * (screen_pos.x + inv_res_x) - 1.0f);
const float u = (2.0f * screen_pos.y - 1.0f);
const float u_ry = (2.0f * (screen_pos.y + inv_res_y) - 1.0f);
const float aspect = (float)params.resolution.y / (float)params.resolution.x;
render_result res;
clear(res.aux);
Ray_state ray_state;
ray_state.contribution = make_float3(0.0f, 0.0f, 0.0f);
ray_state.weight = make_float3(1.0f, 1.0f, 1.0f);
ray_state.pos = ray_state.pos_rx = ray_state.pos_ry = params.cam_pos;
ray_state.dir = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u);
ray_state.dir_rx = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r_rx + params.cam_up * aspect * u);
ray_state.dir_ry = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u_ry);
ray_state.inside = false;
ray_state.inside_cutout = false;
ray_state.lpe_current_state = 1; // already at the camera so state 0 to 1 is free as long as
// there is only one camera
ray_state.aux = &res.aux;
const unsigned int max_inters = params.max_path_length - 1;
for (ray_state.intersection = 0; ray_state.intersection < max_inters; ++ray_state.intersection)
{
if (!trace_scene(seed, ray_state, params))
break;
}
res.beauty =
isfinite(ray_state.contribution.x) &&
isfinite(ray_state.contribution.y) &&
isfinite(ray_state.contribution.z) ? ray_state.contribution : make_float3(0.0f, 0.0f, 0.0f);
normalize(res.aux);
return res;
}
// quantize + gamma
__device__ inline unsigned int float3_to_rgba8(float3 val)
{
const unsigned int r = (unsigned int) (255.0f * powf(__saturatef(val.x), 1.0f / 2.2f));
const unsigned int g = (unsigned int) (255.0f * powf(__saturatef(val.y), 1.0f / 2.2f));
const unsigned int b = (unsigned int) (255.0f * powf(__saturatef(val.z), 1.0f / 2.2f));
return 0xff000000 | (b << 16) | (g << 8) | r;
}
// exposure + simple Reinhard tonemapper + gamma
__device__ inline unsigned int display(float3 val, const float tonemap_scale)
{
val *= tonemap_scale;
const float burn_out = 0.1f;
val.x *= (1.0f + val.x * burn_out) / (1.0f + val.x);
val.y *= (1.0f + val.y * burn_out) / (1.0f + val.y);
val.z *= (1.0f + val.z * burn_out) / (1.0f + val.z);
return float3_to_rgba8(val);
}
// CUDA kernel rendering simple geometry with IBL
extern "C" __global__ void render_scene_kernel(
const Kernel_params kernel_params)
{
const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= kernel_params.resolution.x || y >= kernel_params.resolution.y)
return;
const unsigned int idx = y * kernel_params.resolution.x + x;
uint32_t seed = tea<4>(idx, kernel_params.iteration_start);
render_result res;
float3 beauty = make_float3(0.0f, 0.0f, 0.0f);
auxiliary_data aux;
clear(aux);
for (unsigned int s = 0; s < kernel_params.iteration_num; ++s)
{
res = render_scene(
seed,
kernel_params,
x, y);
beauty += res.beauty;
aux += res.aux;
}
beauty *= 1.0f / (float)kernel_params.iteration_num;
normalize(aux);
// accumulate
if (kernel_params.iteration_start == 0) {
kernel_params.accum_buffer[idx] = beauty;
if (kernel_params.enable_auxiliary_output) {
kernel_params.albedo_buffer[idx] = aux.albedo;
kernel_params.normal_buffer[idx] = aux.normal;
}
} else {
float iteration_weight = (float) kernel_params.iteration_num /
(float) (kernel_params.iteration_start + kernel_params.iteration_num);
float3 buffer_val = kernel_params.accum_buffer[idx] +
(beauty - kernel_params.accum_buffer[idx]) * iteration_weight;
kernel_params.accum_buffer[idx] =
(isinf(buffer_val.x) || isnan(buffer_val.y) || isinf(buffer_val.z) ||
isnan(buffer_val.x) || isinf(buffer_val.y) || isnan(buffer_val.z))
? make_float3(0.0f, 0.0f, 1.0e+30f)
: buffer_val;
if (kernel_params.enable_auxiliary_output) {
// albedo
kernel_params.albedo_buffer[idx] = kernel_params.albedo_buffer[idx] +
(aux.albedo - kernel_params.albedo_buffer[idx]) * iteration_weight;
// normal, check for zero length first
float3 weighted_normal = kernel_params.normal_buffer[idx] +
(aux.normal - kernel_params.normal_buffer[idx]) * iteration_weight;
if (dot(weighted_normal, weighted_normal) > 0.0f)
weighted_normal = normalize(weighted_normal);
kernel_params.normal_buffer[idx] = weighted_normal;
}
}
// update display buffer
if (kernel_params.display_buffer)
{
switch (kernel_params.enable_auxiliary_output ? kernel_params.display_buffer_index : 0)
{
case 1: /* albedo */
kernel_params.display_buffer[idx] = float3_to_rgba8(kernel_params.albedo_buffer[idx]);
break;
case 2: /* normal */
{
float3 display_normal = kernel_params.normal_buffer[idx];
if (dot(display_normal, display_normal) > 0) {
display_normal.x = display_normal.x * 0.5f + 0.5f;
display_normal.y = display_normal.y * 0.5f + 0.5f;
display_normal.z = display_normal.z * 0.5f + 0.5f;
}
kernel_params.display_buffer[idx] = float3_to_rgba8(display_normal);
break;
}
default: /* beauty */
kernel_params.display_buffer[idx] =
display(kernel_params.accum_buffer[idx], kernel_params.exposure_scale);
break;
}
}
}