Product Importance Sampling for Light Transport Path Guiding

The efficiency of Monte Carlo algorithms for light transport simulation is directly related to their ability to importance‐sample the product of the illumination and reflectance in the rendering equation. Since the optimal sampling strategy would require knowledge about the transport solution itself, importance sampling most often follows only one of the known factors – BRDF or an approximation of the incident illumination. To address this issue, we propose to represent the illumination and the reflectance factors by the Gaussian mixture model (GMM), which we fit by using a combination of weighted expectation maximization and non‐linear optimization methods. The GMM representation then allows us to obtain the resulting product distribution for importance sampling on‐the‐fly at each scene point. For its efficient evaluation and sampling we preform an up‐front adaptive decimation of both factor mixtures. In comparison to state‐of‐the‐art sampling methods, we show that our product importance sampling can lead to significantly better convergence in scenes with complex illumination and reflectance.


Introduction
Importance sampling is an essential component of efficient Monte Carlo light transport simulations. Estimating the illumination at a scene location involves sampling the integrand of the rendering equation [Kaj86], defined by the product of the incident radiance and the cosine-weighted reflectance function (BRDF). However, since the incident radiance is unknown upfront-as it corresponds to the transport solution itself-the traditional approach to importance sampling is to distribute samples proportionally to the BRDF [PH10].
On the other hand, multiple approaches- [Jen95,HP02], or more recently [VKŠ * 14]-have shown that sampling according to even a rough estimate of the directional distribution of incident radiance can result in significant performance increases, especially in scenes with strongly varying illumination. However there is still large potential for improvement, since they do not sample proportionally to the full integrand, i. e., the product of radiance and BRDF.
In this paper we present a product importance sampling technique for estimating indirect illumination (Sec. 4). Our main goal is to find a good approximation to the illumination integrand as a whole, and use this approximation as a sampling distribution during rendering. With an accurate approximation, this approach ensures virtually optimal estimator convergence, while keeping the solution unbiased.
We propose to represent the distribution factors-BRDF and incident radiance-by two separately obtained Gaussian mixtures (see Fig. 2). For obtaining the BRDF factor, we design a pre-processing optimization step where all applicable BRDFs in the rendered scene are fitted for a densely sampled set of incident directions (Sec. 4.1). To estimate the incident radiance factor, we adapt the on-line learning framework of Vorba et al. [VKŠ * 14] (Sec. 4). This in turn enables analytically calculating the product distribution as another Gaussian mixture that can be directly used for sampling. Given that the resulting product mixture has a quadratic number of components with respect to the input mixtures, we propose using an adaptive  ) multiple-importance-sample only the BRDF and/or incident radiance factors, our method samples proportionally to the full product (right). Therefore, unlike previous work, we cannot miss features of the integrand -provided that the factors are fitted accurately.
decimation strategy to separately reduce the number of BRDF and radiance components (Sec. 4.2).
When used in rendering, the resulting method provides significant efficiency and sampling improvements, especially for scenes with strong glossy transport or difficult illumination (Fig. 1, Sec. 6). We also demonstrate that a slow, generic non-linear optimization, traditionally used to fit Gaussian kernels to BRDFs, can be sidestepped by a substantially more efficient expectation-maximization approach [DLR77,Bis06].

Related Work
This section gives an overview of related work on importance sampling in Monte Carlo (MC) light transport and the use of Gaussians in rendering.
Monte Carlo light transport. MC methods for light transport aim to solve the rendering equation [Kaj86] (Eq. 1) at a given point in the scene. They approach this problem by stochastically sampling and evaluating the reflected incident illumination, tracing paths from the sensor [Kaj86] or light source [Arv86]. Unlike our cachingbased approach, these MC methods do not take advantage of any precomputed illumination approximation to improve sampling.
Markov-chain MC approaches [VG97,Vea98,KSKAC02] use Metropolis sampling to explore the neighborhood of highthroughput paths, once found. They attempt to achieve globally optimal path sampling by mutating entire paths, but face the difficult problem of balancing global exploration and local exploitation of the path space, which can lead to uneven convergence behavior. In contrast, our approach uses appropriate-as close to optimal as possible-local sampling decisions at each intersection point, and does not suffer from this issue.
BRDF and illumination importance sampling. Paths with high contribution can be found by importance-sampling the product of surface BRDF and incoming illumination. The BRDF can be either represented by an analytic model or by measured data. The advantage of the former representation is that for most models an analytic way for importance sampling exists [WMLT07,HD]. Since the size and dimensionality of measured BRDFs can be impractical, several methods have been presented to either factorize the data [LRR04] or fit them to analytical models to enable direct importance sampling. Given that our method creates Gaussian mixture fits for the BRDFs, it has the potential to represent both analytic and measured models (albeit our results use only analytic ones).
There is a significant body of works on importance-sampling of the incoming illumination [Jen95,LW95,HP02,PH10]. We follow the direction of Vorba et al. [VKŠ * 14], which shows a great potential to improve the convergence of different MC-based integrators by using Gaussian mixtures to represent incoming illumination. Still, the methods in this category are limited by the fact that they only sample the reflectance and the illumination separately. Multiple importance sampling (MIS) [Vea98] can be used to weigh the contributions from both estimates based on their probability [VKŠ * 14], but this still does not correspond to sampling the product directly, as is the case in our method.
Product importance sampling for direct illumination. Several works have addressed product importance sampling for direct illumination calculation. Unlike our setting, in this case the illumination is known upfront since it is represented either by an environment map or by a cloud of point lights [WÅ]. Some works use a projection of the illumination and reflectance into a functional basis such as wavelets [CJAMJ05,CAM] or spherical harmonics [JCJ09]; hierarchical sample warping is then used to sample from the product. Importance re-sampling methods [TCE05,BGH05,WÅ] are based on the assumption that generating direct illumination and BRDF samples is cheap relatively to visibility sampling. These algorithms thus propose N samples from both domains, and use the product information from them to draw M N final samples. All these methods are limited to sampling the product distribution only for direct illumination, while our presented method targets sampling from the product distribution of indirect illumination and BRDF.
Gaussian representations in rendering. Since the product of Gaussians can be integrated in closed form, Gaussian representations of individual factors of the rendering equation are useful to directly approximate the illumination integral. In real-time rendering, representations based on spherical Gaussians [WRG * 09, XSD * 13] can be used to approximate the direct illumination of microfacet-based materials due to distant environment maps, or point lights. The Gaussian approximation for the reflectance of several microfacet BRDF models can be derived in a closed form. Wang et al. [XCM * 14] additionally provide support for local inter-reflections. In the context of volume rendering, Gaussian mixtures were used by Jakob et al. [JRJ11] to efficiently store and evaluate the (spatial) photon density inside a medium.
However, as the above methods use approximations for estimating the illumination directly, they produce biased results. In contrast, we only utilize the Gaussian approximation to guide importance sampling, and therefore do not suffer from any bias.

Background
This section briefly covers the notation and several theoretical concepts we build on: the rendering equation and its MC sampling, Gaussian mixture models, and their fitting.
Rendering equation (RE). The solution to global illumination on surfaces (i. e., in the absence of participating media) is described by the rendering equation [Kaj86]: where L E is surface emittance and ρ is the bi-directional scattering distribution function (BRDF) on the upper-hemispherical domain Ω. An MC estimator stochastically samples the RE. For instance, path tracing [Kaj86] incrementally builds paths from the cameraat each interaction point x it estimates L R by importance-sampling a single direction ω i according to a hemispherical probability density function (PDF) p : Ω → R, and continues recursively: The variance of such a stochastic estimator corresponds to the ability of p to approximate the integrand; the ideal case of p matching the integrand would result in an estimator with minimal variance possible (see Apx. C for how we determine this). The primary contribution of our method-finding p as close to optimal as possible-is described in Sec. 4.
Gaussian mixture model (GMM). A d-dimensional GMM is defined as a convex combination of K d-dimensional Gaussian components [Bis06]: where the means µ k , symmetric positive-definite covariance matrices Σ k , and mixing weights π k are aggregated in the parameter vector Θ. For G to remain a valid PDF the mixing weights must be non-negative and satisfy ∑ K k π k = 1. We opt for the most general anisotropic variant of the components N in order to represent anisotropic distributions frequently encountered in both BRDFs and radiance distributions.
Given a good initial estimate, the GMM can be efficiently fitted to discrete data by the expectation maximization (EM) algorithm [DLR77]. This yields a GMM-represented density estimate on the data. By considering them as sample multiplicity, particle weights can be incorporated into the estimate (cf. [VKŠ * 14] and references therein). In general, many variants of EM exist, although virtually all of them suffer from getting stuck in local maxima [Bis06]. Generic non-linear optimization methods can sometimes yield better fits, but at the cost of significantly increased computational effort.

Product Importance Sampling
On the high level, our approach strives to obtain a sampling PDF p for choosing a scattering direction that is as close as possible to the integrand of the rendering equation (Eq. 1). To achieve this (also see Fig. 2, bottom), we first compute approximations to the product factors defined as and then compute the resulting PDF as their product: The remainder of this section describes in detail how we obtain the factor mixtures and finally evaluate the product. Please note that for sake of a better intuition about the sampling decisions, from this point on we use a swapped notation for the directions ω i and ω o , in comparison to Eqs. 1 and 2. GMM representation. As already sketched above, we represent the approximate sampling distributions from Eqs. 4 and 5 analytically using bi-variate Gaussian mixtures (Eq. 3). This enables us to take full advantage of the GM model: readily available fitting methods, closed-form sampling, trivial normalization, the ability to represent anisotropic distributions, and simple alignment of mixtures with different reference frames, which then enables a direct calculation of the product mixture (Sec. 4.2). Due to these reasons we opted for this representation over other possible models, such as anisotropic spherical Gaussians or von-Mises-Fisher distribution.
However, p ρ and p L are hemispherical distributions. To represent them in terms of the GMM, the hemispherical domain is mapped to a 2D Euclidean disk space using a low-distortion, area-preserving projective transformation (Fig. 3): All computational steps of the algorithm-fitting, product calculation, and sampling-are then preformed in this domain.
Compared to the low-distortion map by Shirley and Chiu [SC97] used, e. g., by Vorba et al. [VKŠ * 14], the disk mapping has the advantage of not distorting the mapping under azimuthal rotations (Fig. 3). We use this fact for the alignment of the BRDF and illumination mixtures during the product calculation, since they generally occupy two different reference frames. This can be performed by an azimuthal rotation around the local vertical axis, expressed by a 2 × 2 matrix R: the parameters µ k and Σ k for each Gaussian component of one of the mixtures are transformed to µ k = Rµ k and Σ k = RΣ k R T . Algorithmic overview. The pipeline of our method consists of three distinct steps: 1. Pre-processing (Sec. 4.1): for each material in the scene and a discrete set of incident directions ω i , a GMM fit of the corresponding BRDF lobe is computed and stored in a BRDF cache. This results in a database of BRDF factor mixtures p ρ (Eq. 4). 2. Training: over a number of passes, batches of bi-directional particles (i. e., photons and importons) are traced through the scene, being guided by the trained and cached distributions of all preceding passes. A dense illumination cache is built adaptively on-the-fly -each cache record stores a GMM representation of the incident radiance factor p L (Eq. 5) valid for a small neighborhood. The representation is fitted by the EM algorithm -upon creation, each mixture is initialized from a KNN-estimate of local particle density, and then made more accurate by including traced particles from all subsequent training passes. This step follows the work of Vorba et al.
[VKŠ * 14]; please refer to the original paper for a detailed exposition. 3. Rendering (Sec. 4.2): paths from the sensor are guided through the scene by importance-sampling the product mixture p ⊗ (Eq. 6), which is computed on-the-fly for every sampling decision from p ρ and p L by fetching the corresponding GMM caches.

BRDF Fitting
To be able to calculate the product between the illumination given in the GMM form, and the BRDF, a GMM representation of the latter is needed. This section describes how to obtain the BRDF GMM representation for all possible incident directions ω i .
To our current knowledge no generic solution exists to represent an arbitrary BRDF by a mixture of Gaussians. For glossy BRDFs, such as Phong or microfacet models, Wang et al. [WRG * 09] and Xu et al. [XSD * 13] present a closed-form solution to represent normal distribution functions (NDFs) by a mixture of (anisotropic) spherical Gaussians. This NDF representation is then warped into the corresponding BRDF representation by rotating the mean and adjusting the covariance of the mixture components. However, the Fresnel and shadowing terms are considered to be constant over each mixture component and are only evaluated at their means. The warping of the Gaussian components also does not bound the spherical Gaussians to the upper hemisphere, allowing samples colliding with the surface to be created for certain configurations.
We strive to obtain as accurate and efficient product approximation as possible in this context. Consequently, the BRDF Gaussian mixture needs to encapsulate every single component defining it (Fresnel term, shadowing term, NDF, and hemispheric bound). We therefore trade the ability to derive the GMM representation in a closed form against a pre-processing step, which caches accurately fitted GMM representations of all BRDFs contained in the scene.
The following two sub-sections describe two different methods of fitting arbitrary BRDFs to the GMM. Since a BRDF can be a composite of a diffuse and a glossy part, we fit both parts separately and combine them after the fitting. A comparison of the quality of these fits is then presented in Secs. 4.1.2 and 6. The structure of the employed BRDF caches is explained in more detail in Sec. 4.1.3.

Weighted Expectation Maximiztion
Expectation maximization (EM) is used in the context of machine learning and classification to fit the GMM to distributions of observed data points. The EM algorithm maximizes the log-likelihood of the GMM parameter vector Θ given the observed dataset. Since EM fits to the density of the observed data instead of their associated values, Vorba et al. [VKŠ * 14] extended it by adding weights to each observed sample representing the the observed function value.
To fit a GMM with K components to the glossy component of a BRDF a set of N samples is drawn from the BRDF using its standard sampling method. Weighted EM is then used to include the contribution of the samples weighted by ρ(ω)/ p(ω). This weighting also causes invalid samples not to influence the EM fit of the GMM (such as samples from the lower hemisphere).
Since the maximization of the log-likelihood of a GMM is a nonconvex optimization problem, a good initialization of the GMM is essential for preventing the EM from converging to a sub-optimal local maximum. Instead of initializing the mixture parameters randomly, we found that using a quasi Monte-Carlo sampler to draw K samples using the BRDF PDF for the means, setting the component weights to π k = 1 K , and the covariance matrices to diagonal with Σ ii = 0.0125 typically leads to near-optimal EM fits.

Non-linear Optimization
While the weighted EM algorithm is influenced by the BRDF values at the sampled data points, it is still only fitting the (weighted) PDF distribution of the BSDF, which does not correspond to the general curve-fitting problem we face. If a more accurate representation is needed and longer pre-processing time (e. g., one-to-two orders of magnitude) is available, a more sophisticated method such as nonlinear optimization based on QR factorization-as implemented in the Ceres [AMO16] framework-can be used.
In the case of an ideal importance sampling the ratio between the BRDF ρ(ω) and its PDF p(ω) would be constant for every sample ω and the variance of these ratios would vanish (cf. Apx. C). Consequently, to fit as-optimal-as-possible GMM for importance sampling we choose the following objective function: The parameters ω j are the N directional samples drawn from the BRDF and y j are the corresponding 2D positions in the GMM space (Eq. 7). The BRDF values for each sample are normalized (ρ(ω j ) = p(ω j ) ) to represent a valid sampling PDF. Our fitted GMM matches this optimal PDF when the ratio between the normalized BRDF and the GMM is 1 at every sampling point.
To be able to robustly fit the GMM parameters Θ a reparametrization needs to be done to ensure that the bounding conditions for a valid Gaussian mixture hold. The following constraints have to be considered: π k ≥ 0, Σ xx > 0, Σ yy > 0 and |Σ| > 0, where the parameters Σ xx and Σ yy are the diagonal entries of the 2D covariance matrix. To enforce these, we perform the following reparametrization of the parameters π k , µ k and Σ k of each Gaussian To prevent the optimization from generating a Gaussian with a zero covariance matrix a regularization term of = 10 −4 is added to the diagonal of Σ. Similarly as for the weighted EM, the nonlinear optimization also needs a good initialization to avoid the convergence to non-optimal local minima. Therefore, even if the weighted EM does not converge to a perfect fit, it serves as a good initialization for the non-linear optimization and we use it as such.   4 shows a sample result of the two presented fitting methods. The non-linear optimization can achieve a tighter fit to the actual BRDF than the weighted EM, which-especially near grazing angles-can cause the fitted lobe to be skewed towards the hemispherical boundary. It is also visible that this effect relates to the roughness of the BRDF. At lower roughness values the distortion of the BRDF towards the boundary is not so significant and weighted EM can still produce suitable fits. The quality of these fitting methods in terms of their ability to efficiently importance-sample the BRDF is further analyzed in Sec. 6.1.

Caching
The caches for the GMM representations of the BRDF lobes can be categorized into three different types: diffuse, isotropic and anisotropic. The diffuse BRDF lobe is invariant w. r. t. incoming  1.3). We compare BRDF sampling using our cached GM mixtures without and with the correction against the standard BRDF PDFs, in a basic path tracer (i. e., without product sampling and next event estimation) using 256 samples per pixel. Note that we purposely exaggerate the issue here by using a BRDF cache with only 16 records.
directions, so only a single Gaussian mixture needs to be cached for each diffuse BRDF. Isotropic BRDFs are invariant to azimuthal rotations. To efficiently store these, only N GM mixtures for different elevation angles in [0 • , 90 • ] need to be stored. For an arbitrary incident direction ω i the stored mixture with the closest elevation angle is used and rotated to the right azimuth using the formula from the beginning of Sec. 4. Finally, anisotropic BRDFs are not invariant to azimuthal rotations and therefore a cache entry for each hemispherical incident direction needs to be stored. To efficiently distribute and access the nearest cache positions, we use the spherical Fibonacci mapping [KISS15] to index M different cache positions on the upper hemisphere.
Because of the discrete nature of the BRDF cache, a record fetched for a query direction ω i will generally not match it exactly. Given the inability to effectively interpolate and extrapolate GMM caches, we approximately correct this orientation mismatch by translating the mixture by a small offset in the 2D disk space. Note that a similar problem occurs when there is a mismatch between the principal axes of the illumination and BRDF caches. The translation offset corresponds to the difference between the projected reference and query directions. The effectiveness of this solution is shown in Fig. 5 for the case of BRDF cache correction.

Product Calculation and Sampling
Our method bases its sampling decisions on the product distribution, as sketched at the beginning of Sec. 4.
Since the BRDF mixture (and hence the product mixture as well) depends on the outgoing direction, it is necessary to calculate the product distribution every time a sampling decision is made. For a given incident direction ω i and scene location x, this procedure consists of the following steps (see Fig. 2, bottom, for an illustration): 1. Query the BRDF cache (Sec. 4.1) for the mixture p ρ corresponding to ω i and the material at x. Figure 6: Demonstration of the reduction algorithm for synthetic data generated from a random 8-component Gaussian mixture (a). We first perform an EM fit to the data (b) without any knowledge about the generating mixture, and then proceed to decimate it down to a single component (c-i). The cumulative reduction cost Eq. 11 is plotted in (j) with respect to the number of removed mixture components. In this case the reduction would stop at six components for our standard cost threshold C = 0.2, resulting in a mixture with minimal differences from the starting one.
2. Query the illumination cache (Sec. 4, overview) for the mixture p L at x. 3. Compute the resulting product mixture p ⊗ (Eq. 6). 4. Importance-sample p ⊗ to obtain an outgoing direction ω o to continue the path, and then evaluate p ⊗ (ω o ) to normalize the obtained path contribution (Eq. 2).
GMM product. Crucially to our approach, a product of two Gaussian mixtures is again a Gaussian mixture [Bro03]: where we contract G(Θ 1 ) to G 1 and so on. The product mixture parameters Θ 1,2 can be obtained analytically (we list the formulas in Apx. A).
However, the size of the resulting mixture is quadratic in respect to the factors, i. e., K 1,2 = K 1 K 2 . While sampling a Gaussian mixture is relatively cheap (based on the weights π k only a single component is selected and sampled), the computation and evaluation of the product mixture each require O(K 1,2 ) evaluations of the bi-variate Gaussian distribution, corresponding to a relatively costly exponential function. The quadratic complexity of this core operation implies that it is more beneficial to decimate the number of components in the factor mixtures upfront, since each removed component decreases the product calculation costs linearly.
Mixture reduction. Instead of decimating the factor mixtures each time a product mixture is evaluated, we perform this operation only once for each factor mixture once its fitting has been finalized. For the BRDF mixtures this is the end of the pre-processing stage (Sec. 4.1), while for the light mixtures this happens after the training stage (Sec. 4). Note that this means the training stage uses the full light mixtures, which is preferred since inaccuracies occurring during the training would negatively impact the entire rendering stage.
The naïve way to perfrom mixture decimation is based solely on the weights π k : keep components with weights above a certain small quantile (i. e., q = 0.1) and trim the remaining ones. Unfortunately, this approach can unpredictably remove isolated components that might actually correspond to salient features in the solution. A preferred decimation method should therefore be based on merging similar components. Such a technique might still decrease the quality of the product sampling, but if done gracefully, will compensate for it with a higher resulting sampling rate.
For this purpose, we have adapted the GMM reduction algorithm by Runnalls [Run07]. This is a greedy algorithm that proceeds as follows: 1. Begin with a full mixture of K components. 2. For each pair of components k i , k j ∈ 1..K, compute the cost c i, j (Eq. 10) incurred by merging them. easily computed and in practice tends to work well [CWPS11]: where Σ i, j is the covariance of the merged component (see Apx. B for the merging formulas). An illustrative example of this algorithm is shown in Fig. 6.
Reduction quality control. A shortcoming of Runnalls' algorithm is the inability to control the reduced mixture quality, since its termination is conditioned by the number of desired components. Instead, we wish for the reduction to be adaptive, i. e., to decimate mixtures in proportion to their component redundancy (especially because the EM algorithm frequently creates fits with such components). For this, we propose to calculate a cumulative reduction cost C K defined as with c s i, j being the KL cost of merging the components i and j in step s. In other words, Eq. 11 accumulates the reduction cost during the normal execution of the algorithm. We then stop the decimation as soon as C K would exceed a user-defined cost threshold C in the next merging step (cf. Fig. 6). While such threshold is only empirical, we have experimentally found the value of C = 0.2 to be robust in all the presented scenes (Sec. 6).
Ideally, we would guide the reduction procedure by our target metric -the normalized estimator variance (Apx. C). However, in our case the estimator variance is a ratio distribution of two Gaussian mixtures, which has undefined mean and variance, and is notoriously difficult to estimate robustly. On the other hand, we have performed a numerical analysis indicating that the KL cost tends to be well correlated with the estimator variance and is therefore useful as a reduction measure. In addition, any threshold on the estimator variance would have to be determined empirically as well, since there is no direct way to be related to the global solution quality.

Implementation Details
We implemented our algorithm as a plug-in for the Mitsuba [Jak10] rendering framework. The approximation of the incoming illumination is based on the online-learning code provided by Vorba et al. [VKŠ * 14].
BRDF Sampling Probability. The original sampling of Vorba et al. [VKŠ * 14] corresponds to multiple importance sampling between the original BRDF and the incoming illumination GMM with a fixed probability of 0.5. This value prevents them from becoming biased (which can be caused by an inaccurate illumination approximation) or from strongly increasing the variance on glossy surfaces. Since our algorithm includes the knowledge of the BRDF via its product with the incoming radiance, we can safely lower this threshold to a marginal safety value of 0.1 (i. e., the product distribution will be used to make 90 % of sampling decisions).
Parallelization. In all our experiments we initially (before reduction) use 8 GMM components to fit both BRDFs and illumination. The GMM parameters are stored in SSE vectors with a width of 4, granting us a four-fold increase in the product evaluation and sampling performance. The parallelized code iterates over the BRDF components (since these tend to have a higher affinity to be reduced to less components given a certain reduction cost threshold), processing four illumination components at the same time. We allow the reduction of BRDF mixtures to freely vary between 1 and 8 components, while the reduction of illumination mixtures can reach only 4 components, or stay at 8 (since, due to our parallelization scheme, the intermediate illumination mixture sizes would not bring any additional speed-up).

Results
This sections presents some results of our algorithm. We first analyze the BRDF GMM fitting, comparing the quality of fits using weighted EM and non-linear optimization with CERES (Sec. 6.1). We then compare the presented product sampling algorithm to Vorba et al. [VKŠ * 14] and other light transport algorithms, followed by a deeper analysis of individual aspects of our method (Sec. 6.2). All simulations were executed on consumer PCs running Linux on Intel Xeon E5-1620 processors (8 cores at 3.6 GHz) and 16 GB RAM.

BRDF Fitting
Sec. 4.1.2 showed that the non-linear optimization can produce better BRDF fits than the weighted EM algorithm, especially when it comes to materials with high roughness and at grazing angles (Fig. 4). However, if the roughness is not high then weighted EM still produces acceptable fits to the BRDF lobe. In our experiments we found out the there is no major difference between both methods when α ≤ 0.2. Therefore, to be closer to a practically realistic use-case, we used weighted EM for our main results presented in Sec. 6.2, because it provides a near-instant fitting (cf. Table 1). Fig. 7 compares the results of the JEWELRY scene, using weighted EM and Ceres for fitting the BRDF caches. The render times for all algorithms were fixed to 1 hour; this includes the illumination cache training times (see Table 1) for the methods that require it (GPT and ours). We use identical illumination caches for GPT and our method, trained with 60M photons and 60M importons spread across 30 training passes. The timings for the BRDF fitting were however counted separately (we treat this step as pre-processing) and are listed in Table 1. For isotropic caches we use N = 512 incident elevation angles, and M = 4069 incident directions for the anisotropic caches. Reference images took approximately two weeks to render with vertex connection and merging [GKDS12]. To compare the sampling quality directly, we also present one scene with equal sampling rates in Fig. 1. Memory consumption and component reduction. Since the 2D GMMs can be stored compactly using 6 floats per Gaussian component, the memory needed to store the BRDF mixtures is negligible (<10 MB even for complex scenes like KITCHEN or LIVINGROOM). We therefore always reserve the space for K = 8 components and use the presented reduction algorithm (Sec. 4.2) only to speed up the computation and sampling of the product mixtures. Sampling overhead. To compare the computational cost of our product sampling to the illumination-only sampling [VKŠ * 14] we measured the average overhead per generated path segment induced by our algorithm. Table 2 (right) summarizes these measurements. Note that on purely diffuse surfaces the BRDF is constant and its product with the illumination mixture is the illumination mixture itself. We therefore switch to pure illumination sampling on diffuse surfaces to avoid wasting computational effort. Interestingly, while the mixture reduction decreases the per-segment sampling time, it does not impact the overall rendering time significantly for our tested scenes. However, given that the product calculation has a quadratic complexity with respect to the factor mixtures, the relative importance of reducing their sizes would increase if we used more than 8 components to fit them.

Product Sampling
Path length analysis. Examining the generated samples per pixel (SPP) for the presented scenes (Fig. 8), it is apparent that they are not directly proportional to the product-sampling overhead (Table 2, right). Our analysis shows that this is caused by the fact that the product sampling generates longer paths compared to the GPT of Vorba et al. [VKŠ * 14]. Specifically, the average path lengths in each scene were (GPT / ours): 9.0 / 6.5 (LIVINGROOM), 4.9 / 9.8 (KITCHEN), 3.7 / 4.9 (JEWELRY), 6.1 / 17.2 (KITCHENETTE). The main reason is the used path termination scheme (see the discussion in Sec. 7), which culls low-throughput paths. Since our product sampling generates paths with higher and more consistent throughput, they tend to stay alive longer. One exception is the LIVINGROOM scene, where all BRDFs have a diffuse component and consequently

GD-PT GD-BDPT Ours Reference
Es�mate Reconstruc�on Es�mate Reconstruc�on Figure 9: Comparison of gradient-domain (GD) methods (PT and BDPT) to our product sampling, after 75 minutes of total rendering time. While incorporating image gradients into the reconstruction removes some variance, features not present in the base estimate still cannot be recovered. In contrast, our product sampling can successfully find even the more difficult paths and yields a much more even convergence.
the paths stay alive longer even for GPT -here we actually generate paths with lengths closer the mean contributing path length and hence also produce more SPP.

Discussion
Extension for other MC algorithms. Since our implementation treats the product sampling as a drop-in replacement for BRDF sampling, combining it with other tracing-based algorithms such as BDPT or gradient-domain PT / BDPT [KMA * , MKA * 15] would be relatively straightforward. Vorba et al.
[VKŠ * 14] already implemented illumination-guided sampling for BDPT -while their guided sampling increased the convergence rate, it still struggles at sampling complex glossy-glossy interactions, which our product sampling handles well. Similarly, while gradient-domain methods can employ image gradients for better image reconstruction, they rely on the underlying baseline algorithms to reliably obtain the path contribution and therefore suffer from their inherent limitations. We demonstrate this in Fig. 9 on the KITCHENETTE scene, where significant path contributions come from multiple consecutive glossy-glossy interactions. Our algorithm increases the robustness of these methods and would directly produce better per-pixel radiance and gradient estimates, which in turn should lead to better reconstruction.
Spatially varying BRDFs. A current limitation of our approach is the handling of SVBRDFs. Due to the compactness of the BRDF GMM caches, pre-sampling a large variety of different parameters for the glossy components of the SVBRDFs is still feasible. However, an ideal solution would be to develop a direct functional transform from the SVBRDF parameters to a GMM representation of the BRDF lobe. We see this as an interesting direction for future work, which could build for instance on [WÅ] or [XSD * 13].
Russian roulette. By design, our product sampling always generates paths with significant throughput. The consequence of terminat-  2) is that we tend to generate long paths, even if their eventual contribution is small. Unfortunately the low threshold is necessary to find some long but significant paths, and making its value higher typically increases the incidence of outlier noise ('fireflies'). We believe this issue could be addressed by employing an adaptive path termination strategy (such as [VK16]), which takes the relationship between the path throughput and its expected contribution into account.
Quality of illumination caches and MIS weight. The illumination caches can sometimes be fitted poorly, which is manifested by discontinuous patches of high variance (e. g., LIVINGROOM, left wall) or different low-variance convergence rates (e. g., JEWELRY, floor). This is a general limitation shared with Vorba et al.
[VKŠ * 14], and we believe it holds potential for further research. For instance, the iterative nature of their online training algorithm could possibly be combined with a component merging and splitting strategy, so that poorly fitted GMMs could be identified and corrected early. Another possible optimization is to integrate the MIS weight for sampling the BRDF into the illumination GMM by adding a special mixture component that would represent 'uncategorized' background radiance. This would prevent the EM algorithm from being distracted by background radiance or variance in the illumination samples.
Mixture reduction. The reduction algorithm adapted from Runnalls [Run07] is robust, and its simplicity helps to keep the computational costs low. Also the fact that it maintains the mean and variance of the entire mixture unchanged (Apx. B) is useful in preventing salient parts of the directional domain from becoming critically under-sampled. Still, more sophisticated decimation approaches could be examined [CWPS11], or even a more general algorithm could be employed by combining the reduction with the possibility of splitting the components, similar to the bottom-up building of GMMs [HH08,SH09].

Conclusion
Product importance sampling can significantly improve the convergence of Monte Carlo global illumination solvers, which we have demonstrated for a guided uni-directional path tracing. This is especially true for difficult scenes-for instance with complex illumination, difficult visibility, or glossy materials-where ignoring some terms of the rendering equation during importance sampling can lead to significant increases of variance. Our main effort in the future will be to improve the adaptivity of the algorithm to even more difficult conditions, but, equally importantly, to recognize and limit its use whenever it can lead to unnecessary overhead.