Learning Neural Light Fields with Ray-Space Embedding

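This paper uses the 4D light field (LF) representation: color + density is obtained directly from $(x,y,u,v)$, without the hundreds of sampled points per ray that NeRF needs.

Note: be sure to read the paper's Supplementary!

The paper applies a NeRF-like method to light field reconstruction, e.g. 5x5 → 17x17 views (the gains are modest, just fairly balanced; I am not sure why it was accepted to CVPR). In short, it combines NeRF with LF.

NeRF achieves SOTA novel view synthesis (NVS) but renders slowly, while representing the scene with an explicit data structure (such as voxels) costs a lot of memory. Compared with prior work, this paper is faster, more memory efficient, and handles view-dependent effects better.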

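The core idea: map the 4D ray space (the $(x,y,u,v)$ form) to an intermediate, interpretable latent space.

An LF maps ray parameters directly to the integrated radiance along the ray, i.e. $L(\pmb{r}(x,y,u,v))=(rgb,\delta)$. Determining a ray's color therefore takes a single query of the representation, rather than hundreds as in NeRF (which samples many points along each ray).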
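As a minimal sketch of how a ray turns into a 4D coordinate, the snippet below intersects a ray with two parallel planes and reads off $(x,y)$ on the first and $(u,v)$ on the second. The plane depths `z_near=0` and `z_far=1` and the function name `two_plane_coords` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def two_plane_coords(origin, direction, z_near=0.0, z_far=1.0):
    """Return (x, y, u, v): where the ray crosses the planes z=z_near and z=z_far."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    if np.isclose(direction[2], 0.0):
        raise ValueError("ray is parallel to the parameterization planes")
    # Ray parameter t at which the ray reaches each plane.
    t_near = (z_near - origin[2]) / direction[2]
    t_far = (z_far - origin[2]) / direction[2]
    x, y = (origin + t_near * direction)[:2]
    u, v = (origin + t_far * direction)[:2]
    return np.array([x, y, u, v])

# A camera at z = -1 looking forward and slightly to the right.
r = two_plane_coords(origin=[0.0, 0.0, -1.0], direction=[0.5, 0.0, 1.0])
print(r)
```

Rays parallel to the planes have no such coordinate, which is one way to see the 360° limitation discussed under Limitations.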

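Given this idea and the NeRF pipeline, the natural first attempt is the approach in panel (a) of the figure above: positionally encode the 4D ray-space coordinate $\pmb{r}(x,y,u,v)$, feed it to the LF network (an MLP that implicitly represents the scene) to obtain color + density, and finally use volume rendering to get the pixel color. Formally: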

$$ L_{base}(\pmb{r})=F_{\pmb{\theta}}(\gamma(\pmb{r})), \qquad \pmb{r} = (x,y,u,v) $$

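However, this runs into the following challenges:

1. A single point in the 3D scene can correspond to many points in 4D ray space, e.g. $(x_1,y_1,u_1,v_1),...,(x_n,y_n,u_n,v_n)$, and each of these appears only once in the training data. (I don't quite see why this is a drawback.)
2. An LF has no explicit representation of 3D scene geometry, so a network without such a prior does not know how to interpolate colors for rays that are absent from the training set. Put differently, multi-view consistency is not guaranteed when querying unseen ray coordinates.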

The paper therefore proposes two key techniques: ray-space embedding and subdivision.

Ray-space embedding re-maps the 4D ray space into a latent space with two properties:

1. Memorization: different 4D ray-space coordinates of the same 3D scene point map to the same location in the latent space. This addresses challenge 1.
2. Interpolation: the latent space is interpolable, which ensures multi-view consistency.

Formally:

$$ \begin{aligned} \pmb{r_1} &= E_{\pmb{\phi}}^{feat}(\pmb{r}) \\ L_{feat}(\pmb{r}) &= F_{\pmb{\theta}}(\gamma(\pmb{r_1})) \end{aligned} $$
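The two formulas so far, $L_{base}$ and $L_{feat}$, can be sketched in NumPy as follows. This is only a shape-level illustration with random, untrained weights: the layer widths, the latent size `latent_dim`, the number of frequencies, and the helper names `init_mlp`, `mlp`, and `gamma` are all hypothetical, and the networks output just an RGB triple for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random, untrained weights for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """ReLU MLP with a linear output layer."""
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = layers[-1]
    return x @ W + b

def gamma(x, num_freqs=4):
    """NeRF-style positional encoding: sin/cos at frequencies 2^k * pi."""
    angles = x[..., None] * (2.0 ** np.arange(num_freqs)) * np.pi
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

num_freqs, latent_dim = 4, 8               # hypothetical sizes
rays = rng.uniform(-1.0, 1.0, (5, 4))      # five rays, each (x, y, u, v)

# Baseline: L_base(r) = F(gamma(r))
F_base = init_mlp([4 * 2 * num_freqs, 64, 3])
base_rgb = mlp(gamma(rays, num_freqs), F_base)

# Feature embedding: r1 = E(r), then L_feat(r) = F(gamma(r1))
E_feat = init_mlp([4, 64, latent_dim])
F_feat = init_mlp([latent_dim * 2 * num_freqs, 64, 3])
r1 = mlp(rays, E_feat)
feat_rgb = mlp(gamma(r1, num_freqs), F_feat)

print(base_rgb.shape, feat_rgb.shape)
```

The only structural difference between the two paths is the extra network `E_feat` placed before the positional encoding, so each ray still costs a single forward pass.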

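Although this embedding compresses ray space and helps find correlations in it, it is not enough: it handles dense LFs but still struggles with sparse LFs, which is why the paper goes on to propose local light fields. These rely on LF re-parameterization, as follows:

What does this buy us? The paper argues that a local LF parameterization is easier to learn than a global LF:

a local LF is less prone to large changes in depth and texture.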

Re-parameterization here amounts to subdivision: the scene is covered by a voxel grid of local light fields.
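One way to picture the subdivision is to ask which cells of a voxel grid a given ray crosses, in front-to-back order, since each crossed cell would contribute one local light field query. The sketch below is a brute-force slab test, not the paper's implementation. It assumes the scene is the unit cube split into `grid`^3 equal cells and that the ray direction has no zero component; the function name `voxels_hit` is hypothetical.

```python
import itertools
import numpy as np

def voxels_hit(origin, direction, grid=2):
    """Indices of the grid^3 cells of [0,1]^3 crossed by the ray, front to back."""
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    if np.any(d == 0):
        raise ValueError("this sketch assumes no zero direction component")
    size = 1.0 / grid
    hits = []
    for idx in itertools.product(range(grid), repeat=3):
        lo = np.array(idx) * size
        hi = lo + size
        # Slab test: entry/exit ray parameters for each axis-aligned slab.
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        t_enter = np.max(np.minimum(t0, t1))
        t_exit = np.min(np.maximum(t0, t1))
        if t_exit > max(t_enter, 0.0):   # overlap lies in front of the origin
            hits.append((max(t_enter, 0.0), idx))
    return [idx for _, idx in sorted(hits)]

cells = voxels_hit(origin=[0.1, 0.1, -0.5], direction=[0.2, 0.2, 1.0])
print(cells)
```

Testing every cell costs O(grid^3) per ray; a real renderer would more likely step through the grid incrementally, but the brute-force version keeps the geometry easy to check.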

Final model:

The formulas are:

$$ \begin{aligned} (A_i, \pmb{b_i}) &= E_{\pmb{\phi}}(\pmb{r_i},\gamma(i)) \\ (c_i, \alpha_i) &= F_{\pmb{\theta}}(\gamma(A_i\pmb{r_i}+\pmb{b_i}),\gamma(i)) \end{aligned} $$
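The affine step $A_i\pmb{r_i}+\pmb{b_i}$ on its own can be sketched as below. Random numbers stand in for the output of the embedding network, and the voxel encoding $\gamma(i)$ is left out. The embedding width `embed_dim`, the flat output layout (first the entries of $A$, then $\pmb{b}$), and the function name `affine_embed` are assumptions made for illustration.

```python
import numpy as np

def affine_embed(rays, raw, embed_dim):
    """Split each row of `raw` into (A, b) and return A @ r + b for every ray."""
    n = rays.shape[0]
    A = raw[:, : embed_dim * 4].reshape(n, embed_dim, 4)   # one matrix per ray
    b = raw[:, embed_dim * 4 :]                            # one offset per ray
    return np.einsum("nij,nj->ni", A, rays) + b

rng = np.random.default_rng(0)
embed_dim = 8                                    # hypothetical embedding width
rays = rng.uniform(-1.0, 1.0, (6, 4))            # six local rays (x, y, u, v)
raw = rng.normal(size=(6, embed_dim * 4 + embed_dim))   # stand-in for the embedding network output
z = affine_embed(rays, raw, embed_dim)
print(z.shape)
```

Because $A$ and $\pmb{b}$ are predicted per ray, the overall map from ray to embedding is still nonlinear even though each individual step is affine.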

Final color of ray $\pmb{r}$:

$$ \pmb{c} = \sum_{i \in V(\pmb{r})} \Big( \prod_{j \in V(\pmb{r}),\, j<i} (1-\alpha_j) \Big) \alpha_i c_i $$
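The compositing sum above is ordinary front-to-back alpha blending over the voxels a ray crosses. A minimal sketch, assuming the per-voxel colors and alphas are already sorted front to back (the function name `composite` is hypothetical):

```python
import numpy as np

def composite(colors, alphas):
    """Blend per-voxel (color, alpha) pairs, ordered front to back."""
    colors = np.asarray(colors, dtype=float)   # shape (N, 3)
    alphas = np.asarray(alphas, dtype=float)   # shape (N,)
    # Transmittance before voxel i: product of (1 - alpha_j) over all j < i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas[:-1])])
    weights = transmittance * alphas
    return weights @ colors

# A half-transparent red voxel in front of an opaque green one.
c = composite(colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], alphas=[0.5, 1.0])
print(c)
```

The sum has one term per crossed voxel, so a ray costs a handful of network queries here, compared with the hundreds of samples that NeRF takes per ray.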

My understanding

Combining the above with the contributions listed at the end of the paper's Introduction:

To avoid NeRF's hundreds of MLP evaluations per ray, the paper proposes a neural light field that needs a single MLP evaluation: $\pmb{r}(x,y,u,v){\rightarrow}(rgb,\delta)$. The LF representation has two problems, however:

1. A single point in the 3D scene corresponds to many different coordinates in 4D ray space, yet each of these coordinates appears only once in the training set;
2. The LF representation has no explicit geometry, so multi-view consistency cannot be guaranteed.

Building on the neural light field, the paper therefore proposes ray-space embedding, which re-maps the 4D ray space to a latent space with the two properties described above: memorization and interpolation.

Ray-space embedding compresses ray space and helps find correlations in it, but that is still not enough. Based on the analysis in Figure 3 (axis-aligned ray space), the paper goes further and proposes a local affine transformation-based embedding. This step recasts the effective interpolation task as learning an optimal re-parameterization of the light field, such that the 2D color level sets (e.g., the line structures in the $ux$ slices) for each 3D scene point are axis-aligned (see the appendix for a more extensive discussion).

The local affine-transformation embedding handles dense light fields well but struggles with sparse ones. The paper therefore proposes Subdivided Neural Light Fields, that is, learning a voxel grid of local light fields that covers the entire 3D scene (Figure 4). This again follows from the conclusion drawn from Figure 3: the goal is to make the color level sets of the light field axis-aligned.
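To see why the level sets are lines and why an affine map can axis-align them, here is a short flatland derivation. It assumes (my choice for illustration, not a value from the paper) that the two planes sit at depths $z=0$ and $z=1$, with $x$ measured on the first and $u$ on the second, and that the scene point has the same color from every direction.

A ray through $(x, 0)$ and $(u, 1)$ is at lateral position $x + z(u - x)$ when it reaches depth $z$. It passes through the scene point $(X, Z)$ exactly when

$$ (1-Z)\,x + Z\,u = X $$

In the $ux$ slice this is a straight line whose slope depends only on the depth $Z$, so the color of that point is constant along a slanted line. Now apply the affine re-parameterization

$$ x' = (1-Z)\,x + Z\,u, \qquad u' = u $$

In the new coordinates the level set becomes $x' = X$, which does not depend on $u'$: the line is axis-aligned, and every ray through the point shares the same $x'$. That is the memorization property, and interpolating along $u'$ then amounts to copying the color. A single global affine map can only do this for one depth, which is a plausible reading of why the paper predicts the transform per ray and, for sparse inputs, per voxel.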

Limitations & future work

1. An LF parameterized with two planes cannot encompass 360° scenes. Improving the design of embedding networks for different parameterizations could lead to a larger boost in performance. (Could a spherical parameterization be used? If so, how would one apply a local affine transform like the one in this paper?)
2. Subdivision for sparse LFs increases both render time and training time. Adaptive subdivision [23, 26] is another interesting direction for future work which could lead to better quality without sacrificing speed.

Supplementary

Importance of Light Field Parameterization

I did not understand this part.
