Training neural radiance fields (NeRFs) with inaccurate camera poses is a difficult problem. Conventional NeRF pipelines typically rely on traditional, expensive Structure-from-Motion (SfM) algorithms for pose initialization, and a long-term goal in computer vision research is to achieve photorealistic 3D reconstruction and pose estimation on the fly. Together with a few open-source contributors, I implemented several recent methods that tackle this problem in Nerfstudio, an open-source NeRF platform. We implemented three different methods, all of which are available through the GitHub link. Here I explain some of them.
We first implemented one of the earliest papers to attempt simultaneous pose estimation and NeRF reconstruction: BARF: Bundle-Adjusting Neural Radiance Fields. The paper proposed a coarse-to-fine optimization of a frequency-based positional encoding, which lets the scene be reconstructed gradually while limiting the magnitude of the gradients backpropagated to the camera extrinsics. Although the method works on a few test scenes, its slow training time makes it impractical. I therefore implemented the same coarse-to-fine optimization strategy on top of the faster Instant-NGP hash grid architecture. This is done by masking the active feature levels according to the weighting equation proposed in BARF (equation 14). To test my implementation, I added Gaussian noise with a standard deviation of 0.1 to the initial camera translations and rotations, then compared the reconstruction quality of a NeRF (based on Nerfstudio's Nerfacto model) optimized with this strategy against one optimized with naively backpropagated pose gradients.
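The masking step can be sketched in a few lines. The snippet below is a minimal NumPy sketch, not the Nerfstudio code. It assumes the BARF-style weight w_k(alpha) = (1 - cos(pi * clip(alpha - k, 0, 1))) / 2 for level k, where alpha runs from 0 to the number of levels as training progresses, and it assumes the hash grid output is laid out level by level, coarsest first, with each level's features stored contiguously. The names `barf_level_weights`, `alpha_schedule` and `mask_hash_features`, and the linear alpha ramp between `start` and `end`, are hypothetical. In an autograd framework, multiplying level k's features by w_k also scales the gradient flowing from that level back to the sample positions, and therefore to the camera poses, by the same factor.

```python
import math

import numpy as np


def barf_level_weights(alpha: float, num_levels: int) -> np.ndarray:
    """Coarse-to-fine weight w_k(alpha) for each level k (BARF-style cosine ramp).

    Level k is off while alpha < k, fades in smoothly while 0 <= alpha - k < 1,
    and is fully on once alpha - k >= 1.
    """
    k = np.arange(num_levels, dtype=np.float64)
    ramp = np.clip(alpha - k, 0.0, 1.0)  # 0 before level k starts, 1 once it is done
    return (1.0 - np.cos(ramp * math.pi)) / 2.0


def alpha_schedule(step: int, start: int, end: int, num_levels: int) -> float:
    """Hypothetical linear schedule: alpha = 0 at `start`, num_levels at `end`."""
    progress = (step - start) / max(end - start, 1)
    return num_levels * min(max(progress, 0.0), 1.0)


def mask_hash_features(features: np.ndarray, alpha: float, num_levels: int) -> np.ndarray:
    """Scale each hash grid level's features by its coarse-to-fine weight.

    features: array of shape (..., num_levels * features_per_level), assumed to be
    ordered coarse-to-fine with each level's features stored contiguously.
    """
    features_per_level = features.shape[-1] // num_levels
    weights = barf_level_weights(alpha, num_levels)
    # Repeat each level's weight once per feature channel of that level.
    per_channel = np.repeat(weights, features_per_level)
    return features * per_channel


# Example: 4 samples from a 16-level grid with 2 features per level.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16 * 2))
alpha = alpha_schedule(step=500, start=0, end=1000, num_levels=16)  # halfway: alpha = 8.0
masked = mask_hash_features(feats, alpha, num_levels=16)
```

Halfway through the ramp, the eight coarsest levels pass through unchanged and the eight finest are zeroed. Note that at alpha = 0 every level is masked, so a practical implementation might keep the coarsest level always active; that choice is left out of this sketch.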
As is often the case in research, things move fast. Shortly after I finished my hash grid BARF implementation, another paper, Robust Camera Pose Refinement for Multi-Resolution Hash Encoding, appeared in PMLR. Instead of directly controlling which levels of the hash grid are active, it proposed a coarse-to-fine schedule for the learning rate of each feature level. The paper showed that this method smooths the optimization and helps it converge to better results. Since its implementation is very similar to my previous work, we also implemented this new method for comparison. All three methods, named "barf-freq", "barf-hash" and "barf-grad" respectively, are available in the GitHub repository. Here are a few training videos.
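A per-level learning-rate schedule can be sketched in the same spirit. The NumPy snippet below is a hypothetical illustration, not the schedule from the paper or the barf-grad code: it uses the cosine ramp (1 - cos(pi * clip(x, 0, 1))) / 2 as a multiplier on each level's learning rate and applies it in a plain SGD update. The names `cosine_ramp`, `level_lr_multipliers` and `sgd_step`, the linear ramp between `start` and `end`, and the `floor` parameter are all assumptions. The contrast with feature masking is that every level still contributes to the forward pass; only the speed at which each level's hash table entries are updated follows the coarse-to-fine schedule.

```python
import math

import numpy as np


def cosine_ramp(x: float) -> float:
    """Smooth 0 -> 1 ramp, (1 - cos(pi * clip(x, 0, 1))) / 2."""
    x = min(max(x, 0.0), 1.0)
    return (1.0 - math.cos(math.pi * x)) / 2.0


def level_lr_multipliers(step: int, start: int, end: int, num_levels: int,
                         floor: float = 0.0) -> np.ndarray:
    """Hypothetical coarse-to-fine learning-rate multipliers, one per level.

    Level k's multiplier rises from `floor` to 1 while alpha passes through
    [k, k + 1], with alpha ramping linearly from 0 to num_levels between
    `start` and `end`. A nonzero `floor` keeps fine levels slowly trainable.
    """
    progress = min(max((step - start) / max(end - start, 1), 0.0), 1.0)
    alpha = num_levels * progress
    return np.array([floor + (1.0 - floor) * cosine_ramp(alpha - k)
                     for k in range(num_levels)])


def sgd_step(tables, grads, base_lr, multipliers):
    """Plain SGD update in which level k uses learning rate base_lr * multipliers[k].

    tables, grads: lists with one array per level (e.g. one hash table per level).
    Every level is still used in the forward pass; only its update speed changes.
    """
    return [t - base_lr * m * g for t, g, m in zip(tables, grads, multipliers)]


# Example: halfway through the ramp, the two coarse levels train at the full
# rate and the two fine levels at 10% of it.
mults = level_lr_multipliers(step=500, start=0, end=1000, num_levels=4, floor=0.1)
tables = [np.zeros(8) for _ in range(4)]
grads = [np.ones(8) for _ in range(4)]
tables = sgd_step(tables, grads, base_lr=1e-2, multipliers=mults)
```

With an adaptive optimizer such as Adam, the multipliers should act on each level's learning rate (for example, by giving each level its own optimizer parameter group, if the tables are stored separately) rather than on the raw gradients, because Adam's per-parameter normalization largely cancels a constant gradient scale.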