Saturday, February 14, 2015

Out-of-Core Octree Management

The sparse octree used to represent a reconstructed 3D map quickly grows too large to fit entirely in GPU memory. Reconstructing a normal office room at 1 cm resolution is likely to take 6-8 GB, but the Nvidia GTX 770 that I am using has only 2 GB.

To handle this, I have developed an out-of-core memory management framework for the octree. At first glance, this framework is a standard stack-based octree on the CPU. However, each node in the tree has an additional boolean flag indicating whether the node is the root of a subtree stored in linear GPU memory. If so, the node also holds a pointer to the subtree's location on the GPU as well as its size. The stackless octree data on the GPU uses 64 bits per node, in the same format as GigaVoxels. Here is a summary of the OctreeNode class's data elements:

class OctreeNode {
  //Flag whether the node is at max resolution,
  //in which case it will never be subdivided
  bool is_max_depth_;

  //Flag whether the node's children have been initialized
  bool has_children_;

  //Flag whether the node's data is on the GPU
  bool on_gpu_;

  //Pointer to gpu data, if the node is GPU backed
  int* gpu_data_;

  //The number of children on the GPU
  int gpu_size_;

  //Child nodes if it is CPU backed
  OctreeNode* children_[8];

  //Data in the node if it is CPU backed
  int data_;
};

Next, I gave these nodes an API that can push and pull their data to and from the GPU. The push method uses recursion to convert the stack-based data into a linear array in CPU memory, then copies that array to the GPU. It avoids over-allocating or reallocating the linear array by first recursing through the node's children to determine the size of the subtree. The pull method copies the linear memory back to the CPU, then uses it to recursively rebuild the stack-based structure.
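
Here is a minimal, CPU-only sketch of the two-pass idea behind push and pull. It is not the actual implementation: the Node struct, the function names, and the 64-bit entry layout (first-child index in the upper 32 bits, payload in the lower 32) are simplifications I am assuming for illustration, and the layout is not claimed to match the GigaVoxels format exactly. The copy to the GPU is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified pointer-based node (not the OctreeNode above).
struct Node {
  std::uint32_t data = 0;
  std::unique_ptr<Node> children[8];  // all null for a leaf
  bool has_children() const { return children[0] != nullptr; }
};

// Pass 1: count the nodes in the subtree so the linear buffer can be
// allocated once at exactly the right size.
std::size_t CountNodes(const Node& n) {
  std::size_t count = 1;
  if (n.has_children())
    for (const auto& c : n.children) count += CountNodes(*c);
  return count;
}

// Pass 2: write each node as one 64-bit entry. The 8 children of a node
// occupy 8 consecutive slots; a first-child index of 0 marks a leaf
// (slot 0 always holds the subtree root, so it can never be a child).
void WriteNode(const Node& n, std::size_t index,
               std::vector<std::uint64_t>& out, std::size_t& next_free) {
  std::uint64_t first_child = 0;
  if (n.has_children()) {
    first_child = next_free;
    next_free += 8;
    for (std::size_t k = 0; k < 8; ++k)
      WriteNode(*n.children[k], static_cast<std::size_t>(first_child) + k,
                out, next_free);
  }
  out[index] = (first_child << 32) | n.data;
}

// "Push" without the device copy: pointer-based subtree -> linear array.
std::vector<std::uint64_t> Flatten(const Node& root) {
  std::vector<std::uint64_t> out(CountNodes(root));
  std::size_t next_free = 1;  // slot 0 is the root
  WriteNode(root, 0, out, next_free);
  assert(next_free == out.size());  // the size pass and write pass agree
  return out;
}

// "Pull" without the device copy: linear array -> pointer-based subtree.
std::unique_ptr<Node> Rebuild(const std::vector<std::uint64_t>& in,
                              std::size_t index = 0) {
  std::unique_ptr<Node> n(new Node());
  n->data = static_cast<std::uint32_t>(in[index] & 0xFFFFFFFFu);
  const std::size_t first_child = static_cast<std::size_t>(in[index] >> 32);
  if (first_child != 0)
    for (std::size_t k = 0; k < 8; ++k)
      n->children[k] = Rebuild(in, first_child + k);
  return n;
}
```

In a real push, the vector returned by Flatten is what would be copied to the device (for example with cudaMemcpy), and a real pull would copy the device buffer into such a vector before calling Rebuild.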

It's worth mentioning that it is preferred for the data to reside on the GPU, as all of the update and rendering passes are going to involve parallel operations with data that is also in GPU memory. We only want to pull subtrees to the CPU when we run low on available GPU memory. To do this, I added a GPU occupancy count for the octree as a whole. When this count exceeds a fraction of the available memory, subtrees stored in GPU memory need to be pulled back to the CPU.
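
The occupancy check can be sketched as a small bookkeeping class. The name GpuBudget, its methods, and the choice to track bytes are my assumptions for illustration; the actual counter in the octree may look different. The total would come from the device at startup (in CUDA, cudaMemGetInfo reports free and total bytes), but this sketch just takes it as a constructor argument.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical occupancy tracker: push/pull report how many bytes they
// moved, and the octree asks whether it is over its share of GPU memory.
class GpuBudget {
 public:
  // total_bytes: device memory size; max_fraction: share the octree may use.
  GpuBudget(std::size_t total_bytes, double max_fraction)
      : limit_bytes_(static_cast<std::size_t>(
            static_cast<double>(total_bytes) * max_fraction)),
        used_bytes_(0) {}

  // Called after a subtree has been copied to the GPU.
  void OnPush(std::size_t bytes) { used_bytes_ += bytes; }

  // Called after a subtree has been copied back and freed on the GPU.
  void OnPull(std::size_t bytes) { used_bytes_ -= std::min(bytes, used_bytes_); }

  // True once occupancy exceeds the allowed fraction.
  bool NeedsEviction() const { return used_bytes_ > limit_bytes_; }

  // How much must be pulled back to get under the limit again.
  std::size_t BytesToFree() const {
    return NeedsEviction() ? used_bytes_ - limit_bytes_ : 0;
  }

 private:
  std::size_t limit_bytes_;
  std::size_t used_bytes_;
};
```

With a 2 GB card and a fraction of, say, 0.75, NeedsEviction would start returning true once the octree holds more than 1.5 GB on the device; the fraction itself is a tuning choice that I have not fixed here.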

I am working on a Least Recently Used (LRU) approach in which every method operating on the tree must take as input a bounding box of the region it will affect. First, this allows us to make sure that the entire affected volume is currently on the GPU before attempting to perform the operation. The octree will also keep a history of the N most recently used bounding boxes. When space needs to be freed, it will take the union of these stored bounding boxes and pull data that lies outside of this region back to the CPU.
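
A rough sketch of the bounding box history follows, under my own assumptions: axis-aligned boxes, a fixed-capacity queue, and a subtree counting as evictable when its bounds do not intersect the union of the stored boxes. Box, BoxHistory, and the method names are illustrative and not taken from the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Axis-aligned bounding box (hypothetical helper type).
struct Box {
  float lo[3];
  float hi[3];
};

// Smallest box containing both a and b.
Box UnionOf(const Box& a, const Box& b) {
  Box u;
  for (int i = 0; i < 3; ++i) {
    u.lo[i] = std::min(a.lo[i], b.lo[i]);
    u.hi[i] = std::max(a.hi[i], b.hi[i]);
  }
  return u;
}

// True if the boxes overlap or touch on every axis.
bool Intersects(const Box& a, const Box& b) {
  for (int i = 0; i < 3; ++i)
    if (a.hi[i] < b.lo[i] || b.hi[i] < a.lo[i]) return false;
  return true;
}

// Remembers the N most recently used boxes.
class BoxHistory {
 public:
  explicit BoxHistory(std::size_t capacity) : capacity_(capacity) {}

  // Record the box of an operation; the oldest entry drops out when full.
  void Touch(const Box& b) {
    recent_.push_back(b);
    if (recent_.size() > capacity_) recent_.pop_front();
  }

  // Union of all stored boxes: the region that should stay on the GPU.
  Box KeepRegion() const {
    assert(!recent_.empty());
    Box keep = recent_.front();
    for (const Box& b : recent_) keep = UnionOf(keep, b);
    return keep;
  }

  // A subtree may be pulled to the CPU if it lies entirely outside the
  // keep region (or if nothing has been used yet).
  bool ShouldEvict(const Box& subtree_bounds) const {
    return recent_.empty() || !Intersects(subtree_bounds, KeepRegion());
  }

 private:
  std::size_t capacity_;
  std::deque<Box> recent_;
};
```

When memory needs to be freed, the octree would walk its GPU-backed subtrees and pull the ones for which ShouldEvict returns true until enough space has been released.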

This initial approach may need to be improved in the future. For one thing, our use case involves two independent camera poses, one for updating the map and one for rendering it. The bounding boxes associated with these two cameras can be separated spatially, but the method will create a single bounding box that also encompasses the space between them. A more advanced method would first cluster the bounding boxes, and then perform a union operation on each cluster. Another issue is that this method will create a tight box around the cameras. If they are moving, they may quickly move outside of the bounding box, forcing data that was just pulled to the CPU to be pushed back to the GPU. One way to handle this would be to predict the future motion of the cameras.
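
As a rough illustration of the clustering idea, here is one simple way it could work: keep merging any two boxes that overlap until no overlapping pair remains, so each surviving box is the union of one cluster. Using overlap as the clustering criterion is my assumption for this sketch; I have not settled on a method, and a distance threshold or a proper clustering algorithm might work better. The Box type and function names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box (hypothetical helper type).
struct Box {
  float lo[3];
  float hi[3];
};

// Smallest box containing both a and b.
Box UnionOf(const Box& a, const Box& b) {
  Box u;
  for (int i = 0; i < 3; ++i) {
    u.lo[i] = std::min(a.lo[i], b.lo[i]);
    u.hi[i] = std::max(a.hi[i], b.hi[i]);
  }
  return u;
}

// True if the boxes overlap or touch on every axis.
bool Intersects(const Box& a, const Box& b) {
  for (int i = 0; i < 3; ++i)
    if (a.hi[i] < b.lo[i] || b.hi[i] < a.lo[i]) return false;
  return true;
}

// Greedy clustering: merge overlapping boxes until none overlap.
// Returns one union box per cluster. Takes the input by value so the
// caller's history is left untouched.
std::vector<Box> ClusterBoxes(std::vector<Box> boxes) {
  bool merged = true;
  while (merged) {
    merged = false;
    for (std::size_t i = 0; i < boxes.size() && !merged; ++i) {
      for (std::size_t j = i + 1; j < boxes.size() && !merged; ++j) {
        if (Intersects(boxes[i], boxes[j])) {
          boxes[i] = UnionOf(boxes[i], boxes[j]);
          boxes.erase(boxes.begin() + static_cast<std::ptrdiff_t>(j));
          merged = true;  // restart the scan: the grown box may now touch others
        }
      }
    }
  }
  return boxes;
}

// A subtree is kept on the GPU if it touches any cluster.
bool InsideAnyCluster(const Box& subtree, const std::vector<Box>& clusters) {
  for (const Box& c : clusters)
    if (Intersects(subtree, c)) return true;
  return false;
}
```

With two cameras far apart, this yields two separate keep regions, so subtrees in the empty space between them become candidates for eviction instead of being held on the GPU by one large union box.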
