I've now implemented a set of CUDA kernels that can update an octree from an input point cloud. The steps involve:
Update Octree with Point Cloud
1.) Transform the points to map coordinates (based on a camera pose estimate)
2.) Compute the axis-aligned bounding box of the points, using a Thrust reduction.
3.) Resize the octree if necessary to ensure that it contains all points.
4.) Push the necessary sub-octree to the GPU to ensure that it can be updated and extracted in parallel.
5.) Compute the octree key for each point.
6.) Determine which nodes will need to be subdivided, and how many new nodes will be created.
7.) Create a new node pool that has enough memory to include the new nodes, and copy the old nodes into it.
8.) Now that there is memory available, split the nodes determined in step 6.
9.) Update the nodes with keys from step 5 and the color values from the input cloud.
10.) Repeatedly shift the keys upwards to determine which nodes have modified children, and re-evaluate those nodes by averaging their children.
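To make steps 2, 5, and 10 more concrete, here is a rough CPU sketch in C++ of what each one computes. It is not the actual CUDA code. The key layout is an assumption for illustration: three bits per tree level (the child index), with the root level in the most significant position, so that shifting a key right by three bits gives the parent's key. The names `Point`, `Bounds`, `computeBounds`, `pointKey`, and `parentKeys` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Point { float x, y, z; };
struct Bounds { Point lo, hi; };

// Step 2: axis-aligned bounding box as a min/max reduction.
// Assumes pts is non-empty. On the GPU this would be a parallel reduction.
Bounds computeBounds(const std::vector<Point>& pts) {
    Bounds b{pts.front(), pts.front()};
    for (const Point& p : pts) {
        b.lo = {std::min(b.lo.x, p.x), std::min(b.lo.y, p.y), std::min(b.lo.z, p.z)};
        b.hi = {std::max(b.hi.x, p.x), std::max(b.hi.y, p.y), std::max(b.hi.z, p.z)};
    }
    return b;
}

// Step 5 (assumed key layout): quantize the point to an integer cell inside
// the root cube, then interleave the cell bits, 3 bits per level.
// origin is the minimum corner of the root cube; depth must be at most 21.
std::uint64_t pointKey(const Point& p, const Point& origin, float rootSize, int depth) {
    const float cells = static_cast<float>(1u << depth);
    auto cell = [&](float v, float o) {
        const float c = (v - o) / rootSize * cells;
        // Clamp so points on the far boundary still land in the last cell.
        return static_cast<std::uint32_t>(std::min(std::max(c, 0.0f), cells - 1.0f));
    };
    const std::uint32_t ix = cell(p.x, origin.x);
    const std::uint32_t iy = cell(p.y, origin.y);
    const std::uint32_t iz = cell(p.z, origin.z);

    std::uint64_t key = 0;
    for (int level = depth - 1; level >= 0; --level) {
        const std::uint64_t child = ((ix >> level) & 1u)
                                  | (((iy >> level) & 1u) << 1)
                                  | (((iz >> level) & 1u) << 2);
        key = (key << 3) | child;
    }
    return key;
}

// Step 10: shift every key up one level and drop duplicates, leaving the
// set of parent nodes whose children were modified.
std::vector<std::uint64_t> parentKeys(std::vector<std::uint64_t> keys) {
    for (std::uint64_t& k : keys) k >>= 3;
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    return keys;
}
```

Calling `parentKeys` once per level, from the leaf depth up to the root, yields the nodes to re-average at each level. In a GPU version, each point or key would map to one thread, and the sort and unique steps would map to parallel primitives.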
Extract Voxels from Octree
1.) Compute keys for tree leaves. This involves a parallel pass for each tree depth and a Thrust removal step.
2.) Compute the positions and size for each key.
3.) Extract the color of each key from the octree.
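Step 2 of the extraction can be sketched as follows, again as a CPU illustration in C++ rather than the real kernel. It assumes a key that stores three bits per level (bit 0 for x, bit 1 for y, bit 2 for z) with the deepest level in the lowest bits, and a root cube described by its minimum corner and edge length. The `Voxel` struct and `voxelFromKey` are hypothetical names.

```cpp
#include <cstdint>

struct Voxel { float cx, cy, cz, edge; };

// Decode a key at the given depth into the center and edge length of its cube.
// (ox, oy, oz) is the minimum corner of the root cube, rootSize its edge length.
Voxel voxelFromKey(std::uint64_t key, int depth,
                   float ox, float oy, float oz, float rootSize) {
    std::uint32_t ix = 0, iy = 0, iz = 0;
    for (int level = 0; level < depth; ++level) {
        // The child index for this level sits in 3 bits of the key.
        const std::uint64_t child = (key >> (3 * level)) & 7u;
        ix |= static_cast<std::uint32_t>(child & 1u) << level;
        iy |= static_cast<std::uint32_t>((child >> 1) & 1u) << level;
        iz |= static_cast<std::uint32_t>((child >> 2) & 1u) << level;
    }
    // Each level halves the edge length of the cube.
    const float edge = rootSize / static_cast<float>(1u << depth);
    return {ox + (ix + 0.5f) * edge,
            oy + (iy + 0.5f) * edge,
            oz + (iz + 0.5f) * edge,
            edge};
}
```

Because the edge length depends only on the depth, a leaf found at a shallower depth simply comes out as a larger cube. Each key is independent, so one thread per key is a natural mapping on the GPU.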
At this point, I am only adding points to the map by counting the points as "hit" observations. The complete solution will involve raycasting from the camera origin to each point, and using the points along the line as "misses."
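As a rough idea of what collecting the "miss" cells could look like, here is a small C++ sketch. This is not implemented in the project yet. It samples the ray at half-voxel steps on a uniform grid, which is a simplification: it can skip cells that the ray only clips at a corner, and an exact version would use a 3D DDA grid traversal instead. `Cell`, `cellOf`, and `missCells` are hypothetical names.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Cell = std::array<int, 3>;

// Integer grid cell containing a position, for a uniform grid of edge `voxel`.
Cell cellOf(float x, float y, float z, float voxel) {
    return {static_cast<int>(std::floor(x / voxel)),
            static_cast<int>(std::floor(y / voxel)),
            static_cast<int>(std::floor(z / voxel))};
}

// Cells crossed between the camera origin and the measured point, excluding
// the cell that contains the point itself (that one is the "hit").
std::vector<Cell> missCells(const std::array<float, 3>& origin,
                            const std::array<float, 3>& hit, float voxel) {
    const float dx = hit[0] - origin[0];
    const float dy = hit[1] - origin[1];
    const float dz = hit[2] - origin[2];
    const float length = std::sqrt(dx * dx + dy * dy + dz * dz);
    const Cell hitCell = cellOf(hit[0], hit[1], hit[2], voxel);

    std::vector<Cell> cells;
    // Half-voxel sampling: an approximation, not an exact traversal.
    const int steps = static_cast<int>(std::ceil(length / (0.5f * voxel)));
    for (int i = 0; i < steps; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(steps);
        const Cell c = cellOf(origin[0] + t * dx, origin[1] + t * dy,
                              origin[2] + t * dz, voxel);
        if (c == hitCell) break;  // reached the hit voxel, nothing more to mark
        // A straight line never re-enters a cell, so comparing against the
        // last stored cell is enough to avoid duplicates.
        if (cells.empty() || cells.back() != c) cells.push_back(c);
    }
    return cells;
}
```

Each returned cell would then be updated as a "miss" in the octree, and the cell containing the point as a "hit". Since every ray is independent, one thread per input point would be the obvious way to parallelize this.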
Here are a few screenshots of the map rendered with OpenGL, using instanced rendering of cubes with a Texture Buffer Object specifying the cube locations and colors. The first image uses voxels with an edge length of 2.5 cm. The second is the same viewpoint, but with the octree only updated at 10 cm resolution.
This is another shot of the same room, taken from a different angle for both the Kinect and the virtual camera. It shows more detail of the objects sitting on a table.
Performance is already relatively fast compared to my earlier work with camera localization, even though every kernel is still a naive implementation without any speed optimization. With an NVIDIA GTX 770 and a Kinect data stream of 640x480 pixels at 30 fps, these are the times for each step (about 28 ms in total):
1.) Update Octree with Point Cloud - 20 ms
2.) Extract Voxels from Octree - 6 ms
3.) Draw Voxels with OpenGL - 2 ms