Sunday, March 29, 2015

Performance Enhancement of Octree Mapping


Key Resolution

Although the sparse octree does not allocate every node of the tree in memory, for unique identification of voxels I have been using Morton Codes to identify them.. Here is an example code:
1 001 010 111
The code starts with a leading 1 to identify the length of the code, and thus the depth in the tree. After that, the code is made up of a series of 3-bit tuples that indicate a high or low value on the binary split of the x, y, and z dimensions respectively.

Using a 32-bit integer, this can represent 10 levels of depth in the tree. However, this is insufficient for mapping with a Kinect camera. The Kinect has a range of 1-10 meters in depth resolution, and doing a back of the envelope calculation for the horizontal resolution (480 pixels, 5m range, 43 degree field of view) shows that the camera will typically provide sub-centimeter resolution. A 10 meter edge volume can only achieve 1 cm resolution using 10 levels of depth. Therefore, I have transitioned to representing these keys with long integers (64-bit), which could represent more than kilometers of volume at millimeter precision, if needed.

Here is a look at the enhanced resolution that can now be achieved. The first is an older image using ~2.5 cm resolution, and the second is enhanced to 4 mm resolution. This level of detail adequately captures the data provided by the Kinect camera.



 
Key Sorting

Another modification that I made to use of these keys is the ordering of the 3-bit tuples. Originally, I found it very useful to represent the octree keys using the most significant tuple on the farthest to the right. The value was that the octree is almost always traversed starting from the root node. The child node from the root could be obtained using the key by performing a bitwise-AND with 0x7. For the next depth, the key could be modified with a shift right by 3 bits and repeating the same process. Max depth of the key is reached when the remaining code is 1.

However, several parts of my algorithms involve getting unique keys from a list, truncated at different levels of depth. Starting with the least significant tuple makes truncation change the ordering of the keys. This means that every call to "unique" must be preceded by a call to "sort." Together, these make up the slowest part of the octree update process. The most straightforward way to improve this was to remove the need to sort repeatedly by switching the order of the tuples. While this is less convenient for octree traversal, I've found it to result in a worthwhile performance gain. Now, getting the most significant tuple from the key involves finding the position of the leading 1, then extracting the following 3 bits. Updating the key for the next depth requires subtracting the leading 1, along with these 3 bits, then adding the leading 1 at 3 bits to the right of its previous location. Although this is messier, the complexity can easily be encapsulated by a function.

The process of traversing the octree to update inner nodes following changes to leaves uses this unique-key paradigm. It starts with a set of keys identifying the updated leaves. Each pass reduces the depth of the keys by a level, and updates the nodes at this higher level from updated child values. However, at higher levels of the tree, the originally unique nodes start to overlap to the same nodes. We benefit from periodically reducing to unique keys, and prefer not to re-sort each time. I've found that for an identical scene and resolution, this process could be reduced from 14-17 ms to 10-12 ms.

Sunday, March 8, 2015

Octree Map Construction


I've now implemented a set of CUDA kernels that can update an octree from an input point cloud. The steps involve:

Update Octree with Point Cloud
1.) Transform the points to map coordinates (based on a camera pose estimate)
2.) Compute the axis-aligned bounding box of the points, using thrust reduction.
3.) Resize the octree if necessary to ensure that it contains all points.
4.) Push the necessary sub-octree to the GPU to ensure that it can be updated and extracted in parallel.
5.) Compute the octree key for each point.
6.) Determine which nodes will need to be subdivided, and how many new nodes will be created.
7.) Create a new node pool that has enough memory to include the new nodes, and copy the old nodes into it.
8.) Now that there is memory available, split the nodes determined in step 6.
9.) Update the nodes with keys from step 5 and the color values from the input cloud.
10.) Continually shift the keys upwards to determine which nodes have modified children, and re-evaluate those nodes by averaging their children.

Extract Voxels from Octree
1.) Compute keys for tree leaves. This involves a parallel pass for each tree depth and a thrust removal step.
2.) Compute the positions and size for each key.
3.) Extract the color of each key from the octree.

At this point, I am only adding points to the map by counting the points as "hit" observations. The complete solution will involve recasting from the camera origin to each point, and using the points along the line as "misses."

Here are a few screenshots of the map rendered with OpenGL, using instanced rendering of cubes with a Texture Buffer Object specifying the cube locations and colors. The first image is using voxels with an edge length of 2.5 cm. The second is the same viewpoint, but instead the octree is only updated at 10 cm resolution.



This is another shot of the same room, though with a different camera angle with both the Kinect and the virtual camera. This shows more detail of objects sitting on a table.


Performance for this process is relatively fast compared to earlier work with camera localization. At this point, I have only a naive implementation of all kernels without any speed optimization. With an NVidia GTX 770 and a Kinect data stream of 640x480 pixels at 30 fps, these are the times for each step:

1.) Update Octree with Point Cloud - 20 ms
2.) Extract Voxels from Octree - 6 ms
3.) Draw Voxels with OpenGL - 2 ms