Surface Telerobotics Mission with Astronaut Luca Parmitano

Surface Telerobotics Mission, Session 2, took place on July 26th. Astronaut Luca Parmitano controlled our K10 rover on Earth from the International Space Station. You can see the software we built to control the rover in the pictures below 🙂

A lot of media outlets were present on the day of the session for coverage, including WIRED and others.


Global News:

Astronauts control rover from space


Surface Telerobotics Mission

So much has happened since my last post. I marked my one-year work anniversary at NASA on May 28th, 2012. I produced three short films, two of which are related to NASA. One of those videos was shown to the astronaut on the International Space Station, as part of training for our mission:

On June 17th 2012, we got to interface with the astronaut on board the ISS as part of our Surface Telerobotics Project. The day started at the crack of dawn (5am!). There were thankfully no major hiccups and we were able to successfully have the astronaut in space control the K10 rover on Earth! The astronaut who controlled our rover was Chris Cassidy. He’s a Navy SEAL, an astronaut, and an engineer from MIT, which officially makes him one of the coolest astronauts ever 🙂 When the session ended, he had only good things to say about the GUI and complimented us on the intuitiveness of the controls and the smoothness of the operation. And this, of course, made the day for all of us who worked on the project.

Here are high-res pictures, sent from the ISS, of the astronaut using our UI to control our rover on the ground:

Astronaut Chris Cassidy controls the rover from ISS
Image Courtesy of Intelligent Robotics Group at NASA Ames Research Center

Factory Method (and other Design Patterns)

While talking about a method inside Verve (a 3D simulator for robotic control), my teammate said that it was similar to a “Factory”. At first, I thought he was making an analogy between an actual factory and the method. But it turned out that he was referring to a pattern called “Factory Method”, which was of course outlined in the seminal book “Design Patterns” by Gamma et al. that I conveniently forgot to read after impulsively buying it from Amazon.

Factory Method is a design pattern in object-oriented programming that lets a subclass, rather than the superclass, decide which object to instantiate. The superclass acts as an interface that declares the objects that need to be created. It is the job of each subclass that implements the interface to instantiate those objects.

Here’s an example. Say you have a Shape superclass. Inside Shape’s constructor, it calls a “makeShape” function, which is supposed to instantiate and return a Shape. But the superclass doesn’t know which shape to make (in other words, “instantiate”) because the type of shape is defined in the subclasses. So the job of instantiation is passed to the subclasses of Shape, such as Circle. The Circle subclass would implement the “makeShape” method, which instantiates the shape and returns it.
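
To make the idea concrete for myself, here is a minimal Python sketch. It is not how Verve does it. I also moved the factory method onto a separate creator class instead of putting it on Shape itself, which is the more common textbook layout. The names ShapeCreator, CircleCreator, SquareCreator, make_shape and describe are all made up for illustration.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Product interface: every concrete shape must report its area."""

    @abstractmethod
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class ShapeCreator(ABC):
    """Hypothetical superclass: uses a Shape without knowing its concrete type."""

    @abstractmethod
    def make_shape(self) -> Shape:
        """The factory method; each subclass decides what to instantiate."""

    def describe(self) -> str:
        shape = self.make_shape()  # instantiation is deferred to the subclass
        return f"{type(shape).__name__} with area {shape.area():.2f}"


class CircleCreator(ShapeCreator):
    def make_shape(self) -> Shape:
        return Circle(radius=1.0)


class SquareCreator(ShapeCreator):
    def make_shape(self) -> Shape:
        return Square(side=2.0)


print(CircleCreator().describe())  # Circle with area 3.14
print(SquareCreator().describe())  # Square with area 4.00
```

The point is that describe() is written once in the superclass and never mentions Circle or Square. Adding a new shape means adding a new subclass, not editing the superclass.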

There is another pattern that resembles Factory Method, but instead of instantiating a new object each time, it hands back the same object, which is instantiated only once for that class. It’s called a “Singleton”.
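
A Singleton can be sketched in a few lines of Python. RoverSettings, max_speed and get_instance are hypothetical names I picked for the example, and this simple version is not thread-safe.

```python
class RoverSettings:
    """Hypothetical settings class; at most one instance is ever created."""

    _instance = None  # the single shared instance, created on first use

    def __init__(self) -> None:
        self.max_speed = 0.5  # made-up setting, just to have some state

    @classmethod
    def get_instance(cls) -> "RoverSettings":
        if cls._instance is None:
            cls._instance = cls()  # instantiate only the first time
        return cls._instance


a = RoverSettings.get_instance()
b = RoverSettings.get_instance()
a.max_speed = 1.0
print(a is b)        # True
print(b.max_speed)   # 1.0
```

Because both names point at the same object, a change made through one is visible through the other.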

Geometry Clipmaps

Image courtesy of Losasso & Hoppe.

I am currently reading a paper called “Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids” by Frank Losasso and Hugues Hoppe, for my next project at IRG, which is related to terrain rendering.

What is a Geometry Clipmap?

It caches the terrain in a set of nested regular grids (which are filtered versions of the terrain at power-of-two resolutions) centered about the viewer, and it is incrementally updated with new data as the viewer moves.

What is the difference between Mipmaps / Texture Clipmaps / Geometry Clipmaps?

A mipmap is a pyramid of images of the same scene. The set of images goes from fine to coarse, where the highest mipmap level consists of just one pixel. The mipmap level rendered at a pixel is a function of the screen-space parametric derivatives, which depend on the view parameters and not on the content of the image.
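
Here is a small sketch of how such a pyramid can be built. I am assuming a square image with a power-of-two side and a plain 2x2 box filter; real implementations may use a different filter. build_mipmap is my own helper name.

```python
def build_mipmap(image):
    """Return the pyramid as a list of levels, finest first.

    Assumes a square image (list of rows) whose side is a power of two.
    Each coarser level averages 2x2 blocks of the level below it.
    """
    levels = [image]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        half = len(fine) // 2
        coarse = [
            [
                (fine[2 * r][2 * c] + fine[2 * r][2 * c + 1]
                 + fine[2 * r + 1][2 * c] + fine[2 * r + 1][2 * c + 1]) / 4.0
                for c in range(half)
            ]
            for r in range(half)
        ]
        levels.append(coarse)
    return levels


image = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 12, 12],
    [8, 8, 12, 12],
]
pyramid = build_mipmap(image)
print([len(level) for level in pyramid])  # [4, 2, 1]
print(pyramid[-1])                        # [[6.0]]
```

The last level is the single pixel mentioned above: the average of the whole image.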

A texture clipmap caches a view-dependent subset of a mipmap pyramid. The visible subset of the lower mip levels is limited by the resolution of the display, so it’s not necessary to keep the entire texture in memory. Instead, you “clip” the mipmap to only the region needed to render the scene. Texture clipmaps compute LOD (level of detail) per pixel based on screen-space projected geometry.

With terrains, geometry for screen space does not exist until the level of detail is selected. But texture clipmaps compute LOD per-pixel based on existing geometry. See the problem?

So geometry clipmaps select the LOD in world space based on viewer distance. They do this by using a set of nested rectangular regions about the viewpoint, with transition regions to blend between LOD levels.
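
As a rough sketch of what “LOD in world space based on viewer distance” means: given a point and the viewer position, pick the finest level whose square region around the viewer still contains the point. The numbering (level 1 is coarsest), the doubling of grid spacing per level, and the function names are my assumptions for illustration.

```python
def grid_spacing(level, num_levels, finest_spacing=1.0):
    """World-space spacing of a level; level 1 is coarsest, num_levels is finest."""
    return finest_spacing * 2 ** (num_levels - level)


def select_level(point, viewer, n, num_levels, finest_spacing=1.0):
    """Return the finest level whose n x n region around the viewer contains point."""
    # Chebyshev distance, because the regions are axis-aligned squares.
    dist = max(abs(point[0] - viewer[0]), abs(point[1] - viewer[1]))
    for level in range(num_levels, 0, -1):  # try the finest level first
        half_extent = (n - 1) / 2 * grid_spacing(level, num_levels, finest_spacing)
        if dist <= half_extent:
            return level
    return None  # outside even the coarsest region


viewer = (0.0, 0.0)
for point in [(100, 0), (200, 50), (-400, 10), (1000, 1000), (2000, 0)]:
    print(point, select_level(point, viewer, n=255, num_levels=4))
```

Nothing here depends on screen-space geometry, which is exactly the difference from texture clipmaps.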

Refinement Hierarchy

Geometry clipmap’s refinement hierarchy is based on viewer centric grids but ignores local surface geometry.

Overview of Geometry Clipmap

A geometry clipmap consists of m levels of a terrain pyramid. Each level contains an n x n array of vertices, stored as a vertex buffer in video memory. Each vertex contains (x, y, z, z_c) coordinates, where z_c is the height value at (x, y) in the next coarser level (used for transition morphing).


Each clipmap level contains associated texture image(s), which are stored as an 8-bit-per-channel normal map for surface shading (more efficient than storing per-vertex normals). The normal map is computed from the geometry whenever the clipmap is updated.
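
I wanted to see what “computing a normal map from the geometry” could look like, so here is a small sketch. It uses central differences on a height grid and packs each normal into three 8-bit channels. The differencing scheme, the border clamping and the encoding are my assumptions, not necessarily what the paper does.

```python
import math


def normal_map(heights, spacing=1.0):
    """Return 8-bit (r, g, b) normals for a grid of heights (list of rows)."""
    rows, cols = len(heights), len(heights[0])

    def h(r, c):
        # Clamp indices so border samples reuse the edge height.
        return heights[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    def encode(component):
        # Map a value in [-1, 1] to an integer in [0, 255].
        return int((component * 0.5 + 0.5) * 255 + 0.5)

    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dz_dx = (h(r, c + 1) - h(r, c - 1)) / (2 * spacing)
            dz_dy = (h(r + 1, c) - h(r - 1, c)) / (2 * spacing)
            length = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            normal = (-dz_dx / length, -dz_dy / length, 1.0 / length)
            row.append(tuple(encode(k) for k in normal))
        result.append(row)
    return result


flat = [[5.0] * 3 for _ in range(3)]
print(normal_map(flat)[1][1])  # (128, 128, 255): straight up

ramp = [[float(c) for c in range(3)] for _ in range(3)]
print(normal_map(ramp)[1][1])  # tilted away from +x, so the first channel drops
```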

Per-frame Algorithm

– determine the desired active regions (the extent we wish to render)

– update the geometry clipmap

– crop the active regions to the clip regions (the world extent of the n x n grid of data stored at that level), and render.

Computing Desired Active Regions

The approximate screen-space triangle size s in pixels is given by

s = 1.25 W / (n tan(phi / 2))

W is the window size

phi is the field of view

If W = 640 pixels, phi = 90 degrees, we obtain good results with clipmap size n=255.

Normal maps are stored at twice the resolution, which gives 1.5 pixels per texture sample.
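
To check the numbers, here is a quick calculation. I am assuming the formula from the paper is s = 1.25 W / (n tan(phi / 2)); the function name is mine.

```python
import math


def triangle_size_pixels(window_size, fov_degrees, n):
    """Approximate screen-space triangle size: s = 1.25 * W / (n * tan(phi / 2))."""
    half_fov = math.radians(fov_degrees) / 2
    return 1.25 * window_size / (n * math.tan(half_fov))


s = triangle_size_pixels(window_size=640, fov_degrees=90, n=255)
print(round(s, 2))      # 3.14 pixels per triangle
print(round(s / 2, 2))  # 1.57 pixels per normal-map sample (twice the resolution)
```

So with these settings a triangle covers roughly 3 pixels, and a normal-map texel roughly 1.5 pixels, which matches the figures above.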

Geometry Clipmap Update

Instead of copying over the old data when shifting a level, we fill only the newly exposed L-shaped region (this works because the texture lookup wraps around using mod operations on x and y). The new data comes from either decompressing the terrain (for coarser levels) or synthesizing it (for finer levels). The finer-level texture is synthesized from the coarser one using an interpolatory subdivision scheme.
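
The wraparound trick took me a moment to picture, so here is a CPU-side sketch of it. ToroidalLevel and the fetch callback are hypothetical stand-ins for a clipmap level and its data source. The loop in shift scans the whole window for clarity, but it only loads the samples that were not already covered.

```python
class ToroidalLevel:
    """n x n window over an unbounded height field, addressed with wraparound."""

    def __init__(self, n, origin, fetch):
        self.n = n
        self.origin = origin  # world (x, y) of the window's minimum corner
        self.fetch = fetch    # hypothetical source of height samples
        self.data = [[0.0] * n for _ in range(n)]
        self.loads = 0        # how many samples we had to fetch so far
        ox, oy = origin
        for y in range(oy, oy + n):
            for x in range(ox, ox + n):
                self._load(x, y)

    def _load(self, x, y):
        # Mod addressing: a world sample always lands in the same slot.
        self.data[y % self.n][x % self.n] = self.fetch(x, y)
        self.loads += 1

    def get(self, x, y):
        return self.data[y % self.n][x % self.n]

    def shift(self, dx, dy):
        """Move the window, loading only samples that were not covered before."""
        old_x, old_y = self.origin
        new_x, new_y = old_x + dx, old_y + dy
        self.origin = (new_x, new_y)
        for y in range(new_y, new_y + self.n):
            for x in range(new_x, new_x + self.n):
                inside_old = (old_x <= x < old_x + self.n
                              and old_y <= y < old_y + self.n)
                if not inside_old:
                    self._load(x, y)


level = ToroidalLevel(4, (0, 0), fetch=lambda x, y: x + 100 * y)
print(level.loads)      # 16: the full 4 x 4 window
level.shift(1, 1)
print(level.loads)      # 23: only the 7 samples in the new L-shaped strip
print(level.get(4, 4))  # 404
```

Nothing that stays inside the window is copied or moved; the old samples that scrolled out are simply overwritten in place.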

Constraints on the clipmap regions

– clip regions are nested for coarse-to-fine geometry prediction. Prediction requires maintaining one grid unit on all sides.

– rendered data (active region) is a subset of the data present in the clipmap (clip region).

– the perimeter of the active region must lie on even vertices for a watertight boundary with the coarser level.

– render region (active region) must be at least two grid units wide to allow a continuous transition between levels.

Rendering the Geometry Clipmap

// Crop the active regions
for each level l = 1:m in coarse-to-fine order:
    Crop active_region(l) to clip_region(l)
    Crop active_region(l) to active_region(l-1)
// Render all levels
for each level l = 1:m in fine-to-coarse order:
    render_region(l) = active_region(l) - active_region(l+1)
    Render render_region(l)
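
Here is a Python sketch of the two loops above, with the actual drawing left out. I am representing regions as rectangles (x_min, y_min, x_max, y_max) and a render region as an outer rectangle plus an optional hole; both choices, and the function names, are mine.

```python
def intersect(a, b):
    """Intersection of two rectangles (x_min, y_min, x_max, y_max), or None if empty."""
    if a is None or b is None:
        return None
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def compute_render_regions(active, clip):
    """active and clip are lists ordered coarse-to-fine (index 0 is level 1).

    Returns (level, outer, hole) tuples in fine-to-coarse order, where the
    render region is the outer rectangle minus the hole (the next finer level).
    """
    m = len(active)
    cropped = list(active)
    for l in range(m):  # coarse-to-fine
        cropped[l] = intersect(cropped[l], clip[l])
        if l > 0:
            cropped[l] = intersect(cropped[l], cropped[l - 1])
    regions = []
    for l in range(m - 1, -1, -1):  # fine-to-coarse
        hole = cropped[l + 1] if l + 1 < m else None
        regions.append((l + 1, cropped[l], hole))
    return regions


active = [(0, 0, 16, 16), (4, 4, 12, 12), (6, 6, 10, 10)]
clip = [(0, 0, 16, 16), (4, 4, 12, 12), (7, 6, 10, 10)]  # finest level lags behind
for level, outer, hole in compute_render_regions(active, clip):
    print(level, outer, hole)
```

In the example the finest clip region has not caught up with the viewer, so the finest active region gets cropped and the next coarser level fills in the gap.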

Transition Regions for Visual Continuity

In order to render regions at different levels of detail without visible seams, the geometry near the outer boundary of each render region is morphed so that it transitions to the geometry of the coarser level.

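Morphed elevation (z’):

z’ = (1 - alpha) z + alpha z_c

where blend parameter alpha is computed as alpha = max(alpha_x, alpha_y), with

alpha_x = min(max((|x - v’_x| - ((x_max - x_min) / 2 - w - 1)) / w, 0), 1)

and similarly for alpha_y. Here w is the width of the transition region in grid units.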

v’_x denotes continuous coordinates of the viewpoint in the grid of clip region.

x_min and x_max are the integer extents of the active_region(l).
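
A small sketch of the morph, to see the blend in action. I am assuming the per-axis blend from the paper is alpha_x = min(max((|x - v’_x| - ((x_max - x_min) / 2 - w - 1)) / w, 0), 1), where w is the transition width in grid units. The value w = 25 (about n / 10 for n = 255) and the function names are my choices for the example.

```python
def blend_axis(coord, viewer, lo, hi, w):
    """Blend along one axis: 0 in the interior, ramping to 1 near the region's edge."""
    t = (abs(coord - viewer) - ((hi - lo) / 2 - w - 1)) / w
    return min(max(t, 0.0), 1.0)


def morphed_height(z, z_c, x, y, viewer, extents, w):
    """z' = (1 - alpha) * z + alpha * z_c, with alpha = max(alpha_x, alpha_y)."""
    (x_min, x_max), (y_min, y_max) = extents
    alpha = max(
        blend_axis(x, viewer[0], x_min, x_max, w),
        blend_axis(y, viewer[1], y_min, y_max, w),
    )
    return (1 - alpha) * z + alpha * z_c


extents = ((0, 254), (0, 254))
viewer = (127.0, 127.0)
for x in (127, 240, 254):
    print(x, morphed_height(10.0, 20.0, x, 127, viewer, extents, w=25))
```

Near the viewer the vertex keeps its own height z, at the outer edge it sits exactly at the coarser height z_c, and in between it slides from one to the other.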

Texture Mapping

Each clipmap level stores texture images for use in rasterization (i.e. a normal map). Mipmapping is disabled but LOD is performed on the texture using the same spatial transition regions applied to the geometry. So texture LOD is based on viewer distance rather than on screen-space derivatives (which is how hardware mipmapping works).

View-Frustum Culling

For each level of the clipmap, maintain z_min and z_max bounds for the local terrain. Each render region is partitioned into four rectangular regions (see Figure 6). Each rectangular region is extruded by z_min and z_max (the terrain bounds) to form an axis-aligned bounding box. The bounding boxes are intersected with the four-sided viewing frustum, and the resulting convex set is projected onto the XY plane. The axis-aligned rectangle bounding this set is used to crop the given rectangular region.

Terrain Compression

Create a terrain pyramid by downsampling from fine terrain to coarse terrain using a linear filter. Then reconstruct the levels in coarse-to-fine order: each level is the interpolatory subdivision of the next coarser level plus a residual.
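
Here is a 1D sketch of the predict-plus-residual idea. To keep it short I use the simplest possible stand-ins: downsampling keeps every other sample, and the prediction is plain linear interpolation rather than the paper’s subdivision scheme. A real codec would also quantize and encode the residuals, which I skip, so this version is lossless.

```python
def downsample(fine):
    """Keep every other sample (a stand-in for a proper linear filter)."""
    return fine[::2]


def upsample(coarse, fine_len):
    """Predict the finer level: copy even samples, interpolate the odd ones."""
    fine = []
    for i in range(fine_len):
        left = coarse[i // 2]
        if i % 2 == 0:
            fine.append(left)
        else:
            right = coarse[min(i // 2 + 1, len(coarse) - 1)]
            fine.append((left + right) / 2)
    return fine


def compress(terrain, num_levels):
    """Return (coarsest level, residuals ordered coarse-to-fine)."""
    levels = [terrain]
    for _ in range(num_levels - 1):
        levels.append(downsample(levels[-1]))
    levels.reverse()  # now coarse-to-fine
    residuals = []
    for coarse, fine in zip(levels, levels[1:]):
        predicted = upsample(coarse, len(fine))
        residuals.append([f - p for f, p in zip(fine, predicted)])
    return levels[0], residuals


def decompress(coarsest, residuals):
    """Rebuild the finest level: prediction from the coarser level plus residual."""
    level = coarsest
    for residual in residuals:
        predicted = upsample(level, len(residual))
        level = [p + r for p, r in zip(predicted, residual)]
    return level


terrain = [0, 2, 4, 3, 8, 8, 6, 1, 0]
coarsest, residuals = compress(terrain, num_levels=3)
print(coarsest)                                    # [0, 8, 0]
print(decompress(coarsest, residuals) == terrain)  # True
```

Where the terrain is smooth the prediction is already good and the residuals are close to zero, which is what makes them cheap to store.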

Terrain Synthesis

Fractal noise displacement is done by adding uncorrelated Gaussian noise to the upsampled coarser terrain. The Gaussian noise values are precomputed and stored in a lookup table for efficiency.
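
A 1D sketch of the synthesis step. The linear upsampling, the way I hash the sample position into the table, and halving the noise amplitude at each finer level are all my assumptions, just to show the shape of the idea.

```python
import random


def make_noise_table(size, seed=0):
    """Precompute unit-variance Gaussian samples once, so none are drawn per vertex."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def upsample(coarse):
    """Linear interpolation to 2 * len - 1 samples (stand-in for the subdivision scheme)."""
    fine = []
    for a, b in zip(coarse, coarse[1:]):
        fine.extend([a, (a + b) / 2])
    fine.append(coarse[-1])
    return fine


def synthesize(coarse, level, table, amplitude=1.0):
    """Upsample the coarser terrain, then displace each sample with table noise."""
    fine = upsample(coarse)
    scale = amplitude / (2 ** level)  # assumption: halve the noise per finer level
    displaced = []
    for i, z in enumerate(fine):
        # Hash position and level into the table so a given spot always
        # receives the same noise value.
        index = (i * 7919 + level * 104729) % len(table)
        displaced.append(z + scale * table[index])
    return displaced


table = make_noise_table(256)
coarse = [0.0, 1.0, 2.0]
print(synthesize(coarse, level=1, table=table))
print(synthesize(coarse, level=1, table=table, amplitude=0.0))  # plain upsampling
```

Because the noise comes from a fixed table, the same terrain is synthesized every time the viewer returns to the same spot.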

Here are some good references:

A good follow up paper:

Awesome website on terrain rendering:

Brushing up on Probabilities, Localization, and Gaussian

Gaussian: It is a bell curve characterized by its mean and variance. It is unimodal and symmetric. The area under the Gaussian adds up to 1.

Variance: a measure of uncertainty. Larger variance = more spread = more uncertain.
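
A quick numerical check of both statements. The function names are mine, and the integral is a simple midpoint-rule approximation over plus or minus 10 standard deviations.

```python
import math


def gaussian(x, mean, variance):
    """Probability density of a 1D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)


def area_under(mean, variance, half_width=10.0, steps=20000):
    """Midpoint-rule integral over mean +/- half_width standard deviations."""
    sigma = math.sqrt(variance)
    lo = mean - half_width * sigma
    dx = 2 * half_width * sigma / steps
    return sum(gaussian(lo + (i + 0.5) * dx, mean, variance) for i in range(steps)) * dx


print(area_under(0.0, 1.0))  # very close to 1
print(area_under(3.0, 4.0))  # still very close to 1
# A larger variance spreads the curve out, so the peak gets lower.
print(gaussian(0.0, 0.0, 1.0), gaussian(0.0, 0.0, 4.0))
```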

Bayes Rule

P(A|B) = P(B|A) P(A) / P(B)


Localization involves a “move” (motion) step and a “sense” (measurement) step.

Motion (move): First the robot moves. We use convolution to get the probability that the robot is now in each grid cell. This applies the theorem of total probability: for each cell, sum over the possible previous cells the probability of having been there times the probability of moving from there to this cell.

Measurement (sense): Then the robot senses the environment. We multiply the prior belief for each cell by the probability of the sensor reading given that cell (a larger weight if the reading matches the cell, a smaller one if it doesn’t), and then normalize so the beliefs sum to 1. This is Bayes Rule; the normalizer is the total probability of the measurement.
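
Here is a minimal 1D histogram filter to go with the two steps. The cyclic five-cell world and the probabilities (0.6 / 0.2 for the sensor, 0.8 / 0.1 / 0.1 for the motion) are made-up numbers for illustration.

```python
def move(belief, step, p_exact=0.8, p_under=0.1, p_over=0.1):
    """Motion update: convolve the belief with the motion model (cyclic world)."""
    n = len(belief)
    return [
        p_exact * belief[(i - step) % n]         # moved exactly `step` cells
        + p_under * belief[(i - step + 1) % n]   # moved one cell short
        + p_over * belief[(i - step - 1) % n]    # moved one cell too far
        for i in range(n)
    ]


def sense(belief, world, measurement, p_hit=0.6, p_miss=0.2):
    """Measurement update: multiply by the sensor likelihood, then normalize."""
    weighted = [
        b * (p_hit if cell == measurement else p_miss)
        for b, cell in zip(belief, world)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


world = ["green", "red", "red", "green", "green"]
belief = [0.2] * 5  # start fully uncertain

belief = move(belief, 1)              # a uniform belief stays uniform
belief = sense(belief, world, "red")  # red cells become more likely
print([round(b, 3) for b in belief])  # [0.111, 0.333, 0.333, 0.111, 0.111]

belief = move(belief, 1)
belief = sense(belief, world, "green")
print([round(b, 3) for b in belief], round(sum(belief), 6))
```

Moving blurs the belief (we lose information), and sensing sharpens it again (we gain information).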

*Side note: for grid-based localization (the histogram method), memory use grows exponentially with the number of state variables (x, y, z, theta, roll, pitch, yaw, etc.).