Scene Graph, Weak References and other Java Stuff.

Thanks to VERVE, a robot simulation platform running on eclipse, my knowledge of Java is increasing daily. Currently I am going through the VERVE code that deals with importing and parsing KML overlays (Google Earth File Format), and in the process I encountered the concept Scene Graph, which is how Ardor3D (the graphics library we use for VERVE) is structured.

Scene graph is a directed acyclic graph (DAG) arranged in a tree structure that is composed of nodes. The objects are stored in the leaf nodes of the tree and the non-leaf (group) nodes are there to group the leaf nodes together.

The nodes in the VERVE scene graph are referred using weak references.

Weak References are references to objects that don’t interfere with garbage collection. Normally, the objects are referenced using strong references:

StringBuffer buffer = new StringBuffer();

But in some cases, strong references causes memory leaks since it forces an object to remain in memory if it is reachable via chain of strong references.

Weak references don’t force the object to remain in memory. They are created like this:

WeakReference<Widget> weakWidget = new WeakReference<Widget>(widget);

Sometimes garbage collector might throw out the weak referenced object before you ask for the object. And when you do ask, you might get back “null”. To prevent this, you can use a weak hashmap, which has keys that are referred to with weak references. Weak Hashmap automatically removes the entry if key becomes no longer reachable.

Here’s an awesome blog post where I read about Weak References:  weblogs.java.net/blog/2006/05/04/understanding-weak-referencesf

Factory Method (and other Design Patterns)

While talking about a method inside Verve (3D simulator for robotic control), my teammate said that it was similar to a “Factory”. At first, I thought he was making an analogy between an actual factory and the method. But it turned out that he was referring to a pattern called “Factory Method”, which was of course outlined in the seminal book “Design Patterns” by Gamma et.al that I conveniently forgot to read after impulsively buying it from Amazon.

Factory Method is a design pattern in object oriented program that allows an instantiation of an object in the subclass instead of the superclass. The superclass would be an interface that outlines the objects that need to be implemented. It is the job of a subclass that implements the interface to instantiate these objects outlined in the interface.

Here’s an example. Say you have a Shape super class. Inside the Shape’s constructor, it calls a “makeShape” function, which is supposed to instantiate and return a Shape. But the program won’t know which shape to make (in another words, “instantiate”) because the type of shape is defined in the subclasses. So the job of instantiation is passed to the subclasses of Shape super class, such as Circle. Circle Subclass would implement the “makeShape” method that instantiates the shape and returns it.

There is another pattern that is similar to Factory Method but does not instantiate each time but rather passes back the same object that was instantiated once for that class. It’s called a “Singleton”.

SIGGRAPH 2012 Recap

SIGGRAPH [Sig-Graph]

Definition: Name of the annual conference on computer graphics (CG) convened by the ACM SIGGRAPH organization. Dozens of research papers are presented each year, and SIGGRAPH is widely considered the most prestigious forum for the publication of computer graphics research.

Our paper, Updated Sparse Cholesky Factors for Co-Rotational Elastodynamic, was presented at SIGGRAPH. Here’s a picture of me and Florian (first author) doing a silly 40-sec advertisement of our session during Tech Papers Fast Forward.

SIGGRAPH Fast ForwardSIGGRAPH ends up being a big re-union of friends since graphics industry/academia is so tightly knit. The left half of us met at PIXAR during the Summer Internship. Lot of us ended up in the visual effects studios including Pixar, Weta, and Bungie and we started our new jobs within few months of each other. The right half are my friends from Berkeley Graphics Lab. Everyone loves technology, art, and being a nerd. I loved spending time with these like-minded people. 

Curiosity landed on Mars succesfully on the first day of SIGGRAPH. There was a viewing session set up at the Geek Bar in Los Angeles Convention Center. A fair amount of people came, including my family who snuck in to the geek bar to watch the landing with me. I believe my mom cried during the landing. It was so emotional and I felt so proud to be an engineer.

And I got some quality time with my family and my parents’ not-so-small-anymore pug. It looks just like Frank the Pug from Men in Black. This pug changed our family dynamic from a grouchy family to a loving family. It loves everyone it meets. Here’s me pinching its velvet-pin-cushion cheeks.

One of the best things about SIGGRAPH is the screening of selection from the SIGGRAPH Animation Festival. I got to watch Disney’s new short Paperman, which is screened in front of Wreak it Ralph. The style of it was an interesting blend of 2D and 3D framed in photographic black-and-white. I watched it three times, and during the third one I teared up a bit. It moved me in the best way. Definitely a dreamer and romantic’s pick of the evening.

I also loved the style of Twinings commercial called “Gets You Back To You”. My favorite part is when her feet plunges in to the shallow ocean (44 seconds). Makes me want to walk along the shores of Seal Beach before I head back to Silicon Valley.

During downtime, I wandered into the SIGGRAPH Bookstore. It’s awesome because someone already did the work for you in selecting the books in your interest range. I ended up buying bunch of them:

Herb Sutter, Andrei Alexandrescu
I heard of this book through an amazing coder I programmed with at Berkeley. He was a C++ guru and when I asked him how he picked up C++ (Berkeley doesn’t teach it to you) he mentioned that he read a red C++ book over Summer. I assume it’s this one since it’s one of the more well known red-books on C++.

The DSLR Filmmaker’s Handbook: Real-World Production Techniques

Barry Andersson, Janie L. Geyen
I am considering purchasing a DSLR and shooting videos in my spare time (weekends tend to be open). Reading a book about production usually doesn’t help with the process unless you absolutely have no clue as to where to start. Well, I have zero knowledge and any starting point will probably help.
An Interdisciplinary Introduction to Image Processing: Pixels, Numbers, and Programs

Steven L. Tanimoto
Can’t wait to read this book. It looks at image processing from creative and artistic as well as technical point of view. 
My friend Josh bought this book. I picked it up again at the store and read couple pages and realized that it’s directly related to projects done by my team at NASA. It covers terrain reconstruction among other topics. I can’t wait to read this one and hopefully implement it for planetary bodies. 
More detailed blog on technical talks at SIGGRAPH will be coming soon, including the ones on Mobile GPUs. So stay tuned…

Brushing up on Linear Algebra

Hermitian Matrix

Square matrix with complex entries that is equal to its own conjugate transpose.

i.e.

[3 2+i; 2-i 1]

Positive Definite Matrix

A n x n real matrix M is positive definite if z’Mz > 0 for all non-zero vector z with real # entries.

z*Mz > 0 (for complex or Hermitian Matrix M)

example:

[z_0 z_1] [1 0; 0 1] [z_0; z_1]  = z_0^2 z_1^2

Therefore, [1 0; 0 1] is positive definite

Eigenvalues

Non-zero vectors that remain parallel to the original vector no matter what matrix (read: transformation) is applied to them.

Av = lamba *v, where lambda is the eigen value of A corresponding to v.

Cholesky Decomposition

Decomposition of a Hermitian, positive-definite matrix into product of lower triangular matrix and its conjugate transpose (take the transpose then negate imaginary parts but not real part). Analogous to taking a square root of a number.

A = LL*, where L is a lower triangular matrix with positive diagonal entries.

Writing Efficient Code for OpenCL Applications

This is the part 2 of the OpenCL Webinar Series put out by Intel. There were some good information about optimization in general but lot of the information focused on how to optimize OpenCL running on 3rd Gen Intel Core Processor. I jot down some optimization notes that are applicable to all processors.

Avoid Invariant in OpenCL Kernel
Anything that’s independent from kernel execution (invariants, constants), move it to the host.

Bad:
__kernel void test1 (__global int* data, int2 size, int base) {
       int offset = size.y * base + size.x;
       float offsetF = (float)offset;
       …
}

Good:
__kernel void test1 (__global int* data, int offset, float offset) {
      …
}

Avoid initialization in Kernel
Move one time initialization to the host or in a different kernel.

Bad:
__kernel void something (__global int* data) {
         size_t tid = get_global_id(0)
         if (0== tid) {
                 //Do Something
         }
        barrier(CLK_GLOBAL_MEM_FENCE)
}

Use the Built in Functions
i.e. dot, hypot, clamp, etc.

Trade Accuracy vs Speed
If the output is correct, look at the performance and use “mad” or “native_sin”

Get rid of edge conditions

Bad:
__kernel void myKernel(__global int* data, int maxR, int maxC){
         int row = get_global_id(0);
         int col = get_global_id(1);

         if (row > maxRow){
                 …
         }
         else if (col > maxCol) {
                  …
         }
         else{
                  …
         }
}

Use the ND Range (work within a smaller range. Just avoid the conditionals…)
Use the padded buffers…

Get rid of edge conditions in general
Use logical ops instead of comparison/conditions.

Reduce number of registers to increase parallelism

Avoid Byte/Short Load and Stores
Use load and store in a greater chunk.
i.e. uchar -> uint4

Don’t use too many barriers in kernel
You are asking everything to wait until the work items are done.
If the work item doesn’t run, the code may hang

Use the vector types (float8, double4, int4, etc) for better performance on the CPU

Preferred work group size for kernels is 64 or 128 work items.
-Workgroup size multiple of 8
-Query
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE parameter by calling to clGetKernelWorkgroupInfo.
-Match number of workgroups to logical cores.

Buffer Object vs. Image Objects
CPU Device:
– Avoid using Image Object and use Buffer object instead.

Intel SDK for OpenCL Applications Webinar Series 2012

Intel hosted a webinar on running OpenCL on Intel Core processor. The webinar I attended this morning (9am, July 11th), is first part of the three-part webinars on this topic. It was well organized and educational and I think the next seminar will be even more useful (since it deals with programming using OpenCL. I took notes during the webinar to get you up to speed in case you want to attend the next two seminars.

* July 18-Writing Efficient Code for OpenCL Applications<http://link.software-dispatch.intel.com/u.d?V4GtisPHZ8Stq7_bNj1hJ=3231> 
* July 25-Creating and Optimizing OpenCL Applications<http://link.software-dispatch.intel.com/u.d?K4GtisPHZ8Stq7_bNj1hS=3241> 

OpenCL: Allows us to swap out loops with kernels for parallel processing.

Introduction: Intel’s 3rd Generation Core Processor.

  • Inter-operability between CPUs and HD Graphics.
  • Device 1: maps to four cores of intel processor (CPUs)
  • Device 2: Intel HD Graphics.
  • Allows access to all compute units available within system (unified compute model – CPU and HD Graphics)
  • Good for multiple socket cpu – if you want to divide the openCL code with underlying memory architecture.
  • Supported on Window7 and Linux.

General Electric’s use of OpenCL

  • GE uses OpenCL for image reconstruction for medical imaging (O(n^3) – O(n^4))
  • Need unified programming model for CPUs and GPUs
  • OpenCL is most flexible (across all CPU and GPUs) – good candidate for unified programming language.
  • Functional Portability: take OpenCL application and run it on multiple hardware platforms and expect it to produce correct results.
  • Performance Portability: functional Portability + Deliver performance close to entitlement performance (10-20%)
  • Partial Portability: functional Portability + only host code tuning is required.
  • Benefits of OpenCL:
    • C like language – low learning curve
    • easy abstraction of host code (developers focus on kernel only)
    • easy platform abstraction (don’t need to decide platform right away.)
    • development resource versatility (suitable for mult. platforms)
  • Uses combination of buffers (image buffers and their customized ones). Image buffers allow them to use unique part of GPU.
  • Awesome chart that compares various programming models:

Image courtesy of Intel SDK for OpenCL Webinar.

Intel OpenCL SDK: interoperable with Intel Media SDK with no copy overhead on Intel HD Graphics.

Intel Media SDK: hardware accelerated video encode/decode and predefined set of pre-processing filters

Thank you UC Berkeley Visual Computing Center for letting me know about this webinar series!