Intel SDK for OpenCL Applications Webinar Series 2012

Intel hosted a webinar on running OpenCL on Intel Core processors. The webinar I attended this morning (9am, July 11th) is the first of a three-part series on this topic. It was well organized and educational, and I think the next seminar will be even more useful (since it deals with programming using OpenCL). I took notes during the webinar to get you up to speed in case you want to attend the next two seminars.

* July 18: Writing Efficient Code for OpenCL Applications <http://link.software-dispatch.intel.com/u.d?V4GtisPHZ8Stq7_bNj1hJ=3231>
* July 25: Creating and Optimizing OpenCL Applications <http://link.software-dispatch.intel.com/u.d?K4GtisPHZ8Stq7_bNj1hS=3241>

OpenCL: lets us replace loops with kernels that run in parallel.
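
To make the loop-to-kernel idea concrete, here is a minimal pure-Python sketch. It is only an emulation and does not use the OpenCL API: `add_kernel` and `enqueue` are hypothetical names I made up, and a real runtime would run the work-items in parallel rather than one after another.

```python
# Pure-Python emulation of the loop-to-kernel idea; this is NOT the OpenCL API.

def add_loop(a, b):
    """Serial version: one loop walks every index."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def add_kernel(gid, a, b, out):
    """Kernel version: the loop body, written for a single work-item `gid`."""
    out[gid] = a[gid] + b[gid]


def enqueue(kernel, global_size, *args):
    """Hypothetical stand-in for the runtime: run the kernel once per index.

    A real OpenCL runtime would schedule these work-items in parallel across
    compute units; here they simply run one after another.
    """
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
enqueue(add_kernel, len(a), a, b, out)
```

The point is that the kernel contains only the loop body; the iteration itself moves out of your code and into the runtime.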

Introduction: Intel’s 3rd Generation Core Processor.

  • Interoperability between CPUs and HD Graphics.
  • Device 1: maps to the four cores of the Intel processor (CPUs).
  • Device 2: Intel HD Graphics.
  • Allows access to all compute units available within the system (unified compute model: CPU and HD Graphics).
  • Good for multi-socket CPUs, if you want to divide the OpenCL code to match the underlying memory architecture.
  • Supported on Windows 7 and Linux.

General Electric’s use of OpenCL

  • GE uses OpenCL for image reconstruction in medical imaging (algorithms in the O(n^3) to O(n^4) range)
  • Needs a unified programming model for CPUs and GPUs
  • OpenCL is the most flexible option (across all CPUs and GPUs), which makes it a good candidate for a unified programming language.
  • Functional Portability: take OpenCL application and run it on multiple hardware platforms and expect it to produce correct results.
  • Performance Portability: Functional Portability plus performance close to entitlement performance (10-20%)
  • Partial Portability: Functional Portability, with only host code tuning required.
  • Benefits of OpenCL:
    • C-like language, so a low learning curve
    • easy abstraction of host code (developers focus on the kernel only)
    • easy platform abstraction (don’t need to decide on a platform right away)
    • development resource versatility (suitable for multiple platforms)
  • GE uses a combination of buffers (image buffers and their own customized ones). Image buffers allow them to use a unique part of the GPU.
  • Awesome chart that compares various programming models:

Image courtesy of Intel SDK for OpenCL Webinar.

Intel OpenCL SDK: interoperable with Intel Media SDK with no copy overhead on Intel HD Graphics.

Intel Media SDK: hardware-accelerated video encode/decode and a predefined set of pre-processing filters

Thank you UC Berkeley Visual Computing Center for letting me know about this webinar series!

AI Course on Udacity

I’ve been taking an artificial intelligence course on Udacity (http://www.udacity.com/courses), taught online by Sebastian Thrun. The course is called “Programming a Robotic Car”. One of my co-workers pointed out that the course covers exactly what I need to learn: probabilities, the Kalman filter, the particle filter, and SLAM. I will be blogging about my progress with the course and the insights I pick up from it.

Unlike OpenCourseWare from MIT and other webcasts, Udacity is much more interactive. I didn’t find myself bored or distracted (though I’m taking a break right now to write this blog) because it has short quizzes (they are easy and very short) to recap the concepts covered in each video. The videos also focus on insight and don’t dwell on the mathematical formulation of the problem unless it’s absolutely necessary. And as a visual learner, I find Sebastian Thrun’s drawings very helpful in understanding the concepts.

I wish every online class were as good as these. I hope you find them useful.

Understanding FastSLAM

The SLAM I’m talking about has nothing to do with poetry or basketball. I’m “investigating” (read “learning on the fly”) the SLAM algorithm (Simultaneous Localization and Mapping). One of my co-workers forwarded me two papers that I should read (http://tinyurl.com/6vfvuxg), both of which are co-authored by Sebastian Thrun of Google X (clearly I am very excited to point this out). I think it’s pretty awesome that reading research papers is part of my job.

To understand FastSLAM (the version of SLAM in the papers), I needed to understand the particle filter and the Kalman filter. Here are one-sentence summaries based on the Wikipedia articles:

Particle filter: uses weighted samples of a distribution to estimate the probability of some hidden parameter (an ‘event happening’) at a specific time, given all observations up to that time.

*note to self: similar to importance sampling; the particle filter is more flexible for dynamic models that are non-linear.
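
Here is a minimal sketch of the "weighted samples" idea, assuming a 1-D hidden position, a Gaussian sensor model, and made-up numbers. The function names are mine, not from any library.

```python
import math
import random


def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian, used here as the measurement likelihood."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def weighted_estimate(particles, measurement, sensor_sigma):
    """Weight each particle by how well it explains the measurement,
    then return the normalized weights and the weighted-mean estimate."""
    weights = [gaussian_pdf(measurement, p, sensor_sigma) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    estimate = sum(w * p for w, p in zip(weights, particles))
    return weights, estimate


rng = random.Random(0)
# Hypothetical setup: the hidden state is a position somewhere in [0, 10].
particles = [rng.uniform(0.0, 10.0) for _ in range(2000)]
weights, estimate = weighted_estimate(particles, measurement=6.0, sensor_sigma=0.5)
```

Particles near the measurement get large weights and dominate the estimate; particles far away contribute almost nothing.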

Kalman filter: takes in noisy input and, using various measurements (from sensors, control input, things known from physics), recursively updates its estimates (they call it the system’s state) to be more accurate. Example: a truck has a GPS that estimates its position to within a few meters. The estimate is noisy, but we can take into account the speed and direction over time (via wheel revolutions and the angle of the steering wheel) to make the estimated position more accurate.

*note to self: the Kalman filter assumes linear dynamics and Gaussian noise.
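
The truck example can be sketched as a 1-D Kalman filter. This is a toy version under my own assumptions: the truck moves along a line, odometry gives the distance travelled per step, and the noise levels are invented.

```python
import random


def kalman_step(x, p, u, z, q, r):
    """One predict/update cycle of a 1-D Kalman filter.

    x, p: previous position estimate and its variance
    u:    distance travelled since the last step (from odometry)
    z:    noisy GPS position reading
    q, r: process (odometry) and measurement (GPS) noise variances
    """
    # Predict: move the estimate by the odometry and grow the uncertainty.
    x_pred = x + u
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


rng = random.Random(1)
true_pos, speed, dt = 0.0, 10.0, 1.0   # hypothetical truck: 10 m/s, 1 s steps
gps_sigma, odo_sigma = 3.0, 0.2        # assumed noise levels in meters
x, p = 0.0, 100.0                      # vague initial belief
gps_errors, kf_errors = [], []
for _ in range(200):
    true_pos += speed * dt + rng.gauss(0.0, odo_sigma)
    z = true_pos + rng.gauss(0.0, gps_sigma)
    x, p = kalman_step(x, p, speed * dt, z, odo_sigma ** 2, gps_sigma ** 2)
    gps_errors.append(abs(z - true_pos))
    kf_errors.append(abs(x - true_pos))

mean_gps_error = sum(gps_errors) / len(gps_errors)
mean_kf_error = sum(kf_errors) / len(kf_errors)
```

Comparing `mean_kf_error` with `mean_gps_error` shows how much the odometry helps: the gain `k` settles at a small value, so each GPS reading only nudges the prediction.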

In terms of flexibility, it can be described this way (from least flexible to most):

Kalman Filter < Extended Kalman Filter < Particle Filter

FastSLAM is a Bayesian formulation. It essentially boils down to this:
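
As I understand it from the papers, the factorization can be written as follows. The notation is my assumption of the papers' convention: s is the robot pose, θ_k is the location of landmark k, z are the measurements, u the controls, n the data associations, and a superscript t means the whole history up to time t.

```latex
p(s^t, \theta \mid z^t, u^t, n^t)
  = p(s^t \mid z^t, u^t, n^t) \prod_{k=1}^{K} p(\theta_k \mid s^t, z^t, u^t, n^t)
```

The first factor on the right is the posterior over robot paths, and the product is one independent estimate per landmark, each conditioned on the path.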


The particle filter is used to estimate the path of the robot, which is given by the posterior probability p(s^t | z^t, u^t, n^t) (a superscript t means the whole history up to time t). First, construct a temporary set of particles from each particle’s previous position and the control input. Then resample from this set, drawing each particle with probability proportional to its importance factor (the particle’s weight). Finding the weight of each particle is quite involved, so I’ll let you refer to the actual papers for the derivation.
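
The propose-then-resample step can be sketched like this. It is a toy 1-D version under my own assumptions: `weight_fn` is a placeholder for the importance factor that the papers derive, and the motion model is just "add the control plus Gaussian noise".

```python
import math
import random


def resample_step(particles, control, motion_sigma, weight_fn, rng):
    """Propose new poses from old poses plus control, then resample by weight."""
    # Temporary set: push each particle through a noisy motion model.
    proposed = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # Importance factors; weight_fn is a placeholder for the papers' derivation.
    weights = [weight_fn(p) for p in proposed]
    # Draw a new set of the same size, with replacement, proportional to weight.
    return rng.choices(proposed, weights=weights, k=len(proposed))


def toy_weight(pose):
    """Hypothetical weight: pretend the measurement suggests a pose near 5.0."""
    return math.exp(-0.5 * ((pose - 5.0) / 0.5) ** 2)


rng = random.Random(2)
particles = [rng.uniform(0.0, 8.0) for _ in range(1000)]
resampled = resample_step(particles, control=1.0, motion_sigma=0.1,
                          weight_fn=toy_weight, rng=rng)
mean_pose = sum(resampled) / len(resampled)
```

After resampling, high-weight particles appear several times and low-weight ones mostly disappear, so the set concentrates around the poses that agree with the measurement.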

After we have path estimates, we can solve for the landmark location estimates (the right side of the equation). Through a series of equalities, the authors arrive at:
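
My reading of that result, written up to a normalizing constant, is below. Treat it as my reconstruction rather than a quote from the papers; n_t is the index of the landmark observed at time t, and superscripts denote histories.

```latex
p(\theta_{n_t} \mid s^t, z^t, u^t, n^t)
  \propto p(z_t \mid \theta_{n_t}, s_t, n_t)\,
          p(\theta_{n_t} \mid s^{t-1}, z^{t-1}, u^{t-1}, n^{t-1})
```

In words: the new estimate of the observed landmark is the previous estimate reweighted by how likely the current measurement is. Landmarks that were not observed at time t keep their previous estimates.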


FastSLAM carries out this update using the Kalman filter.

The main advantage of FastSLAM is that it runs in O(M log K) time instead of O(MK), where M is the number of particles and K is the number of landmarks. I’ve had trouble understanding this part, but here goes: each particle contains estimates of K landmarks (and each estimate is a Gaussian). Resampling particles requires copying the data inside each particle (K Gaussians if we have K landmarks). Instead of copying all K landmark location estimates, FastSLAM does a partial copy of only the Gaussians that need to be updated. Also, the conditional independence of the landmark locations given the robot path makes the algorithm easy to set up for parallel computing.
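
One way to get a "partial copy" is to keep each particle's landmarks in a balanced binary tree and copy only the path from the root to the updated leaf; as far as I can tell, this is the idea behind the log K. The sketch below is my own toy version (class and function names are made up), with each landmark stored as a (mean, variance) tuple.

```python
class Node:
    """Binary tree node, never modified after creation; leaves hold one landmark."""
    __slots__ = ("left", "right", "value")

    def __init__(self, left=None, right=None, value=None):
        self.left, self.right, self.value = left, right, value


def build(values, lo=0, hi=None):
    """Build a balanced tree over values[lo:hi]."""
    if hi is None:
        hi = len(values)
    if hi - lo == 1:
        return Node(value=values[lo])
    mid = (lo + hi) // 2
    return Node(build(values, lo, mid), build(values, mid, hi))


def get(node, index, size):
    """Read the landmark stored at leaf `index`."""
    while size > 1:
        mid = size // 2
        if index < mid:
            node, size = node.left, mid
        else:
            node, index, size = node.right, index - mid, size - mid
    return node.value


def update(node, index, value, size, counter):
    """Return a new root with leaf `index` replaced; untouched subtrees are shared."""
    counter[0] += 1  # count the nodes created by this update
    if size == 1:
        return Node(value=value)
    mid = size // 2
    if index < mid:
        return Node(update(node.left, index, value, mid, counter), node.right)
    return Node(node.left, update(node.right, index - mid, value, size - mid, counter))


K = 1024
landmarks = [(float(i), 1.0) for i in range(K)]   # hypothetical (mean, variance)
old_root = build(landmarks)
created = [0]
new_root = update(old_root, 3, (3.2, 0.5), K, created)
```

Here the new particle shares every subtree it did not touch with the old one, so updating one landmark out of 1024 creates only 11 new nodes instead of copying all 1024 Gaussians.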

Stay tuned for a breakdown of FastSLAM 2.0 …