Kinect Stereo Calibration


After looking at camera calibration and stereo calibration in theory, this post introduces you to calibrating the Kinect. The Kinect has an IR projector-camera pair and an RGB camera. How the IR camera works has already been explained in detail in a lot of posts, not only by me but by many other people. Getting to the topic: Kinect calibration literally means finding the relation between the co-ordinates of the depth camera and the co-ordinates of the RGB camera. This can also be called Kinect Stereo Calibration, as we are calibrating two cameras against each other. In short, the output of this calibration is an RGB-D image, i.e. an overlap of the RGB and depth streams.

For this it is necessary to get the internal characteristics (intrinsic parameters) of both the IR and RGB cameras. You can either use the calibration API from OpenCV or the Matlab Calibration Toolkit to find the internal parameters. I have used the Matlab Calibration Toolkit since I found it easier to use and more reliable. Following are the steps for Kinect stereo calibration:

  • Capture RGB images of the chessboard; the toolbox uses the Tsai calibration technique.
  • The IR images obtained from the Kinect are grainy, i.e. the receiver image contains the projected pattern of IR dots, which makes it unsuitable for calibration. It is advisable to cover the Kinect emitter and use an IR source that does not emit a pattern but illuminates the scene uniformly, so that the image obtained at the receiver is clean. Since I do not have such an emitter, I have used the speckled images as they are, and the results are satisfactory.
  • Calibrate both cameras with the obtained images.

The external R and T matrices are found the same way, but this time keeping the chessboard fixed between the RGB and IR images.

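p_rgb = point in the RGB image (pixels), P_rgb = 3D point in the RGB camera frame
p_ir = point in the IR image (pixels), P_ir = 3D point in the IR camera frame
D = depth at p_ir; (FOCALX_ir, FOCALY_ir, Cx_ir, Cy_ir) and (FOCALX_rgb, FOCALY_rgb, Cx_rgb, Cy_rgb) are the intrinsics of the two cameras

P_ir.x = (p_ir.x-Cx_ir)*D/FOCALX_ir
P_ir.y = (p_ir.y-Cy_ir)*D/FOCALY_ir
P_ir.z = D

P_rgb = R*P_ir + T
p_rgb.x = P_rgb.x*FOCALX_rgb/P_rgb.z + Cx_rgb
p_rgb.y = P_rgb.y*FOCALY_rgb/P_rgb.z + Cy_rgb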

Thus p_rgb is the point in the RGB image corresponding to the point p_ir in the IR image, and P_rgb.z is the depth (Z co-ordinate) of that point in the RGB camera frame.
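The mapping from a depth pixel to an RGB pixel can be sketched in a few lines of C++. This is only a minimal sketch, not the toolbox output: the struct and function names (Intrinsics, Extrinsics, depthToRgb) are made up for illustration, and the intrinsics and R, T you pass in must come from your own calibration.

```cpp
#include <cassert>

// Pinhole intrinsics of one camera (hypothetical struct; fill it from your calibration).
struct Intrinsics { double fx, fy, cx, cy; };
struct Vec3 { double x, y, z; };
struct Pixel { double x, y; };

// Extrinsics taking a point from the IR frame to the RGB frame: P_rgb = R * P_ir + T.
struct Extrinsics { double R[3][3]; double T[3]; };

// Back-project the IR pixel (u, v) with depth D to a 3D point in the IR frame.
Vec3 backProject(const Intrinsics& ir, double u, double v, double D) {
    return { (u - ir.cx) * D / ir.fx, (v - ir.cy) * D / ir.fy, D };
}

// Apply the rigid transform between the two cameras.
Vec3 toRgbFrame(const Extrinsics& e, const Vec3& p) {
    return { e.R[0][0] * p.x + e.R[0][1] * p.y + e.R[0][2] * p.z + e.T[0],
             e.R[1][0] * p.x + e.R[1][1] * p.y + e.R[1][2] * p.z + e.T[1],
             e.R[2][0] * p.x + e.R[2][1] * p.y + e.R[2][2] * p.z + e.T[2] };
}

// Project a 3D point in the RGB frame onto the RGB image.
Pixel project(const Intrinsics& rgb, const Vec3& P) {
    assert(P.z > 0.0);  // the point must be in front of the camera
    return { P.x * rgb.fx / P.z + rgb.cx, P.y * rgb.fy / P.z + rgb.cy };
}

// Full chain: IR pixel + depth -> RGB pixel.
Pixel depthToRgb(const Intrinsics& ir, const Intrinsics& rgb, const Extrinsics& e,
                 double u, double v, double D) {
    return project(rgb, toRgbFrame(e, backProject(ir, u, v, D)));
}
```

Calling depthToRgb for every valid depth pixel and copying the colour found at the returned position gives the overlapped RGB-D image.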

The results that I have obtained are

The last image is obtained by overlapping the depth image (2nd image) on the RGB image (1st image).

Stereo Cameras


After getting some familiarity with camera calibration, intrinsic/internal parameters and other definitions, we will now look into the stereo camera configuration.

A stereo camera is nothing but two or more cameras watching the same scene, using the geometry between them to deduce information about the scene that is not available from a single camera. For example, when we look at camera calibration carefully we see that if we know the 3D co-ordinates of a point then we can get its position in the image, but the reverse cannot be done: in the final equation x = KX (K being the intrinsic matrix), the image point x = (u,v) is scaled by the depth Z, which is unknown. We can, however, get the line (joining the center of projection and the image pixel) on which the 3D point lies. This is where the stereo configuration comes to our rescue. If we know the relation between the co-ordinate systems of the two cameras, then we can intersect the two lines, one from each camera, to get the 3D point that both cameras see. This is called stereo triangulation, and it is the basic principle behind human binocular vision.

The principle of operation behind stereo cameras such as the Microsoft Kinect and the Bumblebee is slightly different, though these two are themselves different depth-sensing technologies. Consider the Bumblebee: it has 2 RGB cameras and works on the principle of stereo correspondence; in other words, it gets the depth map of the scene using the following relationship. Here the pixel is at (x_l, y) in the left camera C_Left and at (x_r, y) in the right camera C_Right, Z_r = Z_l = Z, and the left camera is taken as the origin of the stereo camera system.

  Z = f*T/d, where d = disparity = abs(x_l - x_r), T = baseline, i.e. the length of the line joining the centers of the cameras, and f = focal length in pixels.

Hence, if we have matched points in the stereo camera system as shown in the figure, then we can get the depth Z of that point. Finding such matches is called stereo correspondence.
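As a small worked example, the depth of one matched pair can be computed as below. This is a sketch under the usual assumptions for a rectified pair (both cameras share the same focal length, the match lies on the same row); the function name and parameters are illustrative and not taken from any stereo SDK.

```cpp
#include <cassert>
#include <cmath>

// Depth of a matched point in a rectified stereo pair: Z = f * T / d.
// xl, xr   : column of the match in the left and right image (pixels)
// focalPx  : focal length in pixels
// baseline : distance between the two camera centers (the unit of Z follows this)
double depthFromDisparity(double xl, double xr, double focalPx, double baseline) {
    double d = std::fabs(xl - xr);  // disparity
    assert(d > 0.0);                // zero disparity would put the point at infinity
    return focalPx * baseline / d;
}
```

For example, a match at x_l = 420 and x_r = 400 with f = 500 pixels and a 0.12 m baseline gives a depth of about 3 m. Note how the depth grows as the disparity shrinks, which is why far points are measured less precisely.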

There have been a large number of implementations of this technique; one is here.

The disadvantage of a stereo system that relies on stereo correspondence is that finding a match between two RGB cameras takes time, and the match is not exact at certain times and on certain surfaces. A variation of this is stereo depth calculation using structured light. Various other aspects of and experiments in stereo calibration can be found at http://blog.martinperis.com/

Camera Calibration


Hi all,

This post covers camera calibration and some subtle aspects people tend to miss during the process. Hopefully it will be a one-stop post for all beginners in camera calibration. Camera calibration is usually required when you need the camera or camera system to interact with the real world; in other words, when you want to take measurements in the real-world co-ordinate system.

To start, the pin-hole camera model:

  • The origin of the camera co-ordinate system is the camera itself, i.e. the center of projection.
  • The principal axis is the Z axis of this system.
  • The image is formed at the focus F of the camera, which is at a distance f along the Z axis, i.e. at (0, 0, f).
  • The image plane is the plane parallel to the X-Y plane passing through (0, 0, f).

Image

With the above figure, considering the similar triangles formed by the point P(X,Y,Z), O and the Z axis, and by P_i(u,v,f), O and the Z axis, we get

Image

The point (u, v) with respect to the image co-ordinate system, which has its origin at the top-left corner, can be written as,

Image

where (tu,tv) is the center of the image. In homogeneous co-ordinates the image point is (u', v', w), where w = Z, u' = w*u and v' = w*v. Since the image is measured in pixels while the focal length is a physical length on the sensor, we have to convert the focal length using the sensor dimension per pixel. If mu and mv are the dimensions of a sensor pixel in the x and y directions, then we have

Image

In Matrix Form

Image

The matrix K is called the INTRINSIC matrix, and the parameters f, tu, tv are called the camera intrinsic/internal parameters since they are intrinsic to the camera.
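To make the role of K concrete, here is a minimal sketch that builds an intrinsic matrix and projects a 3D point given in the camera frame. The names fu and fv are my assumption for the focal length already expressed in pixels along each image axis; tu and tv are the image center as above. Lens distortion is ignored.

```cpp
#include <cassert>

struct Intrinsic { double m[3][3]; };
struct ImagePoint { double u, v; };

// K = [fu 0 tu; 0 fv tv; 0 0 1]
Intrinsic makeIntrinsic(double fu, double fv, double tu, double tv) {
    Intrinsic K = {{{fu, 0.0, tu}, {0.0, fv, tv}, {0.0, 0.0, 1.0}}};
    return K;
}

// Multiply K with (X, Y, Z) to get homogeneous (u', v', w), then divide by w = Z.
ImagePoint projectPoint(const Intrinsic& K, double X, double Y, double Z) {
    assert(Z > 0.0);  // the point must lie in front of the camera
    double hu = K.m[0][0] * X + K.m[0][1] * Y + K.m[0][2] * Z;
    double hv = K.m[1][0] * X + K.m[1][1] * Y + K.m[1][2] * Z;
    double w  = K.m[2][0] * X + K.m[2][1] * Y + K.m[2][2] * Z;
    return { hu / w, hv / w };
}
```

With fu = fv = 500 and the image center at (320, 240), the point (0.2, -0.1, 2.0) lands at pixel (370, 215). Doubling X, Y and Z together gives the same pixel, which is exactly why a single image cannot recover depth.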

There are enough tools online to get the internal parameters of a camera. OpenCV provides a bunch of functions to perform camera calibration, and the Matlab Calibration Toolkit gives an extensive explanation and implementation of camera calibration.

Here is a checklist that I made for myself to improve the accuracy of the calculated internal parameters. These points are with respect to the chessboard calibration method:

  • Take a lot of images with the board in different orientations.
  • It is better if the chessboard that is used has a large number of corners/squares.
  • It is better if the chessboard fills the complete image plane.
  • Go through This Blog if you are using the OpenCV camera calibration toolbox, and understand the various functions before you get confused while programming/debugging.

Depth Using Kinect


It's been almost a year since the release of the Microsoft Kinect, and there has been a lot of enthusiasm amongst researchers to use this technology to the utmost extent. So, guys, this post can be used as a startup tutorial for all the new enthusiasts in town.

OK, so as you all might know, the Kinect has a projector-receiver pair along with an RGB camera. The Kinect projects an IR pattern, and the receiver searches for a correlation between what it sees and the pattern that is known to it, and from this it derives the depth D. Now it is important to understand that the D obtained using the equations in this blog post, or from the array pDepthMap[x + 640*y] in the OpenNI SDK at pixel (x,y), is the 3rd co-ordinate Z and not the actual distance from the camera. The actual distance of the pixel (x,y) whose 3D point is (X,Y,Z) is sqrt(X*X + Y*Y + Z*Z), i.e. the distance of the point (X,Y,Z) from the origin, which is at the camera center,

where

Z=D;

X=(x-center_x)*Z/focal_x;

Y=(y-center_y)*Z/focal_y;

(center_x,center_y) is the center of the image and (focal_x,focal_y) are the focal lengths in pixels.

That gives you the X,Y,Z co-ordinates of the point (x,y) in 3D space.
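The difference between the depth value Z and the true distance from the camera is easy to check numerically. Below is a minimal sketch using the standard pinhole back-projection X = (x - center_x)*Z/focal_x; the function names are illustrative and the focal lengths and image center are whatever your calibration gives you.

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double X, Y, Z; };

// Back-project pixel (x, y) with depth-map value Z into camera co-ordinates.
Point3 depthPixelTo3D(int x, int y, double Z,
                      double focalX, double focalY, double centerX, double centerY) {
    assert(focalX > 0.0 && focalY > 0.0);
    return { (x - centerX) * Z / focalX, (y - centerY) * Z / focalY, Z };
}

// Euclidean distance of the 3D point from the camera center.
double rangeFromCamera(const Point3& p) {
    return std::sqrt(p.X * p.X + p.Y * p.Y + p.Z * p.Z);
}
```

For a pixel at the image center the two values agree, but for a pixel far from the center, say x = 920 with a focal length of 600 and Z = 800, the range is about 1131, roughly 41% more than the depth value.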

I thought I would blog this point as I made this mistake in one of my projects, so someone might find it useful.

Bye

Towards 3D satellite view in Maps!


I was having a chat with one of my friends about the 3D maps that Google Maps has for Android phones and tablets. He said this is not sufficient, and that there should be real-time mapping of the area tied to the GPS location. Now I am definitely not sure whether it is possible to have real-time mapping that shows the actual movement of cars and pedestrians on the streets, and if all of this were integrated on the phone... then what will happen!!!

I think this acquisition can get Apple close to 3D imaging. And about the real-time mapping, hmm, let's leave that to the imaging guys to think about...

Distance Transform and Shape Matching


Hi all, this post deals with the distance transform, a very simple and effective algorithm that can be used for shape matching or template matching.

Before we get into matching, identifying the shape automatically is important to make the system a real-time one, which is a challenging task. The first feature that comes to my mind when I think about shape is the boundary or contour of the object; hence, if we are able to identify the edges that make up the object whose shape we are trying to match, then we are through to the next stage, which is matching.

The distance transform of a binary edge image is an image where the value of each pixel is the distance to its nearest edge; hence pixels on an edge have the value 0, and the value increases as the distance from the edge increases.

Matching is finding the position of the template in the distance transform (DT) image that has the minimum distance D, where D is the sum of the pixel values of the DT image that lie under the edge pixels of the template image.

Scaling is taken care of by having the template image at various scales and finding the least distance amongst the scales.

In this example I have a template of a head, and I use this template to find the head-like position in the given silhouette. Have a look at the images for further insight into the method and the algorithm.

When I run my program I get...

The circles are the positions where the search has been done in the first image, and the double circle is the location where the match has been found.

This is the code.

For finding the Distance transformed image

// 8-bit single-channel image that will hold the distance transform
IplImage* chamfer = cvCreateImage(cvGetSize(segment1),8,1);
for(int i=0;i<segment1->height;i++)
{
  for(int j=0;j<segment1->width;j++)
  {
    // city-block distance from pixel (j,i) to the nearest edge point
    int dist=1000000;
    for(size_t k=0;k<ref_points.size();k++)
    {
      int d = abs(i-ref_points.at(k).y) + abs(j-ref_points.at(k).x);
      if(d<dist)
      {
        dist = d;
      }
    }
    // saturate at 255 so that far-away pixels are not mistaken for edges (value 0)
    ((uchar *)(chamfer->imageData + i*chamfer->widthStep))[j] = (dist <= 255) ? dist : 255;
  }
}

For matching the template with the image

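double min_dist = 1e9;  // smallest distance found so far
int ki = 0, kj = 0;     // top-left corner of the best match
// the search steps by one template size; bounds assumed to be the size of the chamfer image
for(int i=0;i<chamfer->height;i=i+templateImage->height)
{
  for(int j=0;j<chamfer->width;j=j+templateImage->width)
  {
    // skip positions where the template would fall outside the image
    if(i+templateImage->height <= chamfer->height && j+templateImage->width <= chamfer->width)
    {
      double vi_1=0;
      for(size_t k=0;k<template_points.size();k++)
      {
        int x = template_points.at(k).x + j;
        int y = template_points.at(k).y + i;
        int vi =((uchar *)(chamfer->imageData + y*chamfer->widthStep))[x];
        vi_1 = vi_1 + vi*vi;
      }
      // root of the summed squared distances, normalised by the number of template points
      vi_1 = (sqrt(vi_1))/template_points.size();
      if((vi_1) < min_dist)
      {min_dist = (vi_1);ki=i;kj=j;cout<<" min "<<min_dist<<endl;}
    }
  }
}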

template_points are the points where template(x,y)=255, in other words the places where the template has an edge. Only the pixel values of the distance transform at those (x,y) positions are considered when computing the distance.
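For anyone who wants to try the idea without OpenCV, here is a minimal self-contained sketch of the same two steps using std::vector. It is not the code from my program: the names (Pt, Match, distanceTransform, chamferMatch) are made up for this example, the transform is the brute-force city-block version, and the search tries every offset instead of stepping by the template size.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <vector>

struct Pt { int x, y; };
struct Match { int x, y; double score; };

// Brute-force city-block distance transform: each pixel stores the
// distance to its nearest edge point (row-major, width * height values).
std::vector<int> distanceTransform(int width, int height, const std::vector<Pt>& edges) {
    assert(!edges.empty());
    std::vector<int> dt(width * height);
    for (int i = 0; i < height; ++i)
        for (int j = 0; j < width; ++j) {
            int best = std::numeric_limits<int>::max();
            for (const Pt& e : edges) {
                int d = std::abs(i - e.y) + std::abs(j - e.x);
                if (d < best) best = d;
            }
            dt[i * width + j] = best;
        }
    return dt;
}

// Slide the template's edge points over the distance transform and keep the
// offset with the smallest score sqrt(sum of squared distances) / n.
Match chamferMatch(const std::vector<int>& dt, int width, int height,
                   const std::vector<Pt>& tmpl, int tmplW, int tmplH) {
    assert(!tmpl.empty());
    Match best{ -1, -1, std::numeric_limits<double>::max() };
    for (int i = 0; i + tmplH <= height; ++i)
        for (int j = 0; j + tmplW <= width; ++j) {
            double sum = 0.0;
            for (const Pt& p : tmpl) {
                int v = dt[(p.y + i) * width + (p.x + j)];
                sum += double(v) * v;
            }
            double score = std::sqrt(sum) / tmpl.size();
            if (score < best.score) best = { j, i, score };
        }
    return best;
}
```

A score of 0 means every template edge point landed exactly on an image edge. To handle scale, run chamferMatch once per scaled template and keep the smallest score.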

Convert Movie to Image frames


This post is about how to convert an .avi movie file to a set of numbered frames using the OpenCV API. This can as well be done using ffmpeg. For those who want to see how to use the various I/O functions in OpenCV, or who are starting to use the camera functions, this could give you a start on I/O operations with OpenCV.

#include "cv.h"
#include "highgui.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc , char** argv)
{
    /* open the movie file */
    CvCapture* vdo_capture = cvCaptureFromAVI("hand_1.avi");
    if (!vdo_capture)
    {
        printf("could not open hand_1.avi\n");
        return 1;
    }
    int no_frame = (int)cvGetCaptureProperty(vdo_capture, CV_CAP_PROP_FRAME_COUNT);
    int fps      = (int)cvGetCaptureProperty(vdo_capture, CV_CAP_PROP_FPS);
    printf("nframe %d fps  %d  \n", no_frame, fps);

    /* create the output directory */
    system("mkdir images");

    int i = 0;
    while (i < no_frame)
    {
        /* the frame returned by cvQueryFrame belongs to the capture: do not release it */
        IplImage* input_img = cvQueryFrame(vdo_capture);
        if (!input_img)
            break;
        i++;
        /* fixed-size buffer for the file name images/<frame number>.jpg */
        char name[64];
        sprintf(name, "images/%d.jpg", i);
        cvSaveImage(name, input_img);
    }
    cvReleaseCapture(&vdo_capture);
    return 0;
}

There could be a lot of optimization to this code; I am uploading it in a raw format.

InfraRed image from Kinect


Hi all,

This post deals with how the Kinect uses its IR projector and receiver to obtain the depth information that has inspired a lot of programmers worldwide to develop some cool stuff with the Kinect.

The Kinect definitely uses stereo triangulation for deriving the depth info, but the triangulation is done not with RGB cameras but with an IR projector and an IR receiver; how it is done is well articulated here on ROS.org. The Kinect has been out for a while now, and there have been a lot of videos on YouTube of various attempts made using it. But there are people who are just starting out, so this post may help give them a push start. Check out THIS post of mine for various links on installing the software.

Let's start out by accessing the IR data/image from the Kinect.

This is how it's done:

 

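#include <XnCppWrapper.h>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>
#include <iostream>

#define CHECK_RC(rc, what)											\
	if (rc != XN_STATUS_OK)											\
	{																\
	printf("%s failed: %s\n", what, xnGetStatusString(rc));		\
	return rc;													\
	}

using namespace xn;
using namespace cv;
using namespace std;

int main()
{
	XnStatus nRetVal = XN_STATUS_OK;
	Context context;
	nRetVal = context.Init();
	CHECK_RC(nRetVal, "Initialize context");
	IRGenerator ir;
	nRetVal = ir.Create(context);
	CHECK_RC(nRetVal, "Create IR generator");

	XnMapOutputMode mapMode;
	mapMode.nXRes = 640;
	mapMode.nYRes = 480;
	mapMode.nFPS = 30;
	nRetVal = ir.SetMapOutputMode(mapMode);
	CHECK_RC(nRetVal, "Set map output mode");
	nRetVal = context.StartGeneratingAll();
	CHECK_RC(nRetVal, "Start generating");
	IRMetaData irMD;
	int i=0;
	char a[32];
	while(1)
	{
	nRetVal = context.WaitAnyUpdateAll(); // block until new data is available
	CHECK_RC(nRetVal, "Wait for update");
	ir.GetMetaData(irMD);
	// XnIRPixel is an unsigned 16-bit value, so wrap the buffer as CV_16UC1 (no copy is made)
	Mat ir16(480,640,CV_16UC1,(void*)irMD.Data());
	Mat irMat;
	// values above 255 saturate here; pass a scale factor to convertTo to compress the range instead
	ir16.convertTo(irMat,CV_8U);
	imshow("ShowImage", irMat);
	sprintf(a,"%d.jpg",i++);
	cout<<i<<endl;
	imwrite(a,irMat);
	waitKey(10);
	}
return 0;
}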

I have used OpenNI along with the OpenCV HighGUI functions to display the images, but you can use OpenGL as well.

That's me ILLUMINATED!!!!

Hand Gesture Recognition


Hi all,

Hand gesture or palm gesture recognition is one of the widely researched topics. I have not looked at any highly reliable algorithms, but recently I had a chance to look through the Structural Analysis reference of OpenCV, and came across the function cvConvexityDefects. This is not one of the very popular functions, but it can be used in a variety of ways. Believe me, this has made my life pretty much easier.

There are a few functions in OpenCV that we can use to do hand detection before we go for gesture recognition. There are a lot of videos on YouTube that do gesture recognition. Some are  1  2   3 . Each one of them is different in its own way. The method that I have used might be similar to these in many ways. I have used the convex hull with the cvConvexityDefects function to detect the hand.

Another easy way, which I did not want to use, is the skin detection technique to detect the hand region in the image. How to use skin detection and other colour-space related information can be found in my Skin Detection post.

As skin detection is not that accurate, and the availability of depth cameras has inspired me to use them, I take advantage of depth thresholding to find the hand of the person and try some gesture recognition. Yes, we take it for granted that the person performing the action has his hands in front of the depth camera. The depth sensor that I use here is the Kinect; how to use it, install it, and other interesting links can be found in my upcoming post on the Kinect.
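Depth thresholding itself is only a few lines. The sketch below is an illustration rather than the code I ran: it assumes the depth map is a flat array of 16-bit values in millimetres (as OpenNI delivers them, with 0 meaning no reading), and the function name and the near/far limits are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a binary mask (255 = keep, 0 = drop) of the pixels whose depth lies
// in [nearMm, farMm]. Choosing nearMm > 0 also drops pixels with no reading.
std::vector<std::uint8_t> depthThreshold(const std::vector<std::uint16_t>& depthMm,
                                         std::uint16_t nearMm, std::uint16_t farMm) {
    assert(nearMm <= farMm);
    std::vector<std::uint8_t> mask(depthMm.size(), 0);
    for (std::size_t i = 0; i < depthMm.size(); ++i)
        if (depthMm[i] >= nearMm && depthMm[i] <= farMm)
            mask[i] = 255;
    return mask;
}
```

The resulting mask plays the role of the thresholded grayscale image that the contour and convexity-defect code expects, with the hand as the white blob closest to the sensor.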

This is one of my experiments; check it out. The points obtained from this program can be used to form a hand gesture recognition system.

#include <cv.h>
#include <highgui.h>
#include <cvaux.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace cv;
using namespace std;

int main(int argc, char *argv[])
{
cout<<"no. of input arguments "<<argc - 1 <<endl;
if(argc < 2)
{
cout<<"Image Expected !!"<<endl;
exit(1);
}

IplImage *in = cvLoadImage(argv[1],CV_LOAD_IMAGE_GRAYSCALE);
if(!in)
{
cout<<"Could not load "<<argv[1]<<endl;
exit(1);
}
cvErode(in,in,NULL,2);
cvShowImage("in",in);
cvWaitKey(5);
CvMemStorage* storage = cvCreateMemStorage(0);
CvSeq* contours = 0;
int header_size = sizeof(CvContour);
cvThreshold( in, in, 30, 255, CV_THRESH_BINARY );

int c = cvFindContours(in,
                       storage,
                      &contours,
                      sizeof(CvContour),
                      CV_RETR_TREE,CV_CHAIN_APPROX_NONE,
                      cvPoint(0,0));
cout<<"number of countours "<<c<<endl;
CvPoint* PointArray;
CvSeq *seqhull, *defects;
CvMemStorage *stor03 = cvCreateMemStorage(0);
CvConvexityDefect *defectArray;
/**************************************************************/
double area =0;

while( contours )
{
area = cvContourArea( contours, CV_WHOLE_SEQ);
cout<< "area "<<area<<endl;
CvRect rect;
int count = contours->total;
rect = cvBoundingRect( contours, 1);
cvRectangle(in,
            cvPoint(rect.x-10,rect.y-10), 
            cvPoint(rect.x + rect.width+10,rect.y + rect.height+10),
            CV_RGB(255,255,255) ,
            1);
PointArray = (CvPoint*)malloc( count*sizeof(CvPoint) );
cvCvtSeqToArray(contours, PointArray, CV_WHOLE_SEQ);
seqhull = cvConvexHull2(contours, NULL, CV_COUNTER_CLOCKWISE, 0);
cout<<"check "<<endl;
defects = cvConvexityDefects( contours, seqhull, stor03);
int nomdef = defects->total;
defectArray = (CvConvexityDefect*)malloc(sizeof(CvConvexityDefect)*nomdef);
cvCvtSeqToArray(defects,defectArray, CV_WHOLE_SEQ);
// draw every convexity defect: start -> deepest point -> end
for(int i=0; i<nomdef; i++)
{
cvLine(in, *(defectArray[i].start), 
           *(defectArray[i].depth_point),
            CV_RGB(255,255,255),
            3,
            8,
            0 );
cvCircle( in,*(defectArray[i].depth_point),5,CV_RGB(255,255,255),
        3,8,0);
cvCircle( in,*(defectArray[i].start),5,CV_RGB(255,255,255),3,8,0);
cvLine(in,*(defectArray[i].depth_point),*(defectArray[i].end),
       CV_RGB(255,255,255),3,8,0 );
}
// release the temporary arrays before moving on to the next contour
free(PointArray);
free(defectArray);

// take the next contour
contours = contours->h_next;
}
cvShowImage("IMAGE",in);
cvWaitKey(0);
/***************************************************************/
return 0;
}

How to read a research Paper


Dr. Mubarak Shah has long been doing research in cameras and computer vision, and this is how, according to him, a research paper has to be read.
Different people have their own way of reading and understanding/interpreting things; though Dr. Shah's way of reading research papers is demanding, it seems to be effective.

http://server.cs.ucf.edu/~vision/faculty/HowToRead.html

1. You have to read the paper several times to understand it. When you read the paper first time, if you do not understand something do not get stuck, keep reading assuming you will figure out that later. When you read it the second time, you will understand much more, and the third time even more …

2. Try first to get a general idea of the paper: What problem is being solved? What are the main steps? How can I implement the method?, even though I do not understand why each step is performed the way it is performed?

3. Try to relate the method to other methods you know, and conceptually find similarities and differences.

4. In the first reading it may be a good idea to skip the related work, since you do not know all other papers, they will confuse you more.

5. Do not use dictionary to just look up the meaning of technical terms like particle filters, maximum likelihood, they are concepts, dictionaries do not define them. They will tell you literal meanings, which may not be useful.

6. Try to understand each concept in isolation, and then integrate them to understand the whole paper. For instance, the paper on “Feature Integration with adaptive weights in a sequential Monte Carlo Tracker” is quite complex paper at the first look. Because it uses Monte Carlo, particle filter, likelihood etc. But try to understand the gist of it. The paper is about tracking, you know a few tracking methods already. It uses features: color histogram, templates in correlation, shape, etc. You know these features, and you have used them. The probabilities obtained by each features are combined (fused) to achieve tracking. How will you combine the probabilities or confidences of each features: multiply, add, apply threshold and then add …