Thursday, March 22, 2012

Kinect making 3D video

I discovered a paper that artificially creates a 3D video, using the Kinect sensor to record the footage. The proposed algorithm is a preprocessing stage. The problem with using raw depth data from the Kinect as the depth element of the video is that the depth map is relative and full of holes. The depth data is recorded from reflected infrared light that originally comes from the sensor. To compensate, the paper proposes using the RGB frames to help clean up the depth data. The proposed algorithm has five steps to create an accurate depth map for the 3D video. The first step runs motion estimation using the frames before the current frame, and also estimates the motion vectors of the frames after the current frame. The second step creates a confidence metric for the motion vectors of the future frames in order to assess their quality. The third step uses the motion vectors on future frames for "motion compensation" to improve the accuracy of the depth in each frame. The fourth step performs basic depth map filtering. The final step fills any remaining holes with the data of neighboring pixels.
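
To get a feel for the final step, here is a minimal sketch of filling holes from neighboring pixels. It is my own illustration, not the paper's implementation: I am assuming holes are stored as 0, that the depth map is a plain list of rows, and that a hole is filled with the average of its valid 8-neighbors. The function name `fill_holes` is made up for this example.

```python
def fill_holes(depth, hole=0):
    """Fill hole pixels with the mean of their valid 8-neighbours.

    Assumption: `depth` is a list of rows of numbers and `hole` marks
    pixels with no depth reading. Passes repeat until nothing changes,
    so wide holes are filled from the outside in.
    """
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # work on a copy, leave the input alone
    changed = True
    while changed:
        changed = False
        snapshot = [row[:] for row in out]  # read from the previous pass only
        for r in range(rows):
            for c in range(cols):
                if snapshot[r][c] != hole:
                    continue
                neighbours = [
                    snapshot[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and snapshot[rr][cc] != hole
                ]
                if neighbours:
                    out[r][c] = sum(neighbours) / len(neighbours)
                    changed = True
    return out


if __name__ == "__main__":
    frame = [
        [10, 10, 10],
        [10, 0, 10],
        [10, 10, 10],
    ]
    for row in fill_holes(frame):
        print(row)
```

A plain average like this blurs depth edges, which is presumably part of why the paper cleans the map up with motion compensation and filtering first and leaves hole filling for last.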

The result of this algorithm is video conversion at 1.4 frames per second. Keep in mind this is the processing rate, not the viewing rate. The algorithm fixes problems with the original depth map. It also makes the depth map smoother and more stable.

- Kao Pyro of the Azure Flame

Source:
Matyunin, S., Vatolin, D., & Berdnikov, Y. (2011). Temporal filtering for depth maps generated by Kinect depth camera. 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 1-4.
http://ieeexplore.ieee.org/search/srchabstract.jsp?tp=&arnumber=5877202&openedRefinements%3D*%26filter%3DAND%28NOT%284283010803%29%29%26searchField%3DSearch+All%26queryText%3DKinect

Thursday, March 8, 2012

Finding the Face

Today, the paper I read is less about using the Kinect and more about processing an image. A big part of my project is about creating a partial skeleton for the Kinect. In order to do that, I need good anchor points at which to place the joints. In this case, I've decided the head would be really reliable. The paper is about an efficient algorithm to detect where the face is. It mentions several ways to go about detecting a face. The three ways mentioned were knowledge based, image based, and feature based. The paper proposes taking the feature based approach. Feature based is about finding specific features in the image common to faces, such as skin color, face shape, the eyes, and the nose. The paper goes over two of these features.

The first feature it approaches is finding the skin color. However, a major problem with skin color is differing tones. To solve this issue, they decide to take the image and convert it to a different color space. They use the YCbCr color space, because it makes a large distinction between skin and non-skin. The same distinction holds across many different skin tones and colors, making the algorithm accurate for a variety of people. After the conversion, they draw a bounding box around the "skin" pixels, which in essence is the face.
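
Here is a rough sketch of how the skin color step could look. The RGB to YCbCr conversion uses the standard full-range BT.601 (JPEG) coefficients. The Cb and Cr threshold ranges are commonly quoted values for skin, not numbers taken from this paper, and the function names are ones I made up, so treat the whole thing as an assumption-laden illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JPEG coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify a pixel as skin by its chroma only.

    Assumption: the default ranges are widely used rule-of-thumb
    thresholds, not the ones from the paper. Luma (Y) is ignored so
    that brightness differences matter less.
    """
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_bounding_box(image):
    """Return (top, left, bottom, right), inclusive, around all skin pixels.

    `image` is a list of rows of (r, g, b) tuples. Returns None when
    no pixel is classified as skin.
    """
    coords = [
        (r, c)
        for r, row in enumerate(image)
        for c, pixel in enumerate(row)
        if is_skin(*pixel)
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    skin, background = (224, 172, 105), (0, 0, 255)
    image = [[background] * 4 for _ in range(4)]
    for r in (1, 2):
        for c in (1, 2):
            image[r][c] = skin
    print(skin_bounding_box(image))
```

One box around every skin pixel only works if the face is the only skin in view. Hands or arms in the frame would stretch the box, so a real version would probably need to keep only the largest connected region.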

The second feature they cover is the eyes. They take the bounding box they found with the previous feature as a starting point. Then, assuming the eyes would be in the upper half of the box, they cut out the bottom half to reduce the search area. Then they use a technique called the Hough transform, which identifies specified geometric shapes easily, in this case the eyes as ovals. The Hough transform takes many calculations, which can be a problem in programs that require more immediate results.
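
To show the voting idea behind the Hough transform, here is a simplified sketch. It looks for a circle of a known radius rather than an oval, which is my simplification and not what the paper does. It also assumes the edge pixels have already been extracted, and `hough_circle_center` is a name I made up for this example.

```python
import math
from collections import Counter


def hough_circle_center(edge_points, radius, angle_steps=360):
    """Vote for the most likely centre of a circle with a known radius.

    Each edge point (x, y) could lie on a circle centred anywhere at
    distance `radius` from it, so it votes once for every grid cell on
    that ring. The cell with the most votes is returned together with
    its vote count.
    """
    votes = Counter()
    for x, y in edge_points:
        ring = set()  # a set, so each point votes at most once per cell
        for k in range(angle_steps):
            theta = 2 * math.pi * k / angle_steps
            a = round(x - radius * math.cos(theta))
            b = round(y - radius * math.sin(theta))
            ring.add((a, b))
        votes.update(ring)
    return votes.most_common(1)[0]


if __name__ == "__main__":
    # Synthetic edge pixels on a circle centred at (20, 15) with radius 5.
    edges = sorted({
        (round(20 + 5 * math.cos(math.radians(t))),
         round(15 + 5 * math.sin(math.radians(t))))
        for t in range(360)
    })
    center, count = hough_circle_center(edges, radius=5)
    print(center, count)
```

The nested loop is where the cost comes from: the work grows with the number of edge points times the number of angle steps, and an oval adds more unknowns (two axis lengths and a rotation) to search over. That is also why cutting the search area down to the upper half of the bounding box helps.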

- Kao Pyro of the Azure Flame

Source:
Choudhar, M. V., Devi, M. S., & Bajaj, P. (2011). Face and facial feature detection. Proceedings of the International Conference & Workshop on Emerging Trends in Technology, 686-689.
http://dl.acm.org/citation.cfm?id=1980022.1980169&coll=DL&dl=ACM&CFID=69641785&CFTOKEN=95949759