
Utilizing the GPU and Microsoft Kinect for Enhanced User Interfaces

William Boone

Introduction

For my final project I propose to create a system that uses the Microsoft Kinect, interfaced with a standard desktop computer, as a gestural input device. Specifically, I wish to use either CUDA or OpenGL compute shaders (preferably the latter) to perform head detection and/or gaze estimation in real time. The output of these computations would act as either a primary or secondary input to a 3D virtual game world. Ideally, head tracking would be used to pivot or otherwise move the camera, while gaze estimation could be used to select objects on screen. In addition, I would like to explore the possibility of tracking multiple users for cooperative or competitive play.

Previous Experience

I have previously done real-time head detection with the Kinect on the CPU; however, my method consumed a very large amount of processing power, and that overhead makes the CPU implementation impractical for a game interface. The method is also fairly simplistic. The algorithm is as follows:

1. Capture depth input with user data (the Kinect supports automatic segmentation and labeling of up to 7 people simultaneously).
2. Choose a user to track (currently a user-defined parameter).
3. Transform the depth map into a point cloud in world space and flatten it into the xy plane.
4. Perform a specialized Hough transform to detect a semicircle oriented and scaled to be a rough approximation of the human head.
5. Use semi-automatically collected configuration data about the orientation of the Kinect sensor and the display screen to determine the location of the center point of the semicircle in user space.
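A minimal C++ sketch of steps 3 and 4 is shown below. It is illustrative only: the depth intrinsics, accumulator resolution, and head-radius value are placeholder assumptions, not the actual calibration data or parameters from my implementation.

    #include <cstdint>
    #include <cmath>
    #include <vector>

    // Placeholder constants -- stand-ins, not the real calibration values.
    constexpr float kFx = 575.8f, kFy = 575.8f;  // approximate Kinect depth intrinsics
    constexpr int   kW = 640, kH = 480;          // depth map resolution
    constexpr float kCellSize = 0.02f;           // 2 cm accumulator cells (assumed)
    constexpr float kHeadRadius = 0.09f;         // ~9 cm head radius (assumed)
    constexpr int   kGridN = 256;                // accumulator cells per axis (assumed)

    struct Vec2 { float x, y; };

    // Step 3: back-project the tracked user's depth pixels into world space,
    // then flatten into the xy plane by dropping z.
    std::vector<Vec2> flattenUser(const uint16_t* depthMM, const uint8_t* userLabel,
                                  uint8_t trackedUser) {
        std::vector<Vec2> pts;
        for (int v = 0; v < kH; ++v) {
            for (int u = 0; u < kW; ++u) {
                int i = v * kW + u;
                if (userLabel[i] != trackedUser || depthMM[i] == 0) continue;
                float z = depthMM[i] * 0.001f;           // millimeters -> meters
                pts.push_back({ (u - kW / 2) * z / kFx,  // pinhole back-projection
                                (kH / 2 - v) * z / kFy });
            }
        }
        return pts;
    }

    // Step 4: simplified Hough vote for an upward-facing semicircle of fixed
    // radius; each point votes for candidate centers on the arc below it.
    Vec2 houghSemicircle(const std::vector<Vec2>& pts) {
        std::vector<int> acc(kGridN * kGridN, 0);
        auto cell = [](float c) { return (int)std::floor(c / kCellSize) + kGridN / 2; };
        for (const Vec2& p : pts) {
            for (float a = -1.5708f; a <= 1.5708f; a += 0.1f) {  // top half of circle
                int cx = cell(p.x - kHeadRadius * std::sin(a));
                int cy = cell(p.y - kHeadRadius * std::cos(a));
                if (cx >= 0 && cx < kGridN && cy >= 0 && cy < kGridN)
                    ++acc[cy * kGridN + cx];
            }
        }
        int best = 0;                                // accumulator peak = head center
        for (int i = 1; i < kGridN * kGridN; ++i)
            if (acc[i] > acc[best]) best = i;
        return { (best % kGridN - kGridN / 2 + 0.5f) * kCellSize,
                 (best / kGridN - kGridN / 2 + 0.5f) * kCellSize };
    }

The accumulator peak gives the head center in the flattened plane, which step 5 then maps into user space using the sensor and screen configuration data.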

This algorithm, while reasonably robust, could be improved with face detection to determine the orientation of the user's head and to reduce false positives.

Additions and Applications

Adding a gaze-detection step to my algorithm would allow the program to estimate where on the screen a user is looking, in addition to their location. Since the Kinect has finite resolution, this step would be based primarily on the user's head pose together with their location relative to the screen and Kinect sensor (the geometry of this mapping is sketched after the goals list below). Knowing the location of the user's focus would be very helpful in applications that require selecting objects, by replacing or augmenting the mouse, and could also be adapted for use by people with low mobility. The ability to track multiple users simultaneously, paired with an accurate gaze detector, would allow multiple people to interact with multiple on-screen objects at any one time.

Goals

- Parallelize the existing implementation of the Kinect-based head tracker (a compute-shader sketch closes this proposal)
  o Port the code from C# to C++
  o Use either CUDA or OpenGL compute shaders for parallelism
- Connect the head tracker to an interesting 3D world
  o Use the Kinect to control the camera's fine position and view
  o Use keyboard, mouse, and/or gamepad to control the camera's coarse view and position
- Improve head-tracking reliability and robustness by incorporating additional methods
- Add gaze estimation
  o Show this in some way in the digital environment
- Add multiple-user detection to the algorithm
- Design a showcase game or application for demo purposes
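As a rough illustration of the gaze-mapping geometry referenced above, the sketch below casts a ray from the head position along an estimated head-forward direction and intersects it with the screen plane. The screen model (origin, physical size, resolution) and the shared coordinate frame are assumptions made for illustration; in practice these would come from the semi-automatically collected configuration data described earlier.

    #include <cmath>
    #include <optional>
    #include <utility>

    struct Vec3 { float x, y, z; };

    // Hypothetical screen model: top-left corner at the origin of a shared
    // "user space", screen lying in the z = 0 plane (assumed calibration).
    constexpr float kScreenW = 0.52f, kScreenH = 0.32f;  // meters (illustrative)
    constexpr int   kResW = 1920, kResH = 1080;          // pixels (illustrative)

    // Intersect the gaze ray (head position plus forward direction derived
    // from head pose) with the screen plane and map the hit point to pixels.
    std::optional<std::pair<int, int>> gazeToPixel(Vec3 head, Vec3 forward) {
        if (forward.z >= 0.0f) return std::nullopt;  // facing away from the screen
        float t = -head.z / forward.z;               // ray-plane intersection at z = 0
        float x = head.x + t * forward.x;            // hit point in meters
        float y = head.y + t * forward.y;
        if (x < 0 || x > kScreenW || y < 0 || y > kScreenH)
            return std::nullopt;                     // gaze falls off-screen
        return std::make_pair(static_cast<int>(x / kScreenW * kResW),
                              static_cast<int>(y / kScreenH * kResH));
    }

A selection interface would then hit-test the returned pixel against on-screen objects, exactly as it would a mouse cursor.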

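Finally, here is a minimal sketch of what the parallelized Hough vote from the first goal might look like as an OpenGL compute shader, launched with one invocation per point. The buffer bindings, uniforms, and workgroup size are assumptions for illustration; the host would still need to compile the shader, upload the flattened point cloud, and read back the accumulator.

    // Hypothetical GLSL compute shader for the Hough vote: one invocation per
    // point, atomic votes into an N x N accumulator (bindings/uniforms assumed).
    static const char* kHoughCS = R"(#version 430
    layout(local_size_x = 256) in;
    layout(std430, binding = 0) readonly buffer Points { vec2 pts[]; };
    layout(std430, binding = 1) buffer Accum { uint acc[]; };
    uniform uint  numPoints;
    uniform float headRadius;   // ~0.09 m (assumed)
    uniform float cellSize;     // e.g. 0.02 m per accumulator cell (assumed)
    uniform uint  gridN;        // accumulator cells per axis (assumed)
    void main() {
        uint i = gl_GlobalInvocationID.x;
        if (i >= numPoints) return;
        vec2 p = pts[i];
        for (float a = -1.5708; a <= 1.5708; a += 0.1) {  // top half of circle
            vec2 c = p - headRadius * vec2(sin(a), cos(a));
            ivec2 cell = ivec2(floor(c / cellSize)) + int(gridN / 2);
            if (all(greaterThanEqual(cell, ivec2(0))) &&
                all(lessThan(cell, ivec2(int(gridN)))))
                atomicAdd(acc[uint(cell.y) * gridN + uint(cell.x)], 1u);
        }
    })";

    // Host side (an OpenGL 4.3 context is assumed): after compiling kHoughCS
    // into `program` and binding the two storage buffers, a single dispatch
    // covers all points:
    //   glUseProgram(program);
    //   glDispatchCompute((numPoints + 255) / 256, 1, 1);
    //   glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

Because the voting loop is independent per point, this maps naturally onto the GPU; the only contention is the atomic increment into the shared accumulator, after which the peak search can also be done on the GPU or on the CPU after readback.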