This article describes the implementation of a neural network with CUDA.
Download demo (release build requiring CUDA and 120 dpi) - 584.61 KB
Download GUI source code - 509.68 KB
Download kernel (the Neural Network core) - 2.78 KB
Introduction
An Artificial Neural Network is an information processing method inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Neural networks have been widely used to classify "analogous" signals, including handwriting, voice, and image recognition. They can also be used in computer games, giving games the ability to adaptively learn from player behavior; this technique has been used in racing games, so that computer-controlled opponent cars can learn how to drive from human players.

Since a neural network requires a considerable number of vector and matrix operations to produce results, it is very well suited to a parallel programming model running on Graphics Processing Units (GPUs). Our goal is to utilize and unleash the power of GPUs to boost the performance of a neural network solving handwriting recognition problems. This project was originally our graphics architecture course project. We ran on the GPU the same neural network described by Mike O'Neill in his brilliant article "Neural Network for Recognition of Handwritten Digits".
codeproject.com/KB//GPUNN.aspx
7/11/2011
The neural network we implemented is a 5-layer network called a convolutional neural network. This kind of network is proven to be suitable for recognizing handwritten digits. For more theoretical details, please check out Mike's article and the references he has listed.

The first three layers of our neural network consist of several feature maps, each of them shrunken from the previous layer. Our input is a 29*29 image of a digit, so we have 29*29 = 841 neurons in the first layer.

The second layer is a convolutional layer with 6 feature maps. Each feature map is a 13*13 image sampled from the first layer, and each pixel/neuron in a feature map is a 5*5 convolutional kernel of the input layer. So there are 13*13*6 = 1014 nodes/neurons in this layer, (5*5+1 (bias node))*6 = 156 weights, and 1014*(5*5+1) = 26364 connections linking to the first layer.

Layer 3 is also a convolutional layer, but with 50 smaller feature maps. Each feature map is 5*5 in size, and each pixel in these feature maps is a 5*5 convolutional kernel of corresponding areas of all 6 feature maps of the previous layer. There are thus 5*5*50 = 1250 neurons in this layer, (5*5+1)*6*50 = 7800 weights, and 1250*26 = 32500 connections.

The fourth layer is a fully-connected layer with 100 neurons. Since it is fully connected, each of the 100 neurons in the layer is connected to all 1250 neurons in the previous layer. There are therefore 100 neurons in it, 100*(1250+1) = 125100 weights, and 100*1251 = 125100 connections.

Layer 5 is the final output layer, also fully connected, with 10 units. Each of the 10 neurons in this layer is connected to all 100 neurons of the previous layer, giving 10 neurons, 10*(100+1) = 1010 weights, and 10*101 = 1010 connections.

As you can see, although structurally simple, this neural network is a huge data structure.
Our Implementation
Due to all the inconveniences of GLSL mentioned above, we finally chose CUDA. The reason a neural network is suitable for the GPU is that training and execution are two separate processes. Once properly trained, no write access is required while using the network, so there are no synchronization issues to address. Moreover, neurons on the same network level are completely isolated, so neuron value computations can be highly parallelized.

In our code, the weights for the first layer are stored as an array, and those inputs are copied to the device. For each network level there is a CUDA function handling the computation of that level's neuron values, since parallelism can only be achieved within one level and the connections differ between levels. The connections of the neural network are implicitly defined in the CUDA functions by the equations that compute the next level's neurons; no explicit connection data structure exists in our code. This is one main difference between our code and the CPU version by Mike.
For example, each neuron value of the second level is a weighted sum of 25 neurons of the first level plus one bias. The second neuron level is composed of 6 feature maps, each of size 13*13. We assign a block ID to each feature map and a thread ID to each neuron on a feature map: every feature map is handled by a block, and each pixel on it is dealt with by a thread. This is the CUDA function that computes the second network layer:
__global__ void executeFirstLayer(float *Layer1_Neurons_GPU, float *Layer1_Weights_GPU, float *Layer2_Neurons_GPU)
{
    int blockID = blockIdx.x;   // one block per feature map
    int pixelX  = threadIdx.x;  // one thread per output pixel
    int pixelY  = threadIdx.y;

    // offsets of the 5x5 convolution window within the 29-wide input image
    int kernelTemplate[25] = {
        0,   1,   2,   3,   4,
        29,  30,  31,  32,  33,
        58,  59,  60,  61,  62,
        87,  88,  89,  90,  91,
        116, 117, 118, 119, 120
    };

    int weightBegin = blockID * 26;  // 25 kernel weights + 1 bias per map
    int windowX = pixelX * 2;        // stride 2 subsampling
    int windowY = pixelY * 2;

    float result = 0;

    // bias weight
    result += Layer1_Weights_GPU[weightBegin];
    ++weightBegin;

    // weighted sum over the 5x5 window of the input layer
    for (int i = 0; i < 25; ++i)
    {
        result += Layer1_Neurons_GPU[windowY * 29 + windowX + kernelTemplate[i]]
                  * Layer1_Weights_GPU[weightBegin + i];
    }

    // scaled tanh activation
    result = (1.7159 * tanhf(0.66666667 * result));

    Layer2_Neurons_GPU[13 * 13 * blockID + pixelY * 13 + pixelX] = result;
}
All other levels are computed the same way; the only difference is the equation used to calculate the neuron values.
The main program first transfers all the input data to the GPU, then calls each CUDA function in order, and finally reads back the answer.
The user interface is a separate program written in C#. Users draw a digit with the mouse on the input pad; the program then generates a 29*29 image and calls the kernel neural network program. The kernel, as described above, reads the input image and feeds it into our neural network. Results are returned through files and read back by the user interface. Here is a screenshot.

After drawing a digit, we get all 10 neuron values of the last network layer. The index of the maximum neuron value is the most probable digit. We shade candidates with different depths of red according to their likelihoods. On the right, the user interface prints out the feature maps of the first three layers.

Note that C# under Windows XP has a resolution issue. We tested our program at 120 dpi; a 96 dpi setting can shift the input image around, so that accuracy is badly affected.

No training part is included in our GPU implementation. We used Mike's code to train all the weights and cached them in files.
Result
Accuracy

Our neural network can achieve 95% accuracy. The database we used to train the network is MNIST, which contains 60000 handwriting examples from different people. Dr. LeCun reports that this network converges after around 25 training passes, a number confirmed by our tests. We had only around 1400 misrecognized samples out of 60000 inputs.

Also note that there is a bug in Mike's code. This is the corrected code for initializing the second layer:
for ( fm=0; fm<50; ++fm)
{
    for ( ii=0; ii<5; ++ii )
    {
        for ( jj=0; jj<5; ++jj )
        {
            // iNumWeight = fm * 26;  // 26 is the number of weights per feature map
            iNumWeight = fm * 156;  // 156 is the number of weights per feature map

            NNNeuron& n = *( pLayer->m_Neurons[ jj + ii*5 + fm*25 ] );

            n.AddConnection( ULONG_MAX, iNumWeight++ );  // bias weight

            for ( kk=0; kk<25; ++kk )
            {
                // note: max val of index == 1013, corresponding to 1014 neurons in prev layer
                n.AddConnection(       2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 169 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 338 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 507 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 676 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 845 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
            }
        }
    }
}
Please refer to this for the details about this bug. Our GPU implementation is based on the corrected version; however, there isn't much difference in terms of accuracy.

Performance
The major reason for using the GPU to compute a neural network is performance, and the outcome is promising compared to the CPU implementation. As shown in the table above, we compared the execution time of the GPU version, the EmuRelease version, and the CPU version on one single input sample. The GPU version is 270 times faster than the CPU version and 516.6 times faster than the EmuRelease version. To be more accurate, we also considered the IO time consumption of the GPU version. As we can see, even when IO time is included, our method is 10 times faster. And in practical use, weight values need only be loaded onto the device once.
History
14th March, 2008: Initial post
License
This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License.
Article Copyright 2008 by billconan, kavinguy. Everything else Copyright CodeProject, 1999-2011.