
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 5, MAY 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG

FRUIT-DETECTOOLS: Leafs Identification Using Image Processing Approach for Recognize the Fruit Trees
M. Mustafa, M.R.M. Shafry, A.J. Masita, S. Ghazali, M.Z. N. Azlin, W.A.B.W. Aezwani and M. Masri.
Abstract: Image processing has been widely applied in many fields of study such as medicine, construction and safety. Many previous researchers have also applied these techniques to plant recognition. Plants have traditionally been identified by the structure of their leaves, and applying image processing techniques to this identification has recently become a research issue. The focus of this paper is to identify local fruit trees using image processing. The methodology involves three stages of image processing: pre-processing, feature extraction and leaf recognition. Pre-processing begins by converting the RGB image to a grayscale image, removing the noise, and applying a thresholding technique to obtain a binary image. The Sobel operator is applied to the binary image to detect the edges, which are then thinned. Feature extraction is then conducted using the chain code technique. The features extracted are the length, width, perimeter and shape of the leaves. The last stage recognizes the leaf features using the Linear Comparison technique. An experiment was carried out on 150 self-generated leaf images as the training and test data, and an accuracy rate of 80 percent was recorded.

Index Terms: Image Processing, Leaf Structures, Sobel Operator, Linear Comparison, Chain Code Technique.

1 INTRODUCTION
NOWADAYS, technology is rapidly expanding with the advent of modern tools. In accordance with the phrase "information at their fingertips", a variety of information systems have been developed for the convenience of users. These information systems store the various kinds of information that users require. Information systems today are not only textual but also include various forms of multimedia such as pictures, animations and videos.

Trees and other natural vegetation can be identified through various methods such as stem cuttings, root structure, the form or shape of the tree itself, and the leaf structure. Each type of tree has different or unique characteristics. These features are used to obtain information about the tree, such as the place of cultivation and the temperature or other conditions required to ensure that the tree can grow and proliferate well.

This paper involves the study of the structure of leaves. Leaves come in many different shapes, such as broad, needle-like, long and maple-like. The types of leaves differ from one tree to another. However, some plants have similar leaf shapes but can be distinguished through their veins, because the vein patterns differ from each other. Such special features can provide us with information about a tree.

The rest of this paper is divided into five parts: related works, research methodology, testing and results, discussion, and conclusion.

2 RELATED WORKS
Research in the field of image processing is very broad and covers various types of data such as text, static images, moving images, biometrics, handwriting and many others. Figure 1 shows the categories of leaf characteristics that have been used by previous researchers. For this research, however, we deal only with static digital images of the leaves of tropical fruit trees. These leaves differ in their shapes, sizes, perimeters, edge lines, widths and veins [1].

M. Mustafa and A.J. Masita are with the Department of Computer Science, Faculty of Science and Technology, Universiti Malaysia Terengganu (UMT), Kuala Terengganu, Malaysia.
M.R.M. Shafry, S. Ghazali and M.Z.N. Azlin are with the Department of Computer Graphic, Faculty of Computer Science and Information System, Universiti Teknologi Malaysia (UTM), Skudai, Johor, Malaysia.
W.A.B.W. Aezwani is with the Department of Marine Engineering, Maritime Academy of Malaysia (ALAM), Batu Rakit Campus, Terengganu, Malaysia.
M. Masri is with the Department of Agriculture, Faculty of AgroTechnology, Universiti Malaysia Terengganu (UMT), Terengganu, Malaysia.


Fig. 1. Leaf Characteristics Categories

2.1 Computer-Aided Plant Species Identification Technique (CAPSI)
Computer-Aided Plant Species Identification (CAPSI), which was proposed by Du et al. [2], is based on the image matching technique of leaf shapes as illustrated in Figure 2.

Fig. 2. Computer-aided Plant Species Identification Process

The contour extracted from a leaf contains many imperfect points, so matching techniques cannot be applied to it directly [2]. The Douglas-Peucker algorithm is used to approximate the contour with a smaller number of points. After this polygonal approximation of the leaf shape, the contour can be represented as an ordered sequence of points [3]. The number of vertices generated is smaller than in the original contour, but the approximation does not change the rotation, scale and translation of the original.

2.2 Chain Code Technique
Many previous studies have used the chain code technique. Figure 3 shows the flow chart of the overall process used for identification with chain code numbers. In that study, the researchers normalized the chain codes so that every sample had the same number of codes, fixed at about 150 for identification purposes [4]. If a chain code has fewer than 150 numbers, it is normalized up to 150 numbers; if it has more than 150, it is normalized by discarding the unnecessary numbers. This is done so that the chain code sequences can be compared carefully as pattern graphs.

Fig. 3. Flow chart for the chain code

The edge points can be connected using the chain code, and the chain code can also represent digital curves. Because the chain code carries information about the form of a curve, it has been widely used in image processing. Since lines are among the most important elements in an image, the chain code has been used to extract line information [5].

2.3 Artificial Neural Network (ANN)
An Artificial Neural Network is used for the classification process. Many previous studies have used an ANN classifier [6]. A combination of this technique and a thresholding classifier has also been proposed [6]. Although thresholding is based on the image histogram and can help to extract some vein pixels, it does not give good results under different illumination conditions. Thus, a combination of thresholding and an Artificial Neural Network (ANN) is used. This combination can extract veins more accurately, but the computational complexity still needs to be controlled. The implementation of the proposed method is shown in Figure 4.



Fig. 4. Thresholding and ANN techniques used by researchers [7]

A digital camera is used to capture images of the leaves. Lighting is placed behind the leaf to make the veins clear. The image is stored in JPEG format and converted to a grayscale image, and the leaves are then classified using a simple thresholding process. With a white background, the process worked well [7].


3 RESEARCH METHODOLOGY
Methodology plays an important role in a research process. Previous researchers have used various techniques to identify plants based on leaf structures. In this study, the chain code method developed in [4, 5] is used to extract the features found in leaf samples, and the Linear Comparison method is used for leaf identification. A leaf image undergoes an image pre-processing phase before its features are extracted and then identified. Figure 5 shows an overview of the overall methodology used in this study. The method consists of eight stages: preparation of the data set, conversion of RGB images to grayscale images, noise removal, conversion of grayscale images to binary images, edge detection, line thinning, feature extraction and, finally, recognition of the leaves.

Fig. 5. Flow chart of the leaf identification method (Preparation of Data Sets, Convert RGB Images to Grayscale, Noise Elimination, Convert Grayscale Image to Binary, Edge Detection, Thinning, Feature Extraction using the Chain Code, Recognition)

3.1 Preparation of Data Sets
In this study, images of local fruit tree leaves were used as the data set. Ten leaves were collected from each of five tree species within the scope of the study: cempedak, ciku, durian, rambutan and pulasan. The harvested leaves were kept overnight to obtain a flat leaf structure, which made them easier to photograph with a digital camera. The leaves were then placed on a white cloth measuring 28 cm x 18 cm as the background. The images were taken with the digital camera in a well-lit room, with the distance between the camera and the leaf samples set at 23 cm. Each leaf was photographed three times to obtain more accurate information. A total of 150 leaf images were taken and used in this study. All leaf images were stored in JPEG format and used as the data set.


3.2 Conversion of RGB images to gray scale images
In this process, RGB images are converted to grayscale images. The formula used to convert the RGB pixel values to a grayscale pixel value is as follows:

$$\mathrm{gray} = 0.2989\,R + 0.5870\,G + 0.1140\,B$$

where R, G and B are the red, green and blue components of each color pixel.

3.3 Elimination of Noise (Noise Removal)
In this process, noise is removed from the grayscale images produced in the previous step. For noise reduction, a median filter of size 3 x 3 is used. The gray level at each pixel is replaced with the median of the gray levels in the neighborhood of that pixel. To apply the median filter, the values of the pixel and its neighboring pixels are arranged in ascending order, and the median of these values is set as the new value of the pixel. Figure 6 shows an example of the 3 x 3 median filter. The gray level values in the neighborhood are (64, 64, 64, 64, 255, 255, 64, 64, 255). These values are then arranged in ascending order (64, 64, 64, 64, 64, 64, 255, 255, 255). The median gray level is the value 64, as highlighted in Figure 6.
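The grayscale conversion and the 3 x 3 median filter described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, images are assumed to be lists of rows, and border pixels are simply copied because the paper does not say how borders are treated.

```python
from statistics import median


def rgb_to_gray(r, g, b):
    """Weighted sum of the colour channels, using the weights of Section 3.2."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3 x 3 neighbourhood.

    Assumption: border pixels are copied unchanged.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            result[y][x] = median(window)
    return result


# The 3 x 3 neighbourhood from the worked example in Section 3.3.
patch = [
    [64, 64, 64],
    [64, 255, 255],
    [64, 64, 255],
]
filtered = median_filter_3x3(patch)
```

With the neighbourhood from the worked example, the centre pixel (originally 255) becomes 64, which agrees with the median read off the sorted list in the text.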

Fig. 6. The median filter with size 3 x 3

3.4 Conversion of Grayscale Images to Binary
The thresholding technique was used to convert the grayscale image to a binary image. During thresholding, an individual pixel in an image is marked as an object pixel if its value is greater than some threshold value (assuming the object to be brighter than the background) and as a background pixel otherwise. This convention is known as threshold above. Variants include threshold below, which is the opposite of threshold above; threshold inside, in which a pixel is labeled as object if its value is between two thresholds; and threshold outside, which is the opposite of threshold inside [8]. Typically, an object pixel is given the value "1" while a background pixel is given the value "0". Finally, a binary image is produced by coloring each pixel white or black, depending on the pixel's label.

In this study, the threshold value is in the range of 160 to 190. This ensures that the whole leaf is blackened and differs from the background. If the threshold value used is less than 160, most of the leaf image cannot be fully shaded. This causes problems during edge detection because the actual form of the leaf cannot be obtained accurately.

3.5 Edge Detection
The images are now in binary format and the noise has been eliminated from the leaf images. The next process is edge detection. This process is important because it clarifies the form of the leaf with a line around it. A complete binary image of the leaf is obtained as a result of the threshold value used previously. To identify the right tree, the most important thing is the margin of the leaf. Therefore, edge detection is performed using the Sobel operator on the binary image of the leaf. The Sobel operator used to calculate the gradient at the point labeled z5 is shown in Figure 7.

Fig. 7. Sobel Operator

$$g \approx |(z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)| + |(z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)|$$

To make a black border on a white background, the pixel values "0" and "1" are reversed.

3.6 Process of Thinning Lines
Line thinning is a morphological operation used to remove selected foreground pixels from a binary image. After edge detection, the resulting lines are thick and several pixels wide. Therefore, line thinning is carried out to reduce each line to a width of one pixel. This is required to facilitate the feature extraction process that follows: if a line is more than one pixel wide, feature extraction will run into problems.
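To make the thresholding and edge detection steps of Sections 3.4 and 3.5 more concrete, the following is a minimal sketch in plain Python. It rests on stated assumptions rather than on the authors' code: the function names are hypothetical, the leaf is assumed to be darker than the white cloth (so pixels below the threshold are labeled 1), and 175 is an arbitrary value picked from the reported 160 to 190 range.

```python
def to_binary(gray, threshold=175):
    """Label pixels darker than `threshold` as leaf (1), the rest as background (0).

    Assumption: the leaf is darker than the background; 175 is an arbitrary
    choice inside the 160-190 range reported in the paper.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def sobel_magnitude(image):
    """Approximate gradient magnitude g at each interior pixel (the z5 position).

    z1..z9 name the 3 x 3 neighbourhood row by row, as in the formula for g.
    Border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    g = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            z1, z2, z3 = image[y - 1][x - 1], image[y - 1][x], image[y - 1][x + 1]
            z4, z6 = image[y][x - 1], image[y][x + 1]
            z7, z8, z9 = image[y + 1][x - 1], image[y + 1][x], image[y + 1][x + 1]
            gx = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)
            gy = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)
            g[y][x] = abs(gx) + abs(gy)
    return g


# A toy 5 x 5 grayscale image: a dark 3 x 3 "leaf" on a bright background.
gray = [
    [250, 250, 250, 250, 250],
    [250,  90,  90,  90, 250],
    [250,  90,  90,  90, 250],
    [250,  90,  90,  90, 250],
    [250, 250, 250, 250, 250],
]
binary = to_binary(gray)
edges = sobel_magnitude(binary)
```

In this toy image the response is zero at the centre of the dark block, where all neighbours are equal, and non-zero along its margin, which is the behaviour the edge detection step relies on.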


Like other morphological operations, the behavior of line thinning is determined by a structuring element. Line thinning is related to the hit-and-miss transform and can be expressed as follows:

$$\mathrm{thin}(I, J) = I - \text{hit-and-miss}(I, J)$$

where I is the image and J is the structuring element. Line thinning is calculated by translating the origin of the structuring element to each possible pixel position in the image and, at each position, comparing it with the underlying image pixels. If the foreground and background pixels in the structuring element exactly match the foreground and background pixels in the image, the image pixel underneath the origin of the structuring element is set to background ("0"). Through this process, the edge line of the image becomes only one pixel wide [9].

3.7 Extraction of Features
In this study, the features extracted from the leaves are the shape, the perimeter, the physiological length and the physiological width. The perimeter and the leaf shape are extracted using the chain code technique.

3.7.1 Basic Features of Leaves
i. Leaf shape
Leaf shape is an important feature in determining the type of a leaf, since many leaves differ from each other in form. The unique characteristics of a leaf shape can be extracted using the chain code technique. The sequence of numbers in the resulting chain code can determine the type of leaf, because the typical leaf form of each plant is different. The pattern of this sequence can differentiate one type of leaf from another. However, some different kinds of trees have the same leaf shape.

ii. Perimeter of the leaf
The perimeter of a leaf is the length of its margin. It is calculated by counting the number of pixels that make up the leaf margin. The perimeter can also be obtained using the chain code technique: the total number of chain codes generated is equal to the perimeter. The perimeter value varies with the size of the leaf. Leaves on the same tree do not necessarily have the same perimeter, since each tree has both small and large leaves.

iii. Physiological length
The distance between the two end points of the main vein is the physiological length of a leaf. For an image of a leaf lying horizontally, the position X1 of the first leaf pixel found from the left side is subtracted from the position X2 of the first leaf pixel found from the right side to obtain the physiological length:

$$\mathrm{Length} = X_2 - X_1$$

iv. Physiological width
By drawing a line through the two end points of the main vein, an infinite number of lines orthogonal to that line can be plotted. The number of intersections between these lines and the leaf margin is also infinite [10]. The longest distance between two such intersection points is defined as the physiological width. The position Y1 of the first leaf pixel found from the top is subtracted from the position Y2 of the first leaf pixel found from the bottom to obtain the physiological width:

$$\mathrm{Width} = Y_2 - Y_1$$

The relationship between physiological width and physiological length is illustrated in Figure 8.
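The length, width and perimeter measurements can be illustrated with a short sketch. It assumes a binary image stored as a list of rows with 1 marking leaf pixels, a leaf lying horizontally, and a thinned edge image for the perimeter; the function names are hypothetical and are not taken from the paper.

```python
def leaf_length_and_width(binary):
    """Return (length, width) following Length = X2 - X1 and Width = Y2 - Y1.

    Assumes the leaf lies horizontally, so length runs along x and width along y.
    """
    xs = [x for row in binary for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(binary) if any(row)]
    if not xs:
        raise ValueError("image contains no leaf pixels")
    x1, x2 = min(xs), max(xs)  # first leaf pixel from the left / from the right
    y1, y2 = min(ys), max(ys)  # first leaf pixel from the top / from the bottom
    return x2 - x1, y2 - y1


def leaf_perimeter(thinned_edges):
    """Count the margin pixels of a one-pixel-wide edge image (values 0 or 1)."""
    return sum(sum(row) for row in thinned_edges)


# A toy horizontal "leaf" filled with 1s.
leaf = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
length, width = leaf_length_and_width(leaf)
```

For the toy leaf, the leftmost and rightmost leaf pixels are at x = 1 and x = 5 and the top and bottom rows are y = 1 and y = 3, so the sketch reports a length of 4 and a width of 2 pixels.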

Fig. 8. The relationship between physiological width and physiological length [Wu et al., 2007]

3.7.2 Chain Code Method
A chain code represents a boundary line as a connected sequence of straight-line segments of specified length and direction. The representation used here is based on the 8-connectivity of the segments. The direction of each segment is coded using a number. The direction assigned to each segment is shown in Figure 9.

Fig. 9. The eight neighborhoods
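The 8-direction coding used in this section can be illustrated with a small tracing routine. This is a rough sketch under explicit assumptions: the direction numbers follow the common Freeman convention (0 = east, increasing counter-clockwise), which may differ from the numbering in the paper's figures; the edge image is assumed to be a closed, one-pixel-wide curve; and the start pixel is taken as the first edge pixel in raster order. The function name is hypothetical.

```python
# (dx, dy) offsets for codes 0..7. Assumed Freeman numbering: 0 = east,
# increasing counter-clockwise; y grows downward in image coordinates.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1),
              (-1, 0), (-1, 1), (0, 1), (1, 1)]


def chain_code(edges):
    """Trace a closed, one-pixel-wide curve and return its list of direction codes."""
    height, width = len(edges), len(edges[0])
    start = next(((x, y) for y in range(height) for x in range(width)
                  if edges[y][x]), None)
    if start is None:
        return []
    codes, visited = [], {start}
    x, y = start
    while True:
        for code, (dx, dy) in enumerate(DIRECTIONS):
            nx, ny = x + dx, y + dy
            if (0 <= nx < width and 0 <= ny < height
                    and edges[ny][nx] and (nx, ny) not in visited):
                codes.append(code)
                visited.add((nx, ny))
                x, y = nx, ny
                break
        else:
            # No unvisited neighbour left: close the loop if the start is adjacent.
            for code, (dx, dy) in enumerate(DIRECTIONS):
                if (x + dx, y + dy) == start:
                    codes.append(code)
                    break
            return codes


# A diamond-shaped closed curve; every pixel has exactly two 8-neighbours.
diamond = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
]
codes = chain_code(diamond)
```

For the diamond, the trace yields eight codes for eight boundary pixels, which matches the observation that the number of chain codes equals the perimeter. The routine does not handle branching or thick lines, which is why a thinned edge image is assumed.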


A chain code is a series of numbers that represents the basic shape of an object, based on the direction definitions in the diagram above. The boundary is represented by a chain code that starts at any pixel on the boundary line and moves from pixel to pixel in an anticlockwise direction until it returns to the initial pixel. A chain code can be generated by first finding a start pixel on the image boundary. The next step is to find the pixels that are connected to it along the boundary. The next boundary pixel lies in one of the eight locations around the current pixel, as shown in Figure 10. By looking at each of the eight neighbors in counter-clockwise order, at least one pixel connected to the boundary line will be found. If more than one white pixel is found among the eight neighbors, the chain code cannot be used because this would create ambiguity. Thus, the line thinning process is very important for the chain code method.

Fig. 10. Direction and number of predetermined chain code

The threshold value also influences the effectiveness of the chain code. If there are lines that are not connected, the process cannot continue because it cannot find a white pixel. Figure 11 shows a flow chart of the chain code technique used in this study.

Fig. 11. Flow chart of the chain code method (Binary Image, Find Start Point, 3x3 Neighborhood Pixels, Start Point? NO: Update Chain Code List; YES: Display Chain Code List)

3.8 Recognition
Recognition was carried out by comparing the length, width and perimeter of the tested leaves with the reference values. The length of a leaf is first matched against the leaf lengths stored as training data. If the length matches more than one type of leaf, the width and perimeter are also compared to obtain an accurate result. In this study, comparing length, width and perimeter alone is not sufficient to identify the type of leaf: although the leaves come from different trees, some of them have the same characteristics. Thus, the shape characteristic represented by the chain code numbers was also used for identification. Recognition by chain code numbers requires a further identification method; in this study, the Linear Comparison method was used.

3.9 Linear Comparison Method
Linear Comparison is used for leaf shape recognition. For this process, two different data sets were used: the training data set and the test data set. The training data set contained seven leaves of each type of tree, while the test data set contained three leaves of each type. Linear Comparison is based on the concept of Dynamic Time Warping (DTW), which compares the distance between two time series. In this study, however, a linear comparison was used: the percentage of corresponding values between the two series is computed, rather than the warped time curve used by Dynamic Time Warping [10, 11]. The original Dynamic Time Warping (DTW) algorithm is as follows.
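The authors' own listings are not reproduced in this text, so the following is only a minimal sketch, assuming the textbook form of the DTW recurrence and a simple position-by-position match for the linear comparison. The function names `dtw_distance` and `percent_match` are hypothetical and are not the paper's calculateLC routine.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    f = [[inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            f[i][j] = cost + min(f[i][j - 1],      # insertion
                                 f[i - 1][j],      # deletion
                                 f[i - 1][j - 1])  # match
    return f[n][m]


def percent_match(train_codes, test_codes):
    """Percentage of positions at which two chain code strings agree.

    Assumption: positions are compared one by one without warping, and the
    count is divided by the length of the longer string.
    """
    if not train_codes or not test_codes:
        return 0.0
    matches = sum(1 for p, q in zip(train_codes, test_codes) if p == q)
    return 100.0 * matches / max(len(train_codes), len(test_codes))


# Two made-up chain code strings that differ in a single position.
similarity = percent_match("55771133", "55771733")
```

The two made-up strings agree in seven of eight positions, so the sketch reports 87.5 percent. DTW, by contrast, can stretch one sequence against the other, so `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 even though the sequences differ in length.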


In this algorithm, f(i, j-1) represents insertion, f(i-1, j) represents deletion (removal) and f(i-1, j-1) represents the corresponding value (match). The corresponding value used to identify the type of leaf was based on the chain code features that had been derived earlier. Two strings, one from the training data and one from the test data, were compared with each other. For the two strings, the number of corresponding values was obtained and calculated as a percentage. Figures 12(a) and 12(b) show a sample of the data used in this study.

Fig. 12(a). Training Data

Fig. 12(b). Test Data

The string in Figure 12(a) was compared with the string in Figure 12(b), and the similarities between the two strings were identified one by one. The similarities found were then calculated as a percentage. Figure 13 shows the source code used to find the similarity percentage between the two strings.

Fig. 13. The source code for calculateLC

Through this technique, the chain code numbers of the test data were compared one by one with those of the training data to find the highest similarity percentage. "Distance" in the source code of Figure 13 refers to the percentage of match.

4 RESULT AND DISCUSSION
The leaf characteristics successfully extracted in this study are physiological length, physiological width, perimeter and leaf shape. Table 1 shows the features extracted in this research. These features are used as training data for the purpose of recognition.

Table 1. Characteristics of leaves (ranges, in pixels)

Feature      Ciku       Cempedak     Durian       Pulasan    Rambutan
Length       264-349    512-628      562-694      358-430    353-496
Width        118-166    261-302      157-190      162-208    178-246
Perimeter    542-696    1091-1283    1135-1409    726-893    733-1137

Based on the features in Table 1, ciku leaves can be identified by the length, width and perimeter of the leaf because these characteristics are very different from those of the other leaves. Cempedak and durian leaves have almost the same ranges of length and perimeter, but can be distinguished by their different widths. Pulasan and rambutan leaves have the same range for all three features. Therefore, rambutan and pulasan leaves are identified using Linear Comparison, by looking at the similarity percentage between their chain code number series.

4.1 Recognition Using Linear Comparison
In this research, a threshold of 80 percent similarity between two series of chain code numbers was used. If the value used were greater than 80 percent, only a few leaves could be identified, and if it were less than 80 percent, leaves could not be identified with precision. The value of 80 percent is the most suitable for the purpose of this study. However, there are still some constraints, such as errors in leaf detection. The results showed that 80 percent detection was achieved. For ciku, cempedak and durian leaves, there were no identification problems since the features found in these leaves are unique and different from one another. For rambutan and pulasan leaves, however, the extracted length, width and perimeter have similar values. Therefore, Linear Comparison is used for identification, and the results are tabulated in Table 2. For rambutan leaves, only 66.67 percent of the leaves were successfully identified, while for pulasan leaves only 33.33 percent were identified. It can be seen from Table 2 that one of the rambutan leaves could not be identified. This is because the characteristics of that leaf are very different from those of the rambutan and pulasan leaves used as training data. For the pulasan leaves, two leaves were identified as rambutan leaves since they were found to have similar characteristics.
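The first recognition step, comparing length, width and perimeter against the training data, can be sketched as a range lookup over the Table 1 values. Treating the training data as inclusive ranges is an assumption made for illustration (the paper does not state the exact matching rule), and the names below are hypothetical.

```python
# (min, max) training ranges from Table 1: length, width, perimeter.
TRAINING_RANGES = {
    "ciku":     ((264, 349), (118, 166), (542, 696)),
    "cempedak": ((512, 628), (261, 302), (1091, 1283)),
    "durian":   ((562, 694), (157, 190), (1135, 1409)),
    "pulasan":  ((358, 430), (162, 208), (726, 893)),
    "rambutan": ((353, 496), (178, 246), (733, 1137)),
}


def candidate_species(length, width, perimeter):
    """Return every species whose three training ranges contain the test leaf.

    Assumption: a leaf is a candidate only if all three features fall inside
    the inclusive ranges of Table 1.
    """
    features = (length, width, perimeter)
    return [name for name, ranges in TRAINING_RANGES.items()
            if all(low <= value <= high
                   for value, (low, high) in zip(features, ranges))]


# Test leaves taken from Table 2.
ciku_1 = candidate_species(303, 136, 611)
durian_1 = candidate_species(593, 169, 1196)
rambutan_1 = candidate_species(412, 182, 851)
```

Under this assumption, the Ciku 1 and Durian 1 test leaves each map to a single species, while Rambutan 1 falls inside both the pulasan and the rambutan ranges. That ambiguity is the case in which the chain code comparison with the 80 percent threshold is needed.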

5 CONCLUSION
The recognition of local fruit trees through leaf structures using image processing techniques has been carried out. The chain code method is used to obtain the shape of an object. In this study, it has been used to extract special features of the leaves such as length, width, shape and perimeter. In addition, the Linear Comparison technique was successfully implemented for feature recognition to achieve the objectives of the research.

Table 2: Results of the leaf recognition

Leaf name    Leaf          Length   Width   Perimeter   Result
Ciku         Ciku 1        303      136     611         Success
             Ciku 2        318      129     647         Success
             Ciku 3        309      122     630         Success
Cempedak     Cempedak 1    584      290     1201        Success
             Cempedak 2    510      289     1112        Success
             Cempedak 3    561      278     1145        Success
Durian       Durian 1      593      169     1196        Success
             Durian 2      602      164     1216        Success
             Durian 3      582      176     1176        Success
Rambutan     Rambutan 1    412      182     851         80.5% Success
             Rambutan 2    404      182     857         82.86% Success
             Rambutan 3    496      224     1027        Failed
Pulasan      Pulasan 1     428      208     888         82.15% Success
             Pulasan 2     372      193     808         83.8% Success
             Pulasan 3     407      207     861         81.04% Success

ACKNOWLEDGMENT
The authors would like to thank the Faculty of Science and Technology (FST) Universiti Malaysia Terengganu (UMT), Faculty of Computer Science and Information System (FSKSM) and the Research Management Center (RMC), Universiti Teknologi Malaysia (UTM) for the support and facilities provided.

REFERENCES
[1] Ji-Xiang Du, Xiao-Feng Wang and Guo-Jun Zhang (2006), Leaf shape based plant species recognition, Vol 185, Issue 2: 883-893.
[2] Ji-Xiang Du, De-Shuang Huang, Xiao-Feng Wang and Xiao Gu (2006), Computer-aided plant species identification (CAPSI) based on leaf shape matching technique, Transactions of the Institute of Measurement and Control, 28(3): 275-284.
[3] David Douglas and Thomas Peucker (1973), "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature", The Canadian Cartographer 10(2), 112-122. (DOI: 10.3138/FM57-6770-U75U-7727).
[4] Zulkefly Bin Mohd Rizal (2010), Number Recognition System Using Chain Code Technique.
[5] Nor Amizan Jusoh, Jasni Mohamed Zain (2009), Application of Freeman Chain Codes: An Alternative Recognition Technique for Malaysian Car Plates, IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 11.
[6] Stephen Gang Wu, Forrest Sheng Bao, Eric You Xu, Yu-Xuan Wang, Yi-Fan Chan and Qiao-Liang Xiang (2007), A recognition algorithm for plant leaf classification using Neural Network weblogs: 1-6.
[7] Fu, H. and Chi, Z. (2006), Combined thresholding and neural network approach for vein pattern extraction from leaf images, IEEE Proc-Vis. Image Signal Process, Vol. 153(6): 881-892.
[8] Guang-Quan Lu, Hong-Guo Xu, Yi-bing Li (2005), Line Detection Based on Chain Code, 0-7803-9435-6/05.
[9] Qingfeng Wu, Changle Zhou and Chaonan Wang (2006), Feature extraction and XML representation of plant leaf for image retrieval, APWeb Workshops 2006, LNCS 3842: 127-131.
[10] Toni M. Rath, R. Manmatha, Word Image Matching Using Dynamic Time Warping, Multi-Media Indexing and Retrieval Group, Center for Intelligent Information Retrieval, University of Massachusetts, IIS-9909073.
[11] Marion E. Munich, Pietro Perona (1999), Continuous Dynamic Time Warping for translation-invariant curve alignment with application to signature verification, Proc. of the 7th International Conference on Computer Vision (ICCV'99).
