1. INTRODUCTION
1.1 Synopsis
This project, entitled "Study on Value-Added Service in Mobile Telecom Based on Association Rules", is drawn from the proceedings of the 10th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. With the continuous development of information technology, it is being applied to data storage and management in more and more fields. However, in the face of ever-growing volumes of data, what makes a good database management system still needs further exploration. Data mining not only enables efficient data storage, but can also extract hidden, useful knowledge. Current Internet technology and its growing demand necessitate more advanced data mining techniques to interpret the information and knowledge in data distributed all over the world, and in the 21st century this demand continues to grow. Data mining can discover interesting patterns or relationships describing the data, and predict or classify behavior based on the available data. In other words, it is an interdisciplinary field with the general goal of predicting outcomes and uncovering relationships in data. It uses automated tools that employ sophisticated algorithms to discover hidden patterns, associations, anomalies, and structure in the large amounts of data stored in data warehouses or other information repositories, and to filter the necessary information out of such big datasets. Association rule mining refers to discovering association relationships among different attributes. In telecommunications sales, data mining can analyze which offers best match which customers: association rule mining can find relationships between commodities, such as commodities that are often bought together at the same time.
The telecommunications industry is a typical data-intensive industry, and with the deepening of telecom reform, competition is becoming increasingly fierce. Compared with other industries, the telecommunications industry holds far more user data, which can help people analyze that data accurately and obtain useful knowledge. To win the competition, operators should find more business opportunities and provide users with better service. As a result, data warehousing and data mining have important value in the telecommunications industry.
Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high-performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and many application fields such as business, economics, and bioinformatics. The knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation. Since different users can be interested in different kinds of knowledge, data mining should cover a wide spectrum of data analysis and knowledge discovery tasks, including data characterization, discrimination, association, classification, clustering, trend and deviation analysis, and similarity analysis. These tasks may use the same database in different ways and require the development of numerous data mining techniques. Data mining functionalities include the discovery of concept or class descriptions, association, classification, prediction, clustering, trend analysis, deviation analysis, and similarity analysis; characterization and discrimination are forms of data summarization. Data mining systems can be classified according to the kinds of databases mined, the kinds of knowledge mined, the techniques used, or the applications adapted, and data mining itself can be classified into descriptive data mining and predictive data mining. Concept description is the most basic form of descriptive data mining: it describes a given set of task-relevant data in a concise and summarative manner, presenting general properties of the data. Efficient and effective data mining in large databases poses numerous requirements and great challenges to researchers and developers.
The issues involved include data mining methodology, user interaction, performance and scalability, and the processing of a large variety of data types. Other issues include the exploration of data mining applications and their social impacts.
Finding frequent itemsets in transaction databases has been demonstrated to be useful in several business applications. Many algorithms have been proposed to find frequent itemsets in very large databases. However, there is no published implementation that outperforms every other implementation on every database with every support threshold. In general, most implementations are based on the two main algorithms: Apriori and frequent pattern growth (FP-growth). The Apriori algorithm discovers the frequent itemsets of a very large database through a series of iterations; in each iteration it must generate candidate itemsets, compute their support, and prune the candidates down to the frequent itemsets. The FP-growth algorithm discovers frequent itemsets without the time-consuming candidate generation process that is critical for the Apriori algorithm. Although FP-growth generally outperforms Apriori in most cases, several refinements of the Apriori algorithm have been made to speed up frequent itemset mining. This project implements a parallel Apriori algorithm based on Bodon's work and analyzes its performance on a parallel computer. We adopted Bodon's implementation for parallel computing because his implementation, using the trie data structure, outperforms the other implementations, which use a hash tree. The rest of this document introduces related work on frequent itemset mining, presents our implementation of parallel frequent itemset mining, and presents the experimental results of our implementation on a symmetric multiprocessing computer.
2. Methods to Improve Apriori's Efficiency
Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent.
Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans.
Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
Sampling: mine a subset of the given data, with a lowered support threshold plus a method to determine the completeness.
Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent.
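The transaction-reduction idea above can be sketched in a few lines of Java. This is an illustrative sketch, not the project's code; the class and method names (TransactionReduction, reduce) are invented for the example.

```java
import java.util.*;

public class TransactionReduction {
    // Keep only transactions that contain at least one frequent item;
    // only such transactions can contribute to larger frequent itemsets,
    // so the rest are useless in subsequent scans.
    static List<Set<String>> reduce(List<Set<String>> transactions,
                                    Set<String> frequentItems) {
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                if (frequentItems.contains(item)) {
                    kept.add(t);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> db = Arrays.asList(
            new HashSet<>(Arrays.asList("SMS", "CRBT")),
            new HashSet<>(Arrays.asList("MMS")),   // no frequent item: dropped
            new HashSet<>(Arrays.asList("Java", "CRBT")));
        Set<String> frequent = new HashSet<>(Arrays.asList("SMS", "CRBT", "Java"));
        System.out.println(reduce(db, frequent).size()); // prints 2
    }
}
```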
This need is a result of the explosive growth in data collected from applications including business and management, government administration, science and engineering, and environmental control.
Data Mining
Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high-performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and many application fields such as business, economics, and bioinformatics.
A knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
Data Patterns
Data patterns can be mined from many different kinds of databases, such as relational databases, data warehouses, and transactional, object-relational, and object-oriented databases. Interesting data patterns can also be extracted from other kinds of information repositories, including spatial, time-related, text, multimedia, and legacy databases, and the World Wide Web (WWW).
A Data Warehouse
A data warehouse is a repository for long-term storage of data from multiple sources, organized so as to facilitate management decision making. The data are stored under a unified schema and are typically summarized. Data warehouse systems provide some data analysis capabilities, collectively referred to as OLAP (On-Line Analytical Processing). Suppose that All Electronics is a successful international company with branches around the world, and each branch has its own set of databases. The president of All Electronics has asked you to provide an analysis of the company's sales per item type per branch for the third quarter. This is a difficult task, particularly since the relevant data are spread out over several databases physically located at numerous sites. If All Electronics had a data warehouse, this task would be easy. A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing. Figure 1.7 shows the typical framework for construction and use of a data warehouse for All Electronics.
Data Mining Functionalities
Data mining functionalities include the discovery of concept/class descriptions, association, classification, prediction, clustering, trend analysis, deviation analysis, and similarity analysis. Characterization and discrimination are forms of data summarization. We have observed various types of databases and information repositories on which data mining can be performed; let us now examine the kinds of data patterns that can be mined. Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. In general, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions. In some cases, users may have no idea what kinds of patterns in their data may be interesting, and hence may like to search for several different kinds of patterns in parallel. Thus it is important to have a data mining system that can mine multiple kinds of patterns to accommodate different user expectations or applications. Furthermore, data mining systems should be able to discover patterns at various granularities (i.e., different levels of abstraction). Data mining systems should also allow users to specify hints to guide or focus the search for interesting patterns. Because some patterns may not hold for all of the data in the database, a measure of certainty or trustworthiness is usually associated with each discovered pattern. Data mining functionalities, and the kinds of patterns they can discover, are described below.
Data Mining Systems
Data mining systems can be classified according to the kinds of databases mined, the kinds of knowledge mined, the techniques used, or the applications adapted. Data mining can be classified into descriptive data mining and predictive data mining. Concept description is the most basic form of descriptive data mining: it describes a given set of task-relevant data in a concise and summarative manner, presenting general properties of the data. Efficient and effective data mining in large databases poses numerous requirements and great challenges to researchers and developers. The issues involved include data mining methodology, user interaction, performance and scalability, and the processing of a large variety of data types. Other issues include the exploration of data mining applications and their social impacts.
It is therefore necessary to provide a clear classification of data mining systems, which may help potential users distinguish between such systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows:
6. What Should a Good Frequent Pattern Mining Algorithm Have?
Good candidate pattern generation method:
o Generate as few candidate patterns as possible.
o The best-known method is the Apriori algorithm (AS 1994).
Good database processing method:
o Sort, aggregate, and classify the data set according to the intrinsic principles of frequent pattern mining.
o Some unsuccessful methods: sampling (Toivonen 1996), partition (SON 1995).
o Successful methods: tree projection (AAP 2000) and FP-growth (HPY 2000).
7. A Famous Frequent Pattern Mining Method: the Apriori Algorithm
Major idea:
o A subset of a frequent itemset must be frequent.
o This yields a powerful candidate-set pruning technique: it reduces the candidate itemsets dramatically.
Core of the Apriori algorithm:
o Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets.
o Use database scans and pattern matching to collect counts for the candidate itemsets.
1.6 Methodology
Apriori Algorithm in Value-Added Service
The Apriori algorithm discovers the frequent itemsets of a very large database through a series of iterations. In each iteration it must generate candidate itemsets, compute their support, and prune the candidates down to the frequent itemsets. The FP-growth algorithm discovers frequent itemsets without the time-consuming candidate generation process that is critical for the Apriori algorithm. Although FP-growth generally outperforms Apriori in most cases, several refinements of the Apriori algorithm have been made to speed up frequent itemset mining.
This algorithm computes frequent itemsets from a transactions database over multiple iterations. Each iteration involves candidate generation followed by candidate counting and selection. Utilizing the knowledge about infrequent itemsets obtained in previous iterations, the algorithm prunes a priori those candidate itemsets that cannot become frequent. After discarding every candidate itemset that has an infrequent subset, the algorithm enters the candidate counting step. Value-added service in Mobile Telecom includes SMS, surf on-line, CRBT, MMS, Java, IVR, and so on. In the following tables, the Apriori algorithm is applied to discover association relationships among different value-added services in Mobile Telecom in China. The table depicts the value-added service items in four transactions. For a minimum support of 50% and a minimum confidence of 50%, we have the following rules: (1) SMS => CRBT with 50% support and 66% confidence; (2) CRBT => SMS with 50% support and 100% confidence.
Table 1.6.1 Application of association rule mining in telecom
The objective is to generate confident rules, having at least the minimum confidence. The problem decomposition proceeds as follows:
Find all sets of items that have minimum support, typically using the Apriori algorithm. This is the most expensive phase of the search, and much research has gone into reducing its complexity. Then use the frequent itemsets to generate the desired rules: given m items, there can potentially be 2^m frequent itemsets. Consider the table: for the rule SMS => CRBT, we have Support = Support({SMS, CRBT}) = 50% and Confidence = Support({SMS, CRBT}) / Support({SMS}) = 66%.
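The support and confidence computation for SMS => CRBT can be checked with a small Java sketch. The four transactions below are hypothetical, chosen only to reproduce the figures quoted in the text (support({SMS, CRBT}) = 50%, support({SMS}) = 75%); the class name is illustrative.

```java
import java.util.*;

public class RuleMeasures {
    // Fraction of transactions that contain every item of the itemset.
    static double support(List<Set<String>> db, Set<String> items) {
        long n = db.stream().filter(t -> t.containsAll(items)).count();
        return (double) n / db.size();
    }

    public static void main(String[] args) {
        // Hypothetical transactions consistent with the figures in the text.
        List<Set<String>> db = Arrays.asList(
            new HashSet<>(Arrays.asList("SMS", "CRBT", "surf on-line")),
            new HashSet<>(Arrays.asList("SMS", "CRBT", "Java")),
            new HashSet<>(Arrays.asList("SMS", "Java")),
            new HashSet<>(Arrays.asList("surf on-line", "Java")));
        double supAB = support(db, new HashSet<>(Arrays.asList("SMS", "CRBT")));
        double supA  = support(db, new HashSet<>(Arrays.asList("SMS")));
        // support = 50%, confidence = 2/3 (the text rounds it to 66%).
        System.out.printf("support=%.0f%% confidence=%.2f%n",
                          supAB * 100, supAB / supA);
    }
}
```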
Table 1.6.2 Computation of Frequent Item Sets
The Apriori algorithm is outlined as follows. Let Fk be the set of frequent itemsets of size k, let Ck be the set of candidate itemsets of size k, and let F1 be the set of frequent items. We start from k = 1.
(1) For all itemsets in Fk, repeat steps 2-4.
(2) Generate new candidates Ck+1 from Fk.
(3) For each transaction T in the database, increment the count of all candidates in Ck+1 that are contained in T.
(4) Generate the frequent itemsets Fk+1 of size k+1 from the candidates in Ck+1 with minimum support.
A key observation is that every subset of a frequent itemset is also frequent. This implies that a candidate itemset in Ck+1 can be pruned if even one of its subsets is not contained in Fk.
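Steps (1)-(4), together with the subset-pruning observation, can be sketched as a small Java program. This is an illustrative implementation, not the project's code; the names Apriori, candidates, and run are invented for the example, and the transactions in main follow the telecom data discussed in the text.

```java
import java.util.*;

public class Apriori {
    // Support count of an itemset in the transaction database.
    static int count(List<Set<String>> db, Set<String> itemset) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(itemset)) c++;
        return c;
    }

    // Step 2: join Fk with itself to build Ck+1, then prune any candidate
    // that has an infrequent k-subset (the key observation above).
    static Set<Set<String>> candidates(Set<Set<String>> fk) {
        Set<Set<String>> ck1 = new HashSet<>();
        for (Set<String> a : fk)
            for (Set<String> b : fk) {
                Set<String> u = new TreeSet<>(a);
                u.addAll(b);
                if (u.size() != a.size() + 1) continue;
                boolean allSubsetsFrequent = true;
                for (String item : u) {
                    Set<String> sub = new TreeSet<>(u);
                    sub.remove(item);
                    if (!fk.contains(sub)) { allSubsetsFrequent = false; break; }
                }
                if (allSubsetsFrequent) ck1.add(u);
            }
        return ck1;
    }

    // Returns the frequent itemsets level by level: F1, F2, ...
    static List<Set<Set<String>>> run(List<Set<String>> db, double minSup) {
        Set<Set<String>> fk = new HashSet<>();
        Set<String> items = new TreeSet<>();
        db.forEach(items::addAll);
        for (String i : items) {                     // build F1
            Set<String> s = new TreeSet<>(Collections.singleton(i));
            if ((double) count(db, s) / db.size() >= minSup) fk.add(s);
        }
        List<Set<Set<String>>> levels = new ArrayList<>();
        while (!fk.isEmpty()) {                      // steps 1-4 repeated
            levels.add(fk);
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> c : candidates(fk))     // steps 3-4: count, select
                if ((double) count(db, c) / db.size() >= minSup) next.add(c);
            fk = next;
        }
        return levels;
    }

    public static void main(String[] args) {
        List<Set<String>> db = Arrays.asList(
            new HashSet<>(Arrays.asList("surf on-line", "CRBT", "Java", "SMS")),
            new HashSet<>(Arrays.asList("surf on-line", "CRBT", "Java")),
            new HashSet<>(Arrays.asList("surf on-line", "Java", "MMS")),
            new HashSet<>(Arrays.asList("CRBT", "SMS")));
        System.out.println(run(db, 0.6).get(1)); // the frequent pair {Java, surf on-line}
    }
}
```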
Table 1.6.3 Transactions Database for Frequent Item Set Generation
The Apriori algorithm is explained with the example database of transactions provided in the table, with a minimum support > 50%. After the first scan of the database, we have the candidate itemsets C1 along with their corresponding supports: {SMS} : 50%, {surf on-line} : 75%, {CRBT} : 75%, {MMS} : 25%, and {Java} : 75%. The frequent itemsets F1 consist of {surf on-line}, {CRBT}, and {Java}, each with a support of 75%. Now the candidate itemsets C2 are {surf on-line, CRBT}, {surf on-line, Java}, and {CRBT, Java}, with supports of 50%, 75%, and 50%, respectively. The corresponding frequent itemset F2 becomes {surf on-line, Java}, with a support of 75%. The rules generated are:
surf on-line => Java with Confidence = support({surf on-line, Java}) / support({surf on-line}) = 100%, and
Java => surf on-line with Confidence = support({surf on-line, Java}) / support({Java}) = 100%.
Table 1.6.4 Stages of the Apriori Algorithm Demonstrating Frequent Item Set Generation
However, in this method, multiple passes have to be made over the database for each different value of minimum support and confidence, and the number of passes can be as large as the longest frequent itemset. For very large databases of transactions, this may involve considerable input-output (I/O) and lead to unacceptable response times for online queries. Moreover, the potential number of frequent itemsets is exponential in the number of different items, although the actual number of frequent itemsets can be much smaller. From the table we can see that surf on-line and Java have a high association relationship among the different value-added services in Mobile Telecom in China. Telecom enterprises can therefore bundle these two kinds of services and provide a more favorable package service.
Applications
Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.: mining for associations among items in a large database of sales transactions is an important database mining function. For example, the information that a customer who purchases a keyboard also tends to buy a mouse at the same time is represented by the association rule below:
Keyboard => Mouse [support = 6%, confidence = 70%]
Based on the types of values, association rules can be classified into two categories: Boolean association rules and quantitative association rules.
Boolean association rule: Keyboard => Mouse [support = 6%, confidence = 70%]
Quantitative association rule: (Age = 26..30) => (Cars = 1..2) [support = 3%, confidence = 36%]
Minimum support threshold: the support of an association pattern is the percentage of task-relevant data transactions for which the pattern is true.
Minimum confidence threshold:
Confidence is defined as the measure of certainty or trustworthiness associated with each discovered pattern.
Association Rule Mining Process
Find all frequent itemsets: the support S of each of these frequent itemsets must be at least equal to a pre-determined min_sup (an itemset is a subset of the items in I, like A). Then generate strong association rules from the frequent itemsets: these rules must come from the frequent itemsets and must satisfy min_sup and min_conf.
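The second step, generating strong rules from a frequent itemset, can be sketched for a frequent pair. The class RuleGeneration and its support map are illustrative; the figures in main come from the F2 itemset {surf on-line, Java} discussed earlier.

```java
import java.util.*;

public class RuleGeneration {
    // Generate the two candidate rules of a frequent pair {a, b} and keep
    // the strong ones, i.e. those whose confidence reaches minConf.
    static List<String> strongRules(String a, String b,
                                    Map<Set<String>, Double> support,
                                    double minConf) {
        double supPair = support.get(new TreeSet<>(Arrays.asList(a, b)));
        List<String> rules = new ArrayList<>();
        if (supPair / support.get(Collections.singleton(a)) >= minConf)
            rules.add(a + " => " + b);
        if (supPair / support.get(Collections.singleton(b)) >= minConf)
            rules.add(b + " => " + a);
        return rules;
    }

    public static void main(String[] args) {
        Map<Set<String>, Double> support = new HashMap<>();
        support.put(Collections.singleton("surf on-line"), 0.75);
        support.put(Collections.singleton("Java"), 0.75);
        support.put(new TreeSet<>(Arrays.asList("surf on-line", "Java")), 0.75);
        // Both directions reach 100% confidence, so both rules are strong.
        System.out.println(strongRules("surf on-line", "Java", support, 0.5));
    }
}
```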
Bodon's work catches the attention: the results of Bodon's implementation for finding frequent itemsets appear to be faster than the ones by Borgelt and Goethals. We therefore revised Bodon's implementation into a parallel one in which the input transactions are read by a parallel computer, and we present the effect of a parallel computer on this modified implementation. As F. Bodon notes, the efficiency of frequent itemset mining algorithms is determined mainly by three factors: the way candidates are generated, the data structure that is used, and the implementation details. Most papers focus on the first factor, some describe the underlying data structures, but implementation details are almost always neglected. Bodon shows that the effect of implementation can be more important than the selection of the algorithm: ideas that seem quite promising may turn out to be ineffective at the implementation level. He theoretically and experimentally analyzes Apriori, the most established algorithm for frequent itemset mining. Several implementations of the algorithm have been put forward in the last decade; although they implement the very same algorithm, they display large differences in running time and memory need. Bodon describes an implementation of Apriori that outperforms all implementations known to him, analyzes theoretically and experimentally the principal data structure of his solution (the main factor in the efficiency of the implementation), and presents a simple modification of Apriori that appears to be faster than the original algorithm.
3. SYSTEM STUDY
Finding frequent itemsets in transaction databases has been demonstrated to be useful in several business applications, and many algorithms have been proposed to find frequent itemsets in very large databases. However, there is no published implementation that outperforms every other implementation on every database with every support threshold. In general, most implementations are based on the two main algorithms: Apriori and frequent pattern growth (FP-growth). The Apriori algorithm discovers the frequent itemsets of a very large database through a series of iterations; in each iteration it must generate candidate itemsets, compute their support, and prune the candidates down to the frequent itemsets. The FP-growth algorithm discovers frequent itemsets without the time-consuming candidate generation process that is critical for the Apriori algorithm. Although FP-growth generally outperforms Apriori in most cases, several refinements of the Apriori algorithm have been made to speed up frequent itemset mining.
The results of Bodon's implementation for finding frequent itemsets appear to be faster than the ones by Borgelt and Goethals. In this project, Bodon's implementation is revised into a parallel one in which the input transactions are read by a parallel computer, and the effect of a parallel computer on this modified implementation is presented.
4. SYSTEM DESCRIPTION
4.1 Modules
a) Input Data Module
b) Calculation for Classification based on Multiple Association Rules (CMAR)
c) Calculation for the Apriori Algorithm to find Frequent Item Sets
d) GUI Designing.
a) Input Data Module
Here we create the itemsets. We take the transaction list from .dat files. In this module we read an input text file, convert each line into string tokens, and convert the tokens into sets.
Example:
sms, mms, Surf on-line, crbt.
sms, Surf on-line, Java, Ivr, crbt.
Surf on-line, sms, mms.
Java, Ivr, Surf on-line.
Java, Sms, Mms.
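A minimal sketch of this tokenization, assuming comma-separated items with a trailing period per transaction as in the example above; the class and method names are illustrative, not the project's actual module.

```java
import java.util.*;

public class InputDataModule {
    // Convert one line of a .dat file (e.g. "sms,mms,Surf on-line,crbt.")
    // into a set of item tokens; the trailing period ends a transaction.
    // Items are trimmed and lower-cased so "Sms" and "sms" match.
    static Set<String> toItemSet(String line) {
        Set<String> items = new LinkedHashSet<>();
        StringTokenizer st = new StringTokenizer(line.replace(".", ""), ",");
        while (st.hasMoreTokens()) items.add(st.nextToken().trim().toLowerCase());
        return items;
    }

    public static void main(String[] args) {
        System.out.println(toItemSet("sms, Surf on-line,Java,Ivr,crbt."));
        // [sms, surf on-line, java, ivr, crbt]
    }
}
```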
b) Calculation for Classification based on Multiple Association Rules (CMAR)
CMAR is Classification based on Multiple Association Rules. All our calculations are based upon the association rules:
i. data input and input error checking,
ii. data preprocessing,
iii. manipulation of records (e.g. operations such as subset, member, union, etc.), and
iv. data and parameter output.
The CMAR algorithm (as described in Li et al. 2001) uses an FP-growth algorithm (Han et al. 2000) to produce a set of CARs, which are stored in a data structure referred to as a CR tree. CARs are inserted into the CR tree if: the chi-squared value is above a user-specified critical threshold (Li et al. suggest a 5% significance level; assuming one degree of freedom, this equates to a threshold of 3.8415), and
the CR tree does not contain a more general rule with a higher priority. Given two CARs, R1 and R2, R1 is said to be a more general rule than R2 if the antecedent of R1 is a subset of the antecedent of R2. In CMAR, CARs are prioritised using the following ordering:
Confidence: a rule r1 has priority over a rule r2 if confidence(r1) > confidence(r2).
Support: a rule r1 has priority over a rule r2 if confidence(r1) == confidence(r2) && support(r1) > support(r2).
Size of antecedent: a rule r1 has priority over a rule r2 if confidence(r1) == confidence(r2) && support(r1) == support(r2) && |Ar1| < |Ar2|.
Once a complete CR tree has been produced, the generated rules are placed into a list R, ordered according to the above prioritisation, which is then pruned. The pruning algorithm (using the cover principle) is presented below.

T  = set of training records
C  = array of length |T| with all elements set to 0
R  = the prioritised rule list
R' = empty rule list
for each r in R
    if (T = null) break
    coverFlag <-- false
    loop over ti in T (1 <= i <= |T|)
        if (r.antecedent subset of ti)        // rule satisfies record
            C[i] <-- C[i] + 1
            coverFlag <-- true
    end loop
    if (coverFlag = true)                     // rule satisfies at least one record
        R' <-- R' union {r}
    loop from i <-- 1 to i <-- |C| in steps of 1
        if (C[i] > MIN_COVER) remove ti from T
    end loop

R' is then the resulting classifier. In their experiments, Li et al. used a MIN_COVER value of 3, i.e. each record had to be satisfied (covered) by at least three rules before it was no longer considered in the generation process and removed from the training set. To test the resulting classifier, Li et al. propose the following process. Given a record r in the test set:
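The three-level rule ordering described above (confidence, then support, then antecedent size) can be expressed as a Java comparator. The Rule class and its field names here are illustrative, not the actual LUCS-KDD classes.

```java
import java.util.*;

public class CmarOrdering {
    static class Rule {
        final double confidence, support;
        final int antecedentSize;
        Rule(double c, double s, int a) {
            confidence = c; support = s; antecedentSize = a;
        }
    }

    // Higher confidence first; ties broken by higher support,
    // then by smaller (i.e. more general) antecedent.
    static final Comparator<Rule> PRIORITY =
        Comparator.comparingDouble((Rule r) -> r.confidence).reversed()
                  .thenComparing(Comparator.comparingDouble((Rule r) -> r.support).reversed())
                  .thenComparingInt(r -> r.antecedentSize);

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>(Arrays.asList(
            new Rule(0.9, 0.2, 2), new Rule(0.9, 0.3, 1), new Rule(0.95, 0.1, 3)));
        rules.sort(PRIORITY);
        System.out.println(rules.get(0).confidence); // 0.95 ranks first
    }
}
```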
Collect all rules that satisfy r. If the consequents of all rules are identical, or there is only one rule, classify the record according to the consequent. Otherwise, group the rules according to their class and determine the combined effect of the rules in each group; the class associated with the "strongest group" is then selected. The strength of a group is calculated using a Weighted Chi-Squared (WCS) measure. This is done by first defining a Maximum Chi-Squared (MCS) value for each rule A -> c:
MCS = (min(sup(A), sup(c)) - sup(A)sup(c)/N)^2 * N * e
where:
sup(A) = support of the antecedent,
sup(c) = support of the consequent,
N = number of records in the test set, and
e = 1/(sup(A)sup(c)) + 1/(sup(A)(N-sup(c))) + 1/((N-sup(A))sup(c)) + 1/((N-sup(A))(N-sup(c))).
For each group of rules, the Weighted Chi-Squared value is defined as:
WCS = the sum, over the rules in the group, of (Chi-Squared * Chi-Squared) / MCS
AssocRuleMining.java: set of general ARM utility methods to allow (i) data input and input error checking, (ii) data preprocessing, (iii) manipulation of records (e.g. operations such as subset, member, union, etc.), and (iv) data and parameter output.
PartialSupportTree.java: methods to implement the "Apriori-TFP" algorithm using both the "partial support" and "total support" tree data structures (P-tree and T-tree).
PtreeNode.java: methods concerned with the structure of P-tree nodes.
PtreeNodeTop.java: methods concerned with the structure of the top level of the P-tree, which comprises an array of "top P-tree nodes" to allow direct indexing for reasons of efficiency.
RuleList.java: set of methods that allow the creation and manipulation (e.g. ordering) of a list of ARs.
TotalSupportTree.java: methods to implement the "Apriori-T" algorithm using the "total support" tree data structure (T-tree).
TtreeNode.java: methods concerned with the structure of T-tree nodes.
AprioriTFPclass.java: parent class for the classification rule generator.
AprioriTFP_CMAR.java: the Apriori-TFPC CMAR algorithm.
ClassCMAR_App.java: fundamental LUCS-KDD CMAR application using a 50:50 training/test set split.
The PtreeNode, PtreeNodeTop, and TtreeNode classes are separate from the remaining classes, which are arranged in a class hierarchy of the form illustrated below.
c) Calculation for the Apriori Algorithm to find Frequent Item Sets
Here we find the associativity of the itemsets. We take the support and the confidence of each itemset, and we find the probability of the itemsets.
Support/confidence
Support shows the frequency of the patterns in the rule; it is the percentage of transactions that contain both A and B, i.e.
Support = Probability(A and B)
Support = (# of transactions involving A and B) / (total number of transactions).
Confidence is the strength of implication of a rule; it is the percentage of transactions that contain B if they contain A, i.e.
Confidence = Probability(B if A) = P(B|A)
Confidence = (# of transactions involving A and B) / (total number of transactions that have A).
Example:
Customer | Item purchased | Item purchased
1
2
3
4
Table 4.1.1 Data set table
If A is "purchased pizza" and B is "purchased soda", then
Support = P(A and B)
Confidence = P(B|A)
Confidence does not measure whether the association between A and B is random or not. For example, if milk occurs in 30% of all baskets, the information that milk occurs in 30% of all baskets with bread is useless. But if milk is present in 50% of all baskets that contain coffee, that is significant information. Support allows us to weed out the most infrequent combinations, but sometimes we should not ignore them, for example if the transaction is valuable and generates a large revenue, or if the products repel each other.
Example: we measure the following:
P(Coke in a basket) = 50%
P(Pepsi in a basket) = 50%
P(Coke and Pepsi in a basket) = 0.001%
What does this mean? If Coke and Pepsi were independent, we would expect that P(Coke and Pepsi in a basket) = 0.5 * 0.5 = 0.25. The fact that the joint probability is much smaller says that the products are dependent and that they repel each other.
d) GUI Designing
The GUI designing is done using Swing.
What Is Swing?
Swing is a toolkit in Java. It is part of Sun Microsystems' JFC (Java Foundation Classes), an API for providing a GUI for Java programs. Swing includes GUI widgets such as text boxes, buttons, split panes, and tables. Swing components are called lightweight components, and you can develop your own look and feel using Swing. If you poke around the Java home page (http://java.sun.com/), you'll find Swing advertised as a set of customizable graphical components whose look-and-feel can be dictated at runtime. In reality, however, Swing is much more than this. Swing is the next-generation GUI toolkit that Sun Microsystems developed to enable enterprise development in Java. By enterprise development, we mean that programmers can use Swing to create large-scale Java applications with a wide array of powerful components. In addition, you can easily extend or modify these components to control their appearance and behavior. Swing is not an acronym; the name represents the collaborative choice of its designers when the project was kicked off in late 1996. Swing is actually part of a larger family of Java products known as the Java Foundation Classes (JFC), which incorporate many of the features of Netscape's Internet Foundation Classes (IFC), as well as design aspects from IBM's Taligent division and Lighthouse Design. Swing has been in active development since the beta period of the Java Development Kit (JDK) 1.1, circa spring of 1997. The Swing APIs entered beta in the latter half of 1997 and their initial release was in March of 1998. When released, the Swing 1.0 libraries contained nearly 250 classes and 80 interfaces. Although Swing was developed separately from the core Java Development Kit, it does require at least JDK 1.1.5 to run. Swing builds on the event model introduced in the 1.1 series of JDKs; you cannot use the Swing libraries with the older JDK 1.0.2. In addition, you must have a Java 1.1-enabled browser to support Swing applets.
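A minimal Swing sketch using only the core javax.swing widgets named above; the frame title and button label are invented for the example, and the code skips window creation when no display is available.

```java
import javax.swing.*;

public class SwingDemo {
    // Build one of the Swing widgets mentioned above; the label is invented.
    static JButton makeMineButton() {
        return new JButton("Run Apriori");
    }

    public static void main(String[] args) {
        JButton button = makeMineButton();
        System.out.println(button.getText()); // Run Apriori
        // Only attempt to show a window when a display is actually present.
        if (!java.awt.GraphicsEnvironment.isHeadless()) {
            SwingUtilities.invokeLater(() -> {
                JFrame frame = new JFrame("Value-Added Service Miner");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.getContentPane().add(button);
                frame.pack();
                frame.setVisible(true);
            });
        }
    }
}
```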
of Java Archive (JAR) files from the Swing home page. In either case, it is generally a good idea to visit this URL for any extra packages or look-and-feels that may be distributed separately from the core Swing libraries. Figure 4.1.1 shows the relationship between Swing, AWT, and the Java Development Kit in both the 1.1 and 1.2 JDKs. In JDK 1.1, the Swing classes must be downloaded separately and included as an archive file on the classpath (swingall.jar). JDK 1.2 comes with a Swing distribution, although the relationship between Swing and the rest of the JDK has shifted during the beta process. Nevertheless, if you have installed JDK 1.2, you should have Swing. The standalone Swing distributions contain several other JAR files. swingall.jar is everything (except the contents of multi.jar) wrapped into one lump, and is all you normally need to know about. For completeness, the other JAR files are: swing.jar, which contains everything but the individual look-and-feel packages; motif.jar, which contains the Motif (Unix) look-and-feel; windows.jar, which contains the Windows look-and-feel; multi.jar, which contains a special look-and-feel that allows additional (often non-visual) L&Fs to be used in conjunction with the primary L&F; and beaninfo.jar, which contains special classes used by GUI development tools.
Figure 4.1.1 Relationships Between Swing, Awt, And The Jdk In The 1.1 And 1.2 Jdks

Swing contains nearly twice the number of graphical components as its immediate predecessor, AWT 1.1. Many are components that have been scribbled on programmer wish-lists since Java first debuted, including tables, trees, internal frames, and a plethora of advanced text components. In addition, Swing contains many design advances over AWT. For example, Swing introduces a new Action class that makes it easier to coordinate GUI components with the functionality they perform. You'll also find that a much cleaner design prevails throughout Swing; this cuts down on the number of unexpected surprises that you're likely to face while coding.
Swing depends extensively on the event handling mechanism of AWT 1.1, although it does not define a comparatively large number of events for itself. Each Swing component also contains a variable number of exportable properties. This combination of properties and events in the design was no accident. Each of the Swing components, like the AWT 1.1 components before them, adheres to the popular JavaBeans specification. As you might have guessed, this means that you can import all of the Swing components into various GUI-builder tools, useful for powerful visual programming.
Swing Features
Swing provides many new features for those planning to write large-scale applications in Java. Here is an overview of some of the more popular features.

Pluggable Look-and-Feels

One of the most exciting aspects of the Swing classes is the ability to dictate the look-and-feel (L&F) of each of the components, even resetting the look-and-feel at runtime. Look-and-feels have become an important issue in GUI development over the past five years. Most users are familiar with the Motif style of user interface, which was common in Windows 3.1 and is still in wide use on Unix platforms. Microsoft has since deviated from that standard with a much more optimized look-and-feel in their Windows 95/98 and NT 4.0 operating systems. In addition, the Macintosh computer system has its own branded look-and-feel, which most Apple users feel comfortable with. Swing is capable of emulating several look-and-feels and currently includes support for Windows 98 and Unix Motif. This comes in handy when a user would like to work in the L&F environment which he or she is most comfortable with. In addition, Swing allows the user to switch look-and-feels at runtime without having to close the current application. This way, a user can experiment to see which L&F is best for them, with instantaneous feedback. And, if you're feeling really ambitious as a developer (perhaps a game developer), you can create your own look-and-feel for each one of the Swing components! Swing comes with a default look-and-feel called "Metal," which was developed while the Swing classes were in the beta-release phase. This look-and-feel combines some of the best graphical elements in today's L&Fs and even adds a few surprises of its own. Figure 1.3 shows an example of several look-and-feels that you can use with Swing, including the new Metal look-and-feel. All Swing L&Fs are built from a set of base classes called the Basic L&F.
However, though we may refer to the Basic L&F from time to time, you can't use it on its own.
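The runtime look-and-feel switching described above can be sketched in a few lines. This is a minimal, hedged example (the class name is invented, and the exact set of installed L&Fs varies by platform and JDK); a real application would also call SwingUtilities.updateComponentTreeUI on each open window after switching.

```java
import javax.swing.UIManager;

// Sketch: list the installed look-and-feels, then switch to the
// cross-platform "Metal" L&F at runtime.
public class LafDemo {
    public static void main(String[] args) throws Exception {
        for (UIManager.LookAndFeelInfo info : UIManager.getInstalledLookAndFeels()) {
            System.out.println(info.getName() + " -> " + info.getClassName());
        }
        // Switch to the default cross-platform L&F (Metal).
        UIManager.setLookAndFeel(UIManager.getCrossPlatformLookAndFeelClassName());
        System.out.println("Current L&F: " + UIManager.getLookAndFeel().getName());
    }
}
```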
Lightweight Components
Most Swing components are lightweight. In the purest sense, this means that components are not dependent on native peers to render themselves. Instead, they use simplified graphics primitives to paint themselves on the screen and can even allow portions to be transparent. The ability to create lightweight components first emerged in JDK 1.1, although the majority of AWT components did not take advantage of it. Prior to that, Java programmers had no choice but to subclass java.awt.Canvas or java.awt.Panel if they wished to create their own components. With both classes, Java allocated an opaque peer object from the underlying operating system to represent the component, forcing each component to behave as if it were its own window, taking on a rectangular, solid shape. Hence, these components earned the name "heavyweight," because they frequently held extra baggage at the native level that Java did not use. Heavyweight components were unwieldy for two reasons: equivalent components on different platforms don't necessarily act alike (a list component on one platform, for example, may work differently than a list component on another, and trying to coordinate and manage the differences between components was a formidable task); and the look-and-feel of each component was tied to the host operating system and could not be changed.
Additional Features
Several other features distinguish Swing from the older AWT components:

A wide variety of new components, such as tables, trees, sliders, progress bars, internal frames, and text components.
Support for replacing component insets with an arbitrary number of concentric borders.
Tooltips: a tooltip is a textual popup that momentarily appears when the mouse cursor rests inside the component's painting region, and can be used to give more information about the component in question.
Arbitrary binding of keyboard events to components, defining how they will react to various keystrokes under given conditions.
Additional debugging support for the rendering of your own lightweight Swing components.

We will discuss each of these features in greater detail as we move on.
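Two of the features above, tooltips and arbitrary key bindings, can be sketched together. This is an illustrative example only (the button label, tooltip text, and "run" action name are invented, not taken from the project's actual GUI code):

```java
import javax.swing.*;
import java.awt.event.ActionEvent;

// Sketch: attach a tooltip to a button and bind the F5 key to its action.
public class FeatureDemo {
    static JButton makeButton() {
        JButton run = new JButton("Run Apriori");
        run.setToolTipText("Mine frequent itemsets from the selected file");
        // Bind F5 to the "run" action whenever the window has focus.
        run.getInputMap(JComponent.WHEN_IN_FOCUSED_WINDOW)
           .put(KeyStroke.getKeyStroke("F5"), "run");
        run.getActionMap().put("run", new AbstractAction() {
            public void actionPerformed(ActionEvent e) {
                System.out.println("Apriori started");
            }
        });
        return run;
    }

    public static void main(String[] args) {
        JButton b = makeButton();
        System.out.println(b.getToolTipText());
    }
}
```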
Advantages of Swing

Swing widgets provide more sophisticated GUI components. Swing components are not implemented by platform-specific code; they are written purely in Java and are therefore platform independent, which gives uniform behavior on all platforms. Swing supports a pluggable look and feel: the application can use any supported look and feel on any platform, or it can use the current platform's graphics interface, achieving consistency through modifications made by additional API calls.

Disadvantage of Swing

The disadvantage of lightweight components is slower execution.
Flow Diagram
CMAR = Classification based on Multiple Association Rules
OUTPUTS
UTILITIES
Another approach, the Six Sigma methodology, is a well-structured, data-driven methodology for eliminating defects, waste, or quality control problems of all kinds in manufacturing, service delivery, management, and other business activities. This model has recently become very popular (due to its successful implementations) in various American industries, and it appears to be gaining favor worldwide. It postulates a sequence of so-called DMAIC steps (Define, Measure, Analyze, Improve, Control) that grew out of the manufacturing, quality improvement, and process control traditions and is particularly well suited to production environments (including "production of services," i.e., service industries). Another framework of this kind (actually somewhat similar to Six Sigma) is the approach proposed by SAS Institute called SEMMA (Sample, Explore, Modify, Model, Assess), which focuses more on the technical activities typically involved in a data mining project. All of these models are concerned with the process of how to integrate data mining methodology into an organization, how to "convert data into information," how to involve important stakeholders, and how to disseminate the information in a form that can easily be converted by stakeholders into resources for strategic decision making. Some software tools for data mining are specifically designed and documented to fit into one of these specific frameworks.
The general underlying philosophy of StatSoft's STATISTICA Data Miner is to provide a flexible data mining workbench that can be integrated into any organization, industry, or organizational culture, regardless of the general data mining process model that the organization chooses to adopt. For example, STATISTICA Data Miner can include the complete set of (specific) tools necessary for ongoing company-wide Six Sigma quality control efforts, and users can take advantage of its (optional) DMAIC-centric user interface for industrial data mining tools. It can equally well be integrated into ongoing marketing research, CRM (Customer Relationship Management) projects, etc. that follow either the CRISP or SEMMA approach; it fits both of them well without favoring either one. STATISTICA Data Miner also offers all the advantages of a general data mining oriented "development kit" that includes easy-to-use tools for incorporating into your projects not only components such as custom database gateway solutions, prompted interactive queries, or proprietary algorithms, but also systems of access privileges, workgroup management, and other collaborative work tools that allow you to design large-scale, enterprise-wide systems (e.g., following CRISP, SEMMA, or a combination of both models) that involve your entire organization.

Predictive Data Mining

The term predictive data mining is usually applied to data mining projects whose goal is to identify a statistical or neural network model, or set of models, that can be used to predict some response of interest. For example, a credit card company may want to engage in predictive data mining to derive a (trained) model or set of models (e.g., neural networks, a meta-learner) that can quickly identify transactions which have a high probability of being fraudulent.
Other types of data mining projects may be more exploratory in nature (e.g., to identify cluster or segments of customers), in which case drill-down descriptive and exploratory methods would be applied. Data reduction is another possible objective for data mining (e.g., to aggregate or amalgamate the information in very large data sets into useful and manageable chunks).
platforms. It was fairly secure and its security was configurable, allowing network and file access to be restricted. Major web browsers soon incorporated the ability to run secure Java applets within web pages. Java quickly became popular. With the advent of Java 2, new versions had multiple configurations built for different types of platforms. For example, J2EE was for enterprise applications and the greatly stripped down version J2ME was for mobile applications. J2SE was the designation for the Standard Edition. In 2006, for marketing purposes, new J2 versions were renamed Java EE, Java ME, and Java SE, respectively. In 1997, Sun Microsystems approached the ISO/IEC JTC1 standards body and later the Ecma International to formalize Java, but it soon withdrew from the process. Java remains a de facto standard that is controlled through the Java Community Process. At one time, Sun made most of its Java implementations available without charge although they were proprietary software. Sun's revenue from Java was generated by the selling of licenses for specialized products such as the Java Enterprise System. Sun distinguishes between its Software Development Kit (SDK) and Runtime Environment (JRE) which is a subset of the SDK, the primary distinction being that in the JRE, the compiler, utility programs, and many necessary header files are not present. On 13 November 2006, Sun released much of Java as free software under the terms of the GNU General Public License (GPL). On 8 May 2007 Sun finished the process, making all of Java's core code open source, aside from a small portion of code to which Sun did not hold the copyright.
Primary Goals

There were five primary goals in the creation of the Java language:

1. It should use the object-oriented programming methodology.
2. It should allow the same program to be executed on multiple operating systems.
3. It should contain built-in support for using computer networks.
4. It should be designed to execute code from remote sources securely.
5. It should be easy to use, by selecting what were considered the good parts of other object-oriented languages.
The Java Programming Language

The Java programming language is a high-level language that can be characterized by all of the following buzzwords: simple, object oriented, distributed, multithreaded, dynamic, architecture neutral, portable, high performance, robust, and secure.
Table 4.3.1 Properties Of Programming Language

Each of the preceding buzzwords is explained in The Java Language Environment, a white paper written by James Gosling and Henry McGilton. In the Java programming language, all source code is first written in plain text files ending with the .java extension. Those source files are then compiled into .class files by the javac compiler. A .class file does not contain code that is native to your processor; it instead contains bytecodes, the machine language of the Java Virtual Machine (Java VM). The java launcher tool then runs your application with an instance of the Java Virtual Machine.
Figure 4.3.1 An Overview Of The Software Development Process.

Because the Java VM is available on many different operating systems, the same .class files are capable of running on Microsoft Windows, the Solaris Operating System (Solaris OS), Linux, or Mac OS. Some virtual machines, such as the Java HotSpot virtual machine, perform additional steps at runtime to give your application a performance boost. These include various tasks such as finding performance bottlenecks and recompiling (to native code) frequently used sections of code.
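The compile-and-run pipeline just described can be made concrete with a minimal sketch (the class name and message are invented for illustration):

```java
// Saved as HelloWorld.java, compiled with `javac HelloWorld.java` into a
// platform-neutral HelloWorld.class (bytecode), then executed on any
// Java VM with `java HelloWorld` -- the same .class file runs unchanged
// on Windows, Linux, Solaris, or Mac OS.
public class HelloWorld {
    static String greeting() {
        return "Hello from the Java VM on " + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```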
Figure 4.3.2 Through The Java Vm, The Same Application Is Capable Of Running On Multiple Platforms.
The Java platform has two components:

The Java Virtual Machine
The Java Application Programming Interface (API)
You've already been introduced to the Java Virtual Machine; it's the base for the Java platform and is ported onto various hardware-based platforms. The API is a large collection of ready-made software components that provide many useful capabilities. It is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights some of the functionality provided by the API.
Figure 4.3.3 The API and Java Virtual Machine insulate the program from the underlying hardware. As a platform-independent environment, the Java platform can be a bit slower than native code. However, advances in compiler and virtual machine technologies are bringing performance close to that of native code without threatening portability.
Uses OF JAVA
Blue is a smart card enabled with the secure, cross-platform, object-oriented Java Card API and technology. Blue contains an actual on-card processing chip, allowing for enhanceable and multiple functionality within a single card. Applets that comply with the Java Card API specification can run on any third-party vendor card that provides the necessary Java Card Application Environment (JCAE). Not only can multiple applet programs run on a single card, but new applets and functionality can be added after the card is issued to the customer. Java is also used in chemistry, at NASA, in 2D and 3D applications, in graphics programming, in animation, and in online and web applications.
5. IMPLEMENTATION
There are two measures of the interestingness of a pattern, namely objective and subjective. The former uses the structure of the pattern and is generally quantitative; often it fails to capture all the complexities of the pattern discovery process. The subjective approach, on the other hand, depends additionally on the user who examines the pattern. Although association rules were initially developed in the context of market-basket analysis, they have proved useful in a wide range of applications. However, association rules are not directly applicable to numeric data. The standard approach to incorporating numeric data is discretization, a process by which the values of numeric fields in a database are divided into subranges. Each such subrange is treated as an item in the association rule analysis. Applications of association rule analysis are not restricted to such analyses of purchasing behavior, however. The manner in which association rule approaches identify all rules that satisfy some set of measures of interestingness makes them useful for a wide variety of applications in which the user seeks insight into a set of data. The user can specify the quantifiable measures of what makes a rule interesting. Further factors may apply that are difficult to quantify, but the user can apply these manually to select the most useful of the rules discovered by the automated system. Another application of association rules has been for classification: rather than the classical approach of learning classification rules, a large set of high-confidence association rules is discovered. To classify a new object, all rules that cover the object are identified, and then the relative support and confidence of each rule is used to identify the most probable class for the object. Early approaches to association rule discovery sought all rules that satisfy constraints on support and confidence. More recent techniques utilize additional measures of how interesting a rule is, including lift and leverage.
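The discretization step described above can be sketched as a simple binning function. The bin boundaries and the "monthly minutes" attribute are invented for illustration; a real system would derive them from the data:

```java
// Sketch: map a numeric field to a labelled sub-range so it can be
// treated as an ordinary item in association rule analysis.
public class Discretizer {
    static String bin(double monthlyMinutes) {
        if (monthlyMinutes < 100) return "minutes:low";
        if (monthlyMinutes < 500) return "minutes:medium";
        return "minutes:high";
    }

    public static void main(String[] args) {
        System.out.println(bin(42));   // minutes:low
        System.out.println(bin(250));  // minutes:medium
        System.out.println(bin(900));  // minutes:high
    }
}
```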
Current related work on this topic is still scarce. The innovation of this paper lies in applying the association rules algorithm to mobile value-added services. The conclusions of the research are valuable for telecom operators in China developing value-added services.

Support

Support shows the frequency of the patterns in the rule; it is the percentage of transactions that contain both A and B, i.e.

Support = P(A and B) = (# of transactions involving A and B) / (total number of transactions).
Confidence
Confidence is the strength of implication of a rule; it is the percentage of transactions that contain B if they contain A, i.e.

Confidence = P(B | A) = (# of transactions involving A and B) / (total number of transactions that have A).
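The two formulas above can be checked with a small sketch. The transactions and item names are invented for illustration only:

```java
import java.util.List;
import java.util.Set;
import java.util.HashSet;

// Sketch: compute support and confidence directly from a transaction list.
public class RuleMetrics {
    // Fraction of transactions containing every item in `items`.
    static double support(List<Set<String>> txns, Set<String> items) {
        long hits = txns.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / txns.size();
    }

    // Confidence of A -> B: P(A and B) / P(A).
    static double confidence(List<Set<String>> txns, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return support(txns, both) / support(txns, a);
    }

    public static void main(String[] args) {
        List<Set<String>> txns = List.of(
            Set.of("pizza", "soda"),
            Set.of("pizza", "soda", "milk"),
            Set.of("pizza"),
            Set.of("milk"));
        // 2 of 4 baskets contain both pizza and soda -> support 0.5;
        // 2 of the 3 pizza baskets also contain soda -> confidence 2/3.
        System.out.println("support=" + support(txns, Set.of("pizza", "soda")));
        System.out.println("confidence=" + confidence(txns, Set.of("pizza"), Set.of("soda")));
    }
}
```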
Transactional database refers to a collection of transaction records, in most cases sales records. With the popularity of computers and e-commerce, massive transactional databases are available now. Data mining on transactional databases focuses on the mining of association rules, finding the correlation between items in the transaction records. One data mining technique, generalized association rule mining with taxonomies, has the potential to discover more useful knowledge than ordinary flat association rule mining by taking application-specific information into account [27]. In particular, in retail one might consider as items particular brands or whole groups like milk, drinks, or food. The more general the items chosen, the higher one can expect the support to be. Thus one might be interested in discovering frequent itemsets composed of items which themselves form a taxonomy. Earlier work on mining generalized association rules ignores the fact that the taxonomies of items cannot be kept static while new transactions are continuously added to the original database. How to effectively update the discovered generalized association rules to reflect the database change with taxonomy evolution and transaction update is a crucial task. Tseng et al. [34] examine this problem and propose a novel algorithm, called IDTE, which can incrementally update the discovered generalized association rules when the taxonomy of items evolves as new transactions are inserted into the database. Empirical evaluations show that this algorithm can maintain its performance even with large amounts of incremental transactions and a high degree of taxonomy evolution, and is more than an order of magnitude faster than applying the best generalized association mining algorithms to the whole updated database. Spatial databases usually contain not only traditional data but also the location or geographic information about the corresponding data.
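The first step of generalized association mining with a taxonomy can be sketched as follows: each transaction is extended with the ancestors of its items, so that rules over categories like "drinks" become discoverable. The taxonomy here is invented for illustration:

```java
import java.util.Map;
import java.util.Set;
import java.util.HashSet;

// Sketch: extend a transaction with each item's ancestors in the taxonomy.
public class TaxonomyExpand {
    static final Map<String, String> PARENT = Map.of(
        "coke", "drinks", "pepsi", "drinks",
        "milk", "food",   "bread", "food");

    static Set<String> expand(Set<String> txn) {
        Set<String> out = new HashSet<>(txn);
        for (String item : txn) {
            String p = PARENT.get(item);
            while (p != null) { out.add(p); p = PARENT.get(p); }
        }
        return out;
    }

    public static void main(String[] args) {
        // {coke, bread} gains the ancestors "drinks" and "food".
        System.out.println(expand(Set.of("coke", "bread")));
    }
}
```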
Spatial association rules describe the relationship between one set of features and another set of features in a spatial database, for example "most business centers in Greece are around City Hall"; the spatial operations used to describe the correlation can be within, near, next to, etc. The form of spatial association rules is also X → Y, where X and Y are sets of predicates, some of which are spatial, and at least one must be a spatial predicate. Temporal association rules can be more useful and informative than basic association rules.

Recent Advances In Association Rule Discovery

A serious problem in association rule discovery is that the set of association rules can grow to be unwieldy as the number of transactions increases, especially if the support and confidence thresholds are small. As the number of frequent itemsets increases, the number of rules presented to the user typically increases proportionately. Many of these rules may be redundant.
Redundant Association Rules

To address the problem of rule redundancy, four types of research on mining association rules have been performed. First, rules have been extracted based on user-defined templates or item constraints. Secondly, researchers have developed interestingness measures to select only interesting rules. Thirdly, researchers have proposed inference rules or inference systems to prune redundant rules and thus present smaller and usually more understandable sets of association rules to the user. Finally, new frameworks for mining association rules have been proposed that find association rules with different formats or properties. Ashrafi et al. presented several methods to eliminate redundant rules and to produce a small number of rules from any given frequent or frequent closed itemsets generated. Ashrafi et al. present additional redundant rule elimination methods that first identify the rules that have similar meaning and then eliminate those rules. Furthermore, their methods eliminate redundant rules in such a way that they never drop any higher-confidence or interesting rules from the resultant rule set. Jaroszewicz and Simovici presented another solution to the problem using the Maximum Entropy approach. The problem of the efficiency of Maximum Entropy computations is addressed by using closed-form solutions for the most frequent cases. Analytical and experimental evaluation of their proposed technique indicates that it efficiently produces small sets of interesting association rules. Moreover, there is a need for human intervention in mining interesting association rules. Such intervention is most effective if the human analyst has a robust visualization tool for mining and visualizing association rules. Techapichetvanich and Datta presented a three-step visualization method for mining market basket association rules. These steps include discovering frequent itemsets, mining association rules, and finally visualizing the mined association rules.
Negative Association Rules

Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e., items absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. Mining negative association rules is a difficult task, due to the fact that there are essential differences between positive and negative association rule mining. The researchers attack two key problems in negative association rule mining: (i) how to effectively search for interesting itemsets, and (ii) how to effectively identify negative association rules of interest. Brin et al. [8] mentioned for the first time in the literature the notion of negative relationships. Their model is chi-square based. They use the statistical test to verify the independence between two variables. To determine the nature (positive or negative) of the relationship, a correlation metric was used. In [28] the authors present a new idea to mine strong negative rules. They combine positive frequent itemsets with domain knowledge in the form of a taxonomy to mine negative associations. However, their algorithm is hard to generalize since it is domain dependent and requires a predefined taxonomy. A similar approach is described in [37]. Wu et al. [40] derived a new algorithm for generating both positive and negative association rules. They add, on top of the support-confidence framework, another measure called mininterest for better pruning of the frequent itemsets generated. In [32] the authors use only negative associations of the type X → ¬Y to substitute items in market basket analysis.
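One useful identity when scoring negated items, sketched below, is that the support of "A without B" can be computed from ordinary positive supports: supp(A and not-B) = supp(A) - supp(A and B). The transactions and item names are invented for illustration:

```java
import java.util.List;
import java.util.Set;
import java.util.HashSet;

// Sketch: score a negated itemset from positive support counts only.
public class NegativeSupport {
    static double support(List<Set<String>> txns, Set<String> items) {
        return (double) txns.stream().filter(t -> t.containsAll(items)).count()
               / txns.size();
    }

    // supp(present AND NOT absent) = supp(present) - supp(present AND absent)
    static double supportWithNegation(List<Set<String>> txns,
                                      Set<String> present, String absent) {
        Set<String> both = new HashSet<>(present);
        both.add(absent);
        return support(txns, present) - support(txns, both);
    }

    public static void main(String[] args) {
        List<Set<String>> txns = List.of(
            Set.of("coke"), Set.of("coke"),
            Set.of("coke", "pepsi"), Set.of("pepsi"));
        // Coke appears in 3/4 baskets, coke+pepsi in 1/4, so
        // coke-without-pepsi has support 0.5.
        System.out.println(supportWithNegation(txns, Set.of("coke"), "pepsi"));
    }
}
```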
6. UML REPRESENTATION
Unified Modeling Language is a graphical visualization language. It consists of a series of symbols and connectors that can be used to create process diagrams and is often used to model computer programs and workflows. The Unified Modeling Language (UML) is used to specify, visualize, modify, construct, and document the artifacts of an object-oriented software-intensive system under development. UML offers a standard way to visualize a system's architectural blueprints, including elements such as activities, actors, business processes, and software components. UML combines techniques from data modeling (entity relationship diagrams), business modeling (workflows), object modeling, and component modeling. It can be used with all processes, throughout the software development life cycle, and across different implementation technologies. UML has synthesized the notations of the Booch method, the Object-modeling technique (OMT), and Object-oriented software engineering (OOSE) by fusing them into a single, common, and widely usable modeling language. UML aims to be a standard modeling language which can model concurrent and distributed systems. UML is a de facto industry standard and is evolving under the auspices of the Object Management Group (OMG). OMG initially called for information on object-oriented methodologies that might create a rigorous software modeling language, and many industry leaders responded in earnest to help create the UML standard. UML models may be automatically transformed into other representations (e.g., Java) by means of QVT-like transformation languages, supported by the OMG. UML is extensible, offering the following mechanisms for customization: profiles and stereotypes. The semantics of extension by profiles were improved with the UML 2.0 major revision.
A use case is a description of a set of sequences of actions that a system performs, each of which yields an observable result of value to a particular actor. A use case is used to structure the behavioral things in a model. A use case is realized by collaborations.
Figure: use case diagram (not reproduced). The actor "user" is connected to the use cases: data file, Apriori rule, support value, and confidence value.
A class diagram is a collection of classes and objects and their relationships. A class is a set of objects that share the same attributes, operations, relationships, and semantics. A class implements one or more interfaces (defined later). Graphically, a class is rendered as a rectangle, usually including its name, attributes, and operations.
Figure: class diagram (not reproduced) showing the classes Main, f_output, p_output, p_main, mb_main, and aprioriImpl, with the operations add(), setJMenuBar(), setSize(), setLocationRelativeTo(), and setVisible().
A sequence diagram shows the flow of messages among the objects involved in an interaction with each other.
Figure: sequence diagram (not reproduced). Objects: Gui Design, Item sets, Apriori, Frequent sets. Messages: 1: user interface is designed; 2: transaction data file is selected; 3: Run Apriori; 4: check the associativity; 5: support values are calculated; 6: confidence values are calculated.
7. SCREEN SHOTS
The frequent itemsets are evaluated using a num file: itemsets are converted into num files and given as input. To generate frequent itemsets, support and confidence are given as thresholds, and the support count of an itemset is the aggregate of all local support counts of the itemset. To generate frequent itemsets, a threshold value has to be specified, known as the minimum support count. An itemset is said to be frequent if it satisfies support >= minimum support. Candidate itemsets are then generated using the support count in iterative passes. When no more itemsets satisfy the support count, the process stops and the frequent itemsets have been generated.
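The level-wise process just described can be sketched in miniature. This is not the project's actual implementation; it is a hedged illustration (transactions, item names, and the unoptimized join step are invented for clarity, not efficiency):

```java
import java.util.List;
import java.util.Set;
import java.util.HashSet;

// Sketch of Apriori: level-wise candidate generation pruned by a
// minimum support count, stopping when no candidate survives.
public class MiniApriori {
    public static Set<Set<String>> frequentItemsets(List<Set<String>> txns,
                                                    int minSupport) {
        Set<Set<String>> frequent = new HashSet<>();
        // Level 1: candidate single items drawn from the transactions.
        Set<Set<String>> level = new HashSet<>();
        for (Set<String> t : txns)
            for (String i : t) level.add(Set.of(i));
        while (true) {
            // Prune: keep candidates meeting the minimum support count.
            Set<Set<String>> kept = new HashSet<>();
            for (Set<String> cand : level) {
                long count = txns.stream().filter(t -> t.containsAll(cand)).count();
                if (count >= minSupport) kept.add(cand);
            }
            if (kept.isEmpty()) break;
            frequent.addAll(kept);
            // Join: form (k+1)-itemsets from pairs of surviving k-itemsets.
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : kept)
                for (Set<String> b : kept)
                    if (a.size() == b.size()) {
                        Set<String> u = new HashSet<>(a);
                        u.addAll(b);
                        if (u.size() == a.size() + 1) next.add(u);
                    }
            level = next;
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<Set<String>> txns = List.of(
            Set.of("sms", "gprs"), Set.of("sms", "gprs", "ring"),
            Set.of("sms"), Set.of("gprs", "ring"));
        System.out.println(frequentItemsets(txns, 2));
    }
}
```

With minimum support count 2, the pairs {sms, gprs} and {gprs, ring} survive, but the triple {sms, gprs, ring} (count 1) is pruned, ending the passes.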
1) Main Window
2) Select File
3) File Selected
4) Item sets
7) Associativity Result
8. TESTING
Testing is a process of executing a program with the intent of finding an error. A good test has a high probability of finding an as yet undiscovered error. A successful test is one that uncovers an as yet undiscovered error.
The objective is to design tests that systematically uncover different classes of errors, and to do so with a minimum amount of time and effort. Testing cannot show the absence of defects; it can only show that software defects are present.
Typical error-handling defects include: an error condition handled by the system run-time before the error handler gets control, and exception-condition processing that is incorrect.
During security testing the tester plays the role of the individual trying to penetrate the system, using a large range of methods:

Attempt to acquire passwords through external clerical means.
Use custom software to attack the system.
Overwhelm the system with requests.
Cause system errors and attempt to penetrate the system during recovery.
Browse through insecure data.
Given time and resources, the security of most systems can be breached.
Construction Stage

This stage includes the actual execution of code with test data. Code walkthroughs and inspections are conducted. Static analysis, dynamic analysis, and the construction of test drivers, harnesses, and stubs are done. Control and management of the test process is critical. All test sets, test results, and test reports should be catalogued and stored.

Operation and Maintenance Stage

Modifications made to the software require retesting; this is termed regression testing. Changes at a given level will necessitate retesting at all levels below it.

Approaches

There are two basic approaches: 1. Black box or "functional" analysis. 2. White box or "structural" analysis.

Boundary Value Analysis (Stress Testing)

In this method the input data is partitioned, and data inside and at the boundary of each partition is tested.

Design-Based Functional Testing

A functional hierarchy is constructed. For each function at each level, extremal, non-extremal, and special-value test data are identified. Test data is chosen such that it will generate extremal, non-extremal, and special output values.

Cause-Effect Graphing

In this method the characteristic input stimuli (causes) and characteristic output classes (effects) are identified. The dependencies are identified using the specification, and these details are presented as a directed graph. Test cases are chosen to test the dependencies.
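Boundary value analysis, described above, can be sketched with a tiny example. The validated function and its valid range are invented for illustration; the point is that the test data sits at and around each partition boundary:

```java
// Sketch of boundary value analysis against an invented validator:
// minimum-support percentages are valid in [1, 100], so we exercise
// 0, 1, 2 and 99, 100, 101 -- the values around each boundary.
public class BoundaryTest {
    static boolean validSupport(int pct) {
        return pct >= 1 && pct <= 100;
    }

    public static void main(String[] args) {
        int[] inputs = {0, 1, 2, 99, 100, 101};
        for (int in : inputs)
            System.out.println(in + " -> " + validSupport(in));
    }
}
```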
Coverage-Based Testing
The program is represented as a control-flow graph and its paths are identified. Test data are chosen to maximize the number of paths executed under test conditions; for path sets that are not finite, or that contain infeasible paths, coverage metrics can be applied instead.
Complexity-Based Testing
The cyclomatic complexity of the program is measured. The paths actually executed by the program running on the test data are identified and the actual complexity is computed; the test set is then extended to drive the actual complexity closer to the cyclomatic complexity.
Test Data Analysis
During test data analysis, the "goodness" of the test data set is the major consideration.
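The coverage- and complexity-based ideas above can be illustrated with a toy function. The function below is invented for the example; with two independent decisions its cyclomatic complexity is V(G) = 2 + 1 = 3, and a small test set is chosen so that every branch of the control-flow graph is executed.

```python
# Path-coverage sketch: a toy function with two independent decisions
# (cyclomatic complexity V(G) = 3), plus test data driving every branch.
def classify(x):
    if x < 0:
        sign = "negative"
    else:
        sign = "non-negative"
    if x % 2 == 0:
        parity = "even"
    else:
        parity = "odd"
    return sign, parity

# Two inputs suffice to take each decision both ways at least once.
assert classify(-3) == ("negative", "odd")
assert classify(2) == ("non-negative", "even")
print("all branches of the control-flow graph executed")
```

Note that full branch coverage here needs fewer test cases than the four distinct paths through the function, which is the usual gap between branch coverage and path coverage.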
Statistical Analysis and Error Seeding
Known errors are seeded into the code so that their placement is statistically similar to that of actual errors; the proportion of seeded errors found by the tests then gives an estimate of the proportion of real errors found.
Mutation Analysis
It is assumed that a set of test data that can uncover all simple faults in a program is also capable of detecting more complex faults. In mutation analysis a large number of simple faults, called mutations, are introduced into the program one at a time; the resulting changed versions of the program are called mutants. Test data is then constructed to cause these mutants to fail, and the effectiveness of the test data set is measured by the percentage of mutants killed.
Test Results
The listed tests were conducted on the software at the various development stages. Unit testing was conducted, the errors were debugged, and regression testing was performed. Integration testing will be performed once the system is integrated with other related systems such as Inventory and Budget. Once the design stage was over, black box and white box testing were performed on the entire application; the results were analyzed and the appropriate alterations were made. The test results proved positive, and the application is therefore feasible and test approved.
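The mutation analysis step described above can be sketched in miniature: one simple fault (a mutated relational operator) is introduced, and the test data kills the mutant if some test case makes its output differ from the original's. The functions and test data here are invented for the example.

```python
# Mutation analysis sketch: introduce one simple fault (a mutant)
# and check whether the test data kills it.
def max_of(a, b):          # original program
    return a if a > b else b

def max_of_mutant(a, b):   # mutant: relational operator > mutated to <
    return a if a < b else b

test_data = [(3, 7), (7, 3), (5, 5)]

# A mutant is "killed" when some test case distinguishes it
# from the original program.
killed = any(max_of(a, b) != max_of_mutant(a, b) for a, b in test_data)
print("mutant killed:", killed)
```

In a real mutation-analysis run many such mutants are generated automatically, and the kill percentage over all of them scores the test data set.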
9. CONCLUSION
The implementation of the Apriori-based algorithm focuses on the way candidate itemsets are generated and on the optimization of the data structures used for storing itemsets. Bodon presented an implementation that solved the frequent itemset mining problem in most cases faster than other well-known implementations, and in this paper Bodon's implementation is used for parallel computing; both running time and results are improved with the help of the Apriori algorithm. Data mining is a rich area of scientific study holding ample promise for the research community. Much progress has been reported for large databases, specifically involving association rules, classification, clustering, similar time sequences, similar text document retrieval, similar image retrieval, outlier discovery, etc., and many papers have been published in major conferences and leading journals. However, it still remains a promising field with many challenging research issues. With the fierce competition in the telecommunications industry, business managers have become more and more aware of the importance of marketing. It is believed that more extensive use of data mining technology in the telecommunications industry will help enterprises control the loss of customers at the source. Such "preventive measures" can avoid many unnecessary losses to a large extent, so that enterprises remain in a strong position in the increasingly fierce market competition.
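The candidate-generation step that the conclusion refers to can be sketched in a minimal Apriori-style frequent-itemset miner. This is not Bodon's implementation; the transactions and the support threshold below are invented for the example.

```python
# Minimal Apriori sketch (frequent-itemset generation only).
# Transactions and support threshold are invented for illustration.
transactions = [{"milk", "bread"}, {"milk", "butter"},
                {"milk", "bread", "butter"}, {"bread", "butter"}]
min_support = 2  # absolute support count

def frequent_itemsets(transactions, min_support):
    # L1: frequent single items
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= min_support}
    result, k = set(freq), 2
    while freq:
        # Candidate generation: unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Support counting: keep candidates contained in enough transactions.
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) >= min_support}
        result |= freq
        k += 1
    return result

print(sorted(sorted(s) for s in frequent_itemsets(transactions, min_support)))
```

The Apriori pruning principle is visible here: the 3-itemset {milk, bread, butter} is generated as a candidate but rejected because its support (1) falls below the threshold, even though all three of its 2-item subsets are frequent.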
10. REFERENCES
[1] Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques [M]. America: Morgan Kaufmann Publishers, 2000.
[2] Alex Berson, Stephen Smith, Kurt Thearling. Building Data Mining Applications for CRM [M]. America: McGraw-Hill Companies, 2000.
[3] Sushmita Mitra, Tinku Acharya. Data Mining: Multimedia, Soft Computing, and Bioinformatics. Wiley Publishing, 2003.
[4] Lin T. Y., Cercone N. Rough Sets and Data Mining: Analysis of Imprecise Data [M]. Boston, USA: Kluwer Academic Publishers, 1997.
[5] M. Kamber, R. Shinghal. Evaluating the Interestingness of Characteristic Rules [C]. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, AAAI, 1996, pp. 263-266.
[6] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket analysis," in Proceedings of 1997 ACM SIGMOD International Conference on Management of Data (SIGMOD '97) (Tucson, AZ), pp. 255-264, May 1997.
[7] C. C. Aggarwal, C. Procopiuc, and P. S. Yu, "Finding localized associations in market basket data," IEEE Transactions on Knowledge and Data Engineering, vol. 14, pp. 51-62, 2002.
[8] A. Savasere, E. Omiecinski, and S. Navathe, "An efficient algorithm for mining association rules in large databases," in Proceedings of 1995 International Conference on Very Large Data Bases (VLDB '95) (Zurich, Switzerland), pp. 432-443, September 1995.
[9] Han J., Kamber M. Data Mining: Concepts and Techniques [M]. Beijing: Higher Education Press, 2001.
[10] Pawlak Z., Wong S. K. M., Ziarko W. Rough sets: probabilistic versus deterministic approach. International Journal of Man-Machine Studies, 1988, 29: 81-95.
[11] Ziarko W. Variable precision rough set model. Journal of Computer and System Sciences, 1993, 46: 39-59.