Most CE802 students use the Weka package as the basis of their assignments. This is because it not only provides implementations of a wide range of learning procedures but also includes the machinery for running systematic experiments and reporting relevant statistics for the results. In other words, it will do a lot of the work for you.

These exercises serve two purposes. They enable you to discover what facilities Weka provides and how to use them, and they allow you to see some of the learning procedures that we discuss in the lectures in action.

Obtaining Weka

Implementations of Weka for a wide variety of machines and operating systems can be downloaded from the Weka website (http://www.cs.waikato.ac.nz/ml/weka/index.html). Various versions of Weka are on offer; you almost certainly want the stable version, which is currently Weka 3.6. Since Weka is written in Java, it requires the Java virtual machine. Choose the appropriate download option if you do not already have this on your computer. The code comes as a self-extracting executable file (weka-3-6-8.exe), so installation is very simple indeed.

Running Weka

Assuming you do not override the defaults during installation, Weka will be located in a folder called Weka-3-6 in the Program Files folder. The main program can be launched via a shortcut or by clicking on a file called either weka.exe or weka.jar (there are minor differences between versions). Once launched, a small window will appear, usually in the top right of your screen, through which you choose the interface you want to use. The Explorer is the most useful for most CE802 assignments. Clicking on the Explorer button will launch the Explorer interface.

The Explorer Interface

This is probably the most confusing part of becoming familiar with Weka, because you are presented with quite a complex screen. Initially the 'Preprocess' tab will have been selected. This is the tab you select when you want to tell Weka where to find the data set that you want to use.
Weka processes data sets that are in its own ARFF format. Conveniently, the download will have set up a folder called 'data' within the Weka-3-6 folder. This contains a selection of data files in ARFF format.

ARFF format files

You do not need to know about ARFF format unless you wish to convert data from other formats. However, it is useful to see the information that such files provide to Weka. The following is an example of an ARFF file for a data set similar to the one used in the decision tree lecture:

    @relation weather.symbolic

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature {hot, mild, cool}
    @attribute humidity {high, normal}
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

    @data
    sunny,hot,high,FALSE,no
    sunny,hot,high,TRUE,no
    overcast,hot,high,FALSE,yes
    rainy,mild,high,FALSE,yes
    rainy,cool,normal,FALSE,yes
    rainy,cool,normal,TRUE,no
    overcast,cool,normal,TRUE,yes
    sunny,mild,high,FALSE,no
    sunny,cool,normal,FALSE,yes
    rainy,mild,normal,FALSE,yes
    sunny,mild,normal,TRUE,yes
    overcast,mild,high,TRUE,yes
    overcast,hot,normal,FALSE,yes
    rainy,mild,high,TRUE,no

It consists of three parts. The @relation line gives the data set a name for use within Weka. The @attribute lines declare the attributes of the examples in the data set (note that this includes the classification attribute). Each line specifies an attribute's name and the values it may take. In this example the attributes have nominal values, so these are listed explicitly. In other cases attributes might take numbers as values, and this would be indicated as in the following example:

    @attribute temperature numeric

The remainder of the file, after the @data line, lists the actual examples in comma-separated format; the attribute values appear in the order in which the attributes are declared above.

Opening a data set

In the Explorer window, click on 'Open file' and then use the browser to navigate to the 'data' folder within the Weka-3-6 folder. Select the file called weather.nominal.arff. (This is in fact the file listed above.)
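Because an ARFF file is just plain text in the three-part layout described above, it can be read with a few lines of code. The following Python sketch is our own illustration, not Weka's loader: `parse_arff` and `WEATHER_ARFF` are hypothetical names, and the code assumes the simplified form used in this example (nominal values listed in braces, no quoted strings, no missing values written as '?', no sparse rows). Applied to the weather data, it reports the relation name, 5 attributes and 14 instances.

```python
"""A minimal reader for simple ARFF files such as weather.nominal.arff.

Sketch only. Assumes attribute names contain no spaces, nominal values are
listed in braces, and each data row is a plain comma-separated line (no quoted
strings, sparse rows or '?' missing values). Weka's own loader handles the
full format.
"""
from collections import Counter

WEATHER_ARFF = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Return (relation, attributes, rows).

    attributes is a list of (name, spec) pairs, where spec is either the list
    of declared nominal values or a type keyword such as 'numeric'.
    Row values are kept as strings, even for numeric attributes.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            # Each nominal value must be one of those declared for its attribute.
            for (name, spec), value in zip(attributes, values):
                if isinstance(spec, list) and value not in spec:
                    raise ValueError(f"{value!r} is not a declared value of {name}")
            rows.append(values)
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                attributes.append((name, [v.strip() for v in spec.strip("{}").split(",")]))
            else:
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


if __name__ == "__main__":
    relation, attributes, rows = parse_arff(WEATHER_ARFF)
    # prints: weather.symbolic 5 attributes, 14 instances
    print(relation, len(attributes), "attributes,", len(rows), "instances")
    # Per-value counts for 'outlook', the first attribute in the file.
    print(Counter(row[0] for row in rows))
```

The declared value lists are used to reject rows containing undeclared values, which mirrors the idea that the @attribute lines define what each column may contain.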
The weather data is a 'toy' data set, like the ones used in class for demonstration purposes. In this case, the normal usage is to learn to predict the 'play' attribute from the four others, which provide information about the weather.

The Explorer window should now look like this:

[Screenshot: the Preprocess tab after loading weather.nominal.arff]

Most of the information it displays is self-explanatory: it is a data set containing 14 examples (instances), each of which has 5 attributes. The 'play' attribute has been suggested as the class attribute (i.e. the one that will be predicted from the others). Most of the right-hand side of the window gives you information about the attributes. Initially, it will give you information about the first attribute ('outlook'). This shows that it has 3 possible values and tells you how many examples there are with each value. The bar chart in the lower right shows how the values of the suggested class variable are distributed across the possible values of 'outlook'. If you click on 'temperature' in the panel on the left, the information about the 'outlook' attribute will be replaced by the corresponding information about the 'temperature' attribute.

Choosing a classifier

Next we must select a machine learning procedure to apply to this data. The task is classification, so click on the 'Classify' tab near the top of the Explorer window. The window should now look like this:

[Screenshot: the Classify tab]

By default, a classifier called ZeroR has been selected. We want a different classifier, so click on the Choose button. A hierarchical pop-up menu appears. Click to expand 'trees', which appears at the end of this menu, then select J48, which is the decision tree program we want. The Explorer window now looks like this, indicating that J48 has been chosen:

[Screenshot: the Classify tab with J48 selected]

The other information alongside J48 indicates the parameters that have been chosen for the program. For this exercise we will ignore these.

Choosing the experimental procedures

The panel headed 'Test options' allows the user to choose the experimental procedure. We shall have more to say about this later in the course.
For the present exercise, click on 'Use training set'. (This will simply build a tree using all the examples in the data set.)

The small panel halfway down the left-hand side indicates which attribute will be used as the classification attribute. It will currently be set to 'play'. (Note that this is what actually determines the classification attribute; the 'class' attribute on the Preprocess screen is simply there to allow you to see how a variable appears to depend on the values of other attributes.)

Running the decision tree program

Now simply click the Start button and the program will run. The results will appear in the scrollable panel on the right of the Explorer window. Normally these will be of great interest, but for present purposes all we need to notice is that the resulting tree classified all 14 training examples correctly. The tree constructed is presented in indented format, a common method for large trees:

    J48 pruned tree
    ------------------

    outlook = sunny
    |   humidity = high: no (3.0)
    |   humidity = normal: yes (2.0)
    outlook = overcast: yes (4.0)
    outlook = rainy
    |   windy = TRUE: no (2.0)
    |   windy = FALSE: yes (3.0)

    Number of Leaves  : 5

    Size of the tree : 8

Each line beginning with '|' is a test nested under the line above it, and the number in parentheses after a leaf is the number of training examples that reach that leaf.

The panel on the lower left headed 'Result list (right-click for options)' provides access to more information about the results. Right-clicking will produce a menu from which 'Visualize tree' can be selected. This will display the decision tree in a more attractive format:

[Screenshot: the J48 tree displayed by 'Visualize tree']

Note that this form of display is really only suitable for small trees. Comparing the two forms should make it clear how the indented format works.
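The indented format maps directly onto nested if-tests. As a sketch, the Python below copies the tree above by hand into a function and checks that it classifies all 14 training examples correctly. It also computes the information gain of each attribute over the whole data set: 'outlook' has the largest gain (about 0.247 bits), which is consistent with it being chosen as the root. `classify`, `entropy` and `information_gain` are our own illustrative names, not part of Weka, and J48 actually ranks splits by gain ratio (a normalised variant of information gain), so these figures illustrate the idea rather than reproduce J48's internal calculation.

```python
"""The J48 tree for weather.nominal.arff, translated by hand into Python.

Illustration only: classify() is copied from the indented Explorer output,
and ROWS are the 14 examples from the ARFF file.
"""
import math
from collections import Counter

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def classify(outlook, humidity, windy):
    """Follow the indented tree: top-level lines test outlook, and each
    line prefixed with '|' is a test nested under the line above it."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    return "no" if windy == "TRUE" else "yes"  # outlook == "rainy"


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr):
    """Reduction in class entropy from splitting rows on column index attr."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


if __name__ == "__main__":
    correct = sum(classify(r[0], r[2], r[3]) == r[-1] for r in ROWS)
    print(f"{correct}/{len(ROWS)} training examples correct")  # 14/14
    for i, name in enumerate(ATTRIBUTES[:-1]):
        print(f"{name:<12} information gain = {information_gain(ROWS, i):.3f}")
```

Because 'Use training set' evaluates the tree on the same examples it was built from, a score of 14/14 says little about how well the tree would do on new data; the other test options exist to address this.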