
Data mining

Clustering
Dress Attribute Sales Data Set
Introduction

This dataset contains attributes of dresses and their recommendations according to their sales.
Sales are monitored on alternate days.
The data is stored as text.
The number of attributes is 13.
The number of instances is 501.
The dataset contains missing values.
It is suitable for classification and clustering.
Attribute Information
Style: Bohemia, brief, casual, cute, fashion, flare, novelty, OL, party, sexy, vintage, work
Price: Low, Average, Medium, High, Very-High
Rating: 1-5
Size: S, M, L, XL, Free
Season: Autumn, winter, Spring, Summer
NeckLine: O-neck, backless, board-neck, Bowneck, halter, mandarin-collor, open, peterpan-collor, ruffled, scoop, slash-neck, square-collar, sweetheart, turndowncollar, V-neck
SleeveLength: full, half, halfsleeves, butterfly, sleveless, short, threequarter, turndown, null
waiseline: dropped, empire, natural, princess, null
Material: wool, cotton, mix etc
FabricType: shafoon, dobby, popline, satin, knitted, jersey, flannel, corduroy etc
Decoration: applique, beading, bow, button, cascading, crystal, draped, embroridary, feathers, flowers etc
Pattern type: solid, animal, dot, leapard etc
Recommendation: 0, 1
A2- Data searching
The Rating attribute is numerical.
The maximum value is 5.
The minimum value is 0.
The mean is 3.529.
The stdDev is 2.005.
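The summary figures above (minimum, maximum, mean, stdDev) can be recomputed outside Weka as a sanity check. The following is a minimal sketch, not the procedure used in the slides: the function name `summarize` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption.

```python
import statistics


def summarize(values):
    """Return min, max, mean and sample standard deviation of a numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stdDev": statistics.stdev(values),
    }
```

Calling `summarize` on the Rating column, once loaded as a list of floats, should give values that can be compared with the figures reported above.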
Actual data
Remove Missing values
Missing values with filter and why
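The slides do not say which filter was used for the missing values. As one possible illustration, the sketch below fills missing entries of a nominal attribute with the most frequent value (the mode). The function name `fill_missing_with_mode` is hypothetical, and the "?" marker is the ARFF convention for a missing value; whether this matches the filter actually applied is an assumption.

```python
from collections import Counter


def fill_missing_with_mode(values, missing="?"):
    """Replace missing entries of a nominal column with its most frequent value."""
    present = [v for v in values if v != missing]
    if not present:
        # Nothing to learn the mode from; return the column unchanged.
        return list(values)
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]
```

For a numeric attribute the same idea applies with the mean in place of the mode.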
Finding the outliers
Outlier detection: Weka -> Filters -> unsupervised -> attribute -> InterquartileRange

Before, I had 14 attributes, but after applying the InterquartileRange filter I had two new attributes: Outlier and ExtremeValue.
The result shows that 121 instances are flagged as outliers and 379 are not. No instance is flagged as an extreme value. This is good, since the fewer the better.
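To make the interquartile-range rule concrete, here is a minimal sketch of how a value can be labelled as normal, outlier, or extreme from the quartiles of a numeric attribute. The function name `iqr_flags` is hypothetical. The factors 3.0 and 6.0 are assumed to correspond to the filter's default settings, and the quartile method used by Python's `statistics.quantiles` may differ from Weka's, so counts need not match exactly.

```python
import statistics


def iqr_flags(values, outlier_factor=3.0, extreme_factor=6.0):
    """Label each value as 'normal', 'outlier' or 'extreme' using the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    flags = []
    for x in values:
        if x > q3 + extreme_factor * iqr or x < q1 - extreme_factor * iqr:
            flags.append("extreme")
        elif x > q3 + outlier_factor * iqr or x < q1 - outlier_factor * iqr:
            flags.append("outlier")
        else:
            flags.append("normal")
    return flags
```

In this sketch a value is never both an outlier and an extreme value: the extreme check is applied first, so the two labels are mutually exclusive.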
Remove the outliers
How to remove the outliers:

Weka -> Filters -> unsupervised -> instance -> RemoveWithValues -> click on the filter field to adjust its settings.
First I specify the index of the Outlier attribute, which is 15.
Then I set the nominal indices to last, since the last value of the Outlier attribute is yes.
After applying the filter, the instances whose Outlier value is yes are removed.
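The removal step above amounts to dropping every row whose outlier flag is yes. A minimal sketch of the same idea, assuming each instance is held as a dictionary; the function name `remove_with_value` and the key name "Outlier" are illustrative assumptions, not Weka API calls.

```python
def remove_with_value(rows, attribute, value):
    """Keep only the rows whose given attribute does not equal the given value."""
    return [row for row in rows if row.get(attribute) != value]
```

For example, `remove_with_value(rows, "Outlier", "yes")` would keep only the instances that were not flagged.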
No Extreme values
Noisy data
A3- Data preparation
Attribute construction
After Attribute construction
Adding new attribute
Normalization
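The slides do not state which normalization was applied. A common choice is min-max scaling to the range [0, 1], sketched below under that assumption; the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Scaling matters for clustering because distance-based methods such as k-means would otherwise be dominated by attributes with large numeric ranges.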
A4- Data reduction
Resampling
SRSWithoutR
SRSWithR with sample size percent = 50
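The two resampling variants named above, simple random sampling without replacement (SRSWithoutR) and with replacement (SRSWithR), can be sketched as follows. The function names and the `seed` parameter are illustrative assumptions; the 50 percent sample size is taken from the slide.

```python
import random


def srs_without_replacement(rows, percent, seed=1):
    """Draw a simple random sample in which each row appears at most once."""
    rng = random.Random(seed)
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)


def srs_with_replacement(rows, percent, seed=1):
    """Draw a simple random sample in which the same row may be picked repeatedly."""
    rng = random.Random(seed)
    k = round(len(rows) * percent / 100)
    return rng.choices(rows, k=k)
```

With `percent=50` both functions return half as many rows as the input; only the second can contain duplicates.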
Evaluate three different numbers of clusters by investigating the errors (say, k = {3, 4, 5}).

Number of clusters = 3
Number of clusters = 4
Number of clusters = 5
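The comparison of k = 3, 4 and 5 rests on the within-cluster sum of squared errors: the lower the value for a given k, the tighter the clusters. The sketch below is a plain k-means written for illustration, not Weka's implementation; the function names, the seed, and the random choice of initial centroids are assumptions, and the points are expected to be tuples of numbers (for example, normalized attribute values).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, seed=10, max_iter=100):
    """Run a basic k-means and return the within-cluster sum of squared errors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random points
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)
```

Running `kmeans_sse(points, k)` for each k in (3, 4, 5) gives the error values to compare. Because k-means can stop in a local optimum, the result depends on the seed, so trying several seeds per k is a reasonable precaution.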
Visualize the results for the different numbers of clusters.

k = 3
k = 4
k = 5
