
A4 - Teuku Hilman Revanda - 001201500038

RapidMiner Practice (Step by Step)


● Click “Add Data” and open the HeatingOil.csv file

● The result of adding the data:

● Click the “Statistics” tab to see that the data set appears to be very clean,
with:
o No missing values in any of the six attributes
o No inconsistent data apparent in the ranges (Min-Max) or other
descriptive statistics
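The same two checks (missing values and Min-Max ranges) can be reproduced outside RapidMiner as a sanity check. The sketch below is only an illustration: the three data rows are invented, not taken from HeatingOil.csv, and the six column names are assumed from the attributes discussed later in this report.

```python
import csv
import io

# Invented sample rows for illustration only (not real HeatingOil.csv data).
# The six column names are assumed from the attributes discussed in this report.
SAMPLE_CSV = """Insulation,Temperature,Heating_Oil,Num_Occupants,Avg_Age,Home_Size
6,74,132,4,23.8,4
10,43,263,4,56.7,4
3,81,145,2,28.0,6
"""


def describe(reader):
    """Count empty cells and track (min, max) for every column of a DictReader."""
    missing = {}
    ranges = {}
    for row in reader:
        for name, raw in row.items():
            missing.setdefault(name, 0)
            if raw is None or raw.strip() == "":
                missing[name] += 1  # an empty or absent cell counts as missing
                continue
            value = float(raw)
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return missing, ranges


# With the real file: with open("HeatingOil.csv", newline="") as f: describe(csv.DictReader(f))
missing, ranges = describe(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, count in missing.items():
    print(f"{name:14s} missing={count} min/max={ranges.get(name)}")
```

A column with a non-zero missing count, or a min/max outside a plausible range, would be the kind of problem the Statistics tab is used to spot.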

● Switch to the “Design” view and build the process:


a. Add (drag and drop) the file “HeatingOil” to the process window
b. In the Operators tab, search for “correlation matrix”, then add it to the
process window
c. Connect the “out” port of Retrieve HeatingOil to the “exa” input port of
Correlation Matrix,
then connect the “exa” output port of Correlation Matrix to a “res” port,
and connect the “mat” output port of Correlation Matrix to another “res” port
d. Click “Run” or press F11

The result is shown below:

Evaluation
A positive correlation means that the two attributes tend to move in the same
direction. For example, Heating Oil Consumption and Insulation Rating: homes with
higher Heating Oil Consumption tend to have a higher Insulation Rating, and vice
versa.

A negative correlation means that the two attributes tend to move in opposite
directions. For example, Temperature and Insulation Rating: where the Temperature
is higher, the Insulation Rating tends to be lower, and vice versa.
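To make "same direction" versus "opposite direction" concrete, here is a small sketch of the Pearson correlation coefficient r. We assume this is the coefficient the Correlation Matrix operator reports. The sign of r gives the direction and |r| (between 0 and 1) gives the strength of the linear relationship. The values below are invented for illustration and are not from HeatingOil.csv.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two variance terms (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values, not taken from HeatingOil.csv.
insulation = [2, 4, 6, 8, 10]
heating_oil = [120, 150, 170, 210, 240]
temperature = [80, 72, 65, 55, 45]

print(round(pearson(insulation, heating_oil), 3))  # positive: same direction
print(round(pearson(temperature, insulation), 3))  # negative: opposite directions
```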

1. The most significant attribute (factor) influencing heating oil consumption,
with a positive correlation, is Avg_Age, the average age of the occupants
of the house
2. The second most influential attribute (factor) is Temperature (with a
negative correlation)
3. The third most influential attribute (factor) is Insulation (with a
positive correlation)
4. The Home_Size attribute has only a very small influence, whereas
Num_Occupants arguably has no influence at all on the consumption of
heating oil
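The ranking above comes from reading the Heating_Oil row of the correlation matrix and ordering the other attributes by the absolute value of r, while keeping the sign for the direction. Below is a minimal sketch of that step. The columns are invented placeholders, not the real data, and only four attributes are shown to keep it short.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (attribute, r) pairs sorted by |r| against the target, strongest first."""
    ys = columns[target]
    scores = [(name, pearson(xs, ys)) for name, xs in columns.items() if name != target]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Invented placeholder columns; replace them with the real HeatingOil.csv columns.
columns = {
    "Heating_Oil": [120, 150, 170, 210, 240, 260],
    "Avg_Age": [22, 30, 38, 47, 55, 63],
    "Temperature": [78, 75, 60, 62, 48, 50],
    "Num_Occupants": [3, 5, 2, 4, 6, 1],
}
ranking = rank_by_correlation(columns, "Heating_Oil")
for name, r in ranking:
    print(f"{name:14s} r = {r:+.3f}")
```

Sorting on |r| rather than r matters: otherwise a strong negative correlation such as Temperature would be ranked below weak positive ones.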

● Go to the Results view of the data HeatingOil.csv (ExampleSet (//Local
Repository/data/heating oil/HeatingOil)):
1. Open the “Charts” tab and select:
1.1. X-Axis : Heating_Oil
1.2. Y-Axis : Avg_Age
1.3. Color Column : Heating_Oil
2. In the “Charts” tab, select:
2.1. X-Axis : Heating_Oil
2.2. Y-Axis : Temperature
2.3. Color Column : Heating_Oil

Deployment
II. Dropping the Num_Occupants attribute
a. While the number of people living in a home might logically seem
like a variable that would influence energy usage, in our model it
did not correlate in any significant way with anything else
b. Sometimes there are attributes that don’t turn out to be very
interesting
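In RapidMiner, an attribute can be removed from the process with an operator such as Select Attributes. The same idea in plain Python is a copy of each row without the unwanted column. The sketch below uses invented rows, and `drop_attributes` is a hypothetical helper name.

```python
def drop_attributes(rows, names):
    """Return copies of the rows without the given attribute names (input is not modified)."""
    dropped = set(names)
    return [{key: value for key, value in row.items() if key not in dropped} for row in rows]


# Invented rows for illustration only.
rows = [
    {"Insulation": 6, "Temperature": 74, "Heating_Oil": 132, "Num_Occupants": 4},
    {"Insulation": 10, "Temperature": 43, "Heating_Oil": 263, "Num_Occupants": 2},
]
slim = drop_attributes(rows, ["Num_Occupants"])
print(slim[0])  # {'Insulation': 6, 'Temperature': 74, 'Heating_Oil': 132}
```

Returning new rows instead of deleting keys in place keeps the original data set available if the attribute turns out to be useful later.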

III. Adding additional attributes to the data set
a. It turned out that the number of occupants in the home didn’t
correlate much with other attributes, but that doesn’t mean that
other attributes would be equally uninteresting
b. For example, what if Sarah had access to the number of furnaces
and/or boilers in each home?
c. Home_Size was only slightly correlated with Heating_Oil usage, so
perhaps the number of appliances that consume heating oil in
each home would tell an interesting story, or at least add to her
insight

IV. Investigating the role of home insulation
a. The Insulation rating attribute was fairly strongly correlated with a
number of other attributes
b. There may be some opportunity there to partner with a company
that specializes in adding insulation to existing homes

V. Focusing marketing efforts on cities with low temperatures and a high
average age of residents
a. The Temperature attribute was fairly strongly negatively correlated
with heating oil consumption
b. The Avg_Age attribute had the strongest positive correlation with
heating oil consumption
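The target profile can be written as a simple filter on the two attributes: colder than some temperature and older than some average age. In RapidMiner, a row filter like this would typically use an operator such as Filter Examples. In the sketch below, the thresholds of 50 are arbitrary placeholders, and the `id` field and row values are invented. The real data set has no city attribute, so the filter works per home.

```python
def marketing_targets(rows, max_temperature, min_avg_age):
    """Keep rows that are colder than max_temperature and older than min_avg_age."""
    return [
        row
        for row in rows
        if row["Temperature"] < max_temperature and row["Avg_Age"] > min_avg_age
    ]


# Invented homes; "id" is a hypothetical identifier added for readability.
homes = [
    {"id": 1, "Temperature": 40, "Avg_Age": 58.0},
    {"id": 2, "Temperature": 75, "Avg_Age": 61.0},
    {"id": 3, "Temperature": 35, "Avg_Age": 24.5},
    {"id": 4, "Temperature": 48, "Avg_Age": 52.3},
]
targets = marketing_targets(homes, max_temperature=50, min_avg_age=50)
print([home["id"] for home in targets])  # [1, 4]
```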

VI. Adding greater granularity in the data set
a. This data set has yielded some interesting results, but it’s pretty
general
b. We have used average yearly temperatures and total annual
number of heating oil units in this model
c. But we also know that temperatures fluctuate throughout the year
in most areas of the world, and thus monthly, or even weekly
measures would not only be likely to show more detailed results of
demand and usage over time, but the correlations between
attributes would probably be more interesting
d. From our model, Sarah now knows how certain attributes interact
with one another, but in the day-to-day business of doing her
job, she’ll probably want to know about usage over time periods
shorter than one year
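The following sketch shows why a single yearly number can hide detail. Twelve monthly values collapse into one annual average or total, which erases the seasonal swing. All of the monthly figures are invented for illustration, since the data set in this report only has yearly values.

```python
# Invented monthly figures for one home, January..December (not from HeatingOil.csv).
monthly_temperature = [20, 24, 33, 45, 56, 66, 71, 69, 61, 49, 38, 26]
monthly_units = [38, 34, 27, 16, 7, 2, 0, 0, 4, 12, 22, 33]

# What a yearly data set keeps: one average and one total.
annual_average = sum(monthly_temperature) / len(monthly_temperature)
annual_units = sum(monthly_units)

# What it loses: the spread across months and how concentrated usage is in winter.
coldest, warmest = min(monthly_temperature), max(monthly_temperature)
winter_units = monthly_units[11] + monthly_units[0] + monthly_units[1]  # Dec, Jan, Feb

print(f"annual average temperature: {annual_average:.1f}")
print(f"monthly temperature range:  {coldest} to {warmest}")
print(f"share of annual units used Dec-Feb: {winter_units / annual_units:.0%}")
```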
