Dionisius Pratama
Eric Hasmiraldi, S.Pd., M.Hum.
KU 201 – English II
April 18th 2018
Introduction
When someone faces a problem, he or she must decide how best to deal with it. The most
important resource for doing so is data. Knowing the data matters because, with accurate and precise
data, he or she can decide on the best way to solve the problem.
Suppose this person is the manager of a factory outlet that sells several types of clothes. His
store has many visitors every day, and each of them has a personal taste in the clothes sold there.
One day, he wants to know which type of clothes the visitors buy most, because he intends to order
more of that type and stock it in his store. This would be an easy job if the store had only 10 visitors
each month: he could simply make a list on paper and do the calculations there. The problem is that
his store does not have only 10 visitors; suppose it is visited by 10,000 people each day. How many
sheets of paper would he need for the calculations then? This is where data mining and statistics
come in, helping people deal with large amounts of data. To put it simply, data mining and
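The manager's tally can be sketched in a few lines of Python. This is only a minimal illustration: the purchase list and the function name `most_bought` are invented for the example, and a real store would read its purchases from a sales database rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical purchase log: one entry per item sold (illustrative data only).
purchases = ["t-shirt", "jeans", "jacket", "t-shirt", "dress", "jeans", "t-shirt"]


def most_bought(items):
    """Return (clothes_type, count) for the most frequently bought type."""
    counts = Counter(items)          # tally how many times each type appears
    return counts.most_common(1)[0]  # the single most frequent (type, count) pair


best_type, best_count = most_bought(purchases)
print(f"Most bought: {best_type} ({best_count} of {len(purchases)} purchases)")
```

The same function works unchanged whether the list holds 10 purchases or 10,000, which is the point of handing the calculation to a computer.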
What is data mining? Data mining is the set of methods and techniques for exploring and
analysing data sets (which are often large), in an automatic or semi-automatic way, in order to find
among these data certain unknown or hidden rules, associations or tendencies; special systems output
the essentials of the useful information while reducing the quantity of data. Briefly, data mining is the
art of extracting information – that is, knowledge – from data,1 and statistics is a method of collecting
and analysing data.2 The size of the data is denoted by n. These are the two methods used to give
1 Stéphane Tufféry, Data Mining and Statistics for Decision Making (Wiley: 2011), page 4.
2 Jarkko Isotalo, Basics of Statistics (journal), page 1.
Pratama, Data Mining and Statistics for Decision Making 2
If data mining is so powerful, why should scientists combine the two methods? Consider one
simple case: the use of data mining in the business world. Today’s business world is growing stronger,
bigger, and more varied. Because of this growth, the data needed by developers and workers is no
longer just the quantitative data that statistics alone can provide. The data they need is much more
complex, such as consumers’ profiles, their interests in other products in the present and
To get precise and accurate information from a large amount of data with many types of
variables (because human behavior is not a discrete variable), data mining and statistics must be
combined. Statistics alone cannot give people information from analysing those kinds of data. This is
the reason why data mining is used together with statistics. Data mining, which is built on neural
networks, decision trees, and machine learning algorithms, is of great help when dealing with
complex types of variables (categorical variables, that is, variables that contain a finite number of categories or
Statistics gives people quantitative data, and data mining methods give people qualitative
data. Statistics is also used because of its simplicity: people do not need complex data mining to do
quantitative analysis, because statistics alone is enough for it. Later, the data extracted using these
two methods is used by business developers to decide how they will drive and manage their
business, predict future trends, come up with better business strategies, and more.
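As a small illustration of how simple purely quantitative analysis can be, the sketch below summarizes a week of sales using only Python's standard `statistics` module. The sales figures are invented for the example; `n` follows the essay's notation for the size of the data.

```python
import statistics

# Hypothetical number of items sold per day over one week (illustrative data only).
daily_sales = [120, 135, 128, 150, 142, 160, 131]

n = len(daily_sales)                           # size of the data set
mean_sales = statistics.mean(daily_sales)      # average items sold per day
median_sales = statistics.median(daily_sales)  # middle value of the sorted data
spread = statistics.stdev(daily_sales)         # sample standard deviation

print(f"n = {n}, mean = {mean_sales}, median = {median_sales}, stdev = {spread:.1f}")
```

No mining algorithm is involved here: a handful of summary numbers already answers questions such as how much the store sells on a typical day.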
Data mining methods are able to extract personal information about a person. Even when they
are not used for criminal purposes, some people still object to them. As one example, according to a
New York Times article (2015), Timothy D. Cook, the chief executive of Apple, argued that people
should not have to let companies data-mine their email, their history, or their family photos and sell
them off for advertising purposes. As another, a survey from the Annenberg School for Communication
(The Tradeoff Fallacy) shows that 55 percent of respondents disagreed or strongly disagreed with the
statement “it’s O.K. if a store where I shop uses information it has about me to create a picture of me
that improves the services they provide for me.” About 7 in 10 people also disagreed that it was fair for
a store to monitor their online activities in exchange for free Wi-Fi while at the store. 91 percent of
respondents disagreed
that it was fair for companies to collect information about them without their knowledge in exchange for
a discount.
Does data mining have a bad influence on people’s daily lives? The answer is no. Rather than
stealing personal data, data mining is used to personalize services for a person. The data extracted
by mining methods and analysed by business developers can give the developers a good prediction
of their consumers’ behaviour. With that data, they can show a person an advertisement, for example,
that matches his or her personal interests when he or she visits the website (online shop), or even a
physical store if the store staff are told about that person’s interests. This is also called user behaviour
analysis.
The second reason is that almost everything people do today (business processes and
transactions, banking transactions, market basket analysis, and more) is stored in computer systems,
which has expanded the size of the data to the yottabyte scale (10^24 bytes). Powerful data analysis
tools are needed to transform up to 10^24 bytes of data into valuable, organized knowledge for
decision making. Consider one example: mining search engine queries. Suppose that an online shop
has a built-in search engine that receives many queries each day. These queries can be viewed as an
“information exchange”, in which people describe their information needs. What is interesting is that,
through data mining, the developers can read patterns in the queries and gain valuable knowledge
from them that cannot be seen by reading individual items. For example, suppose the online shop
adds a new category of clothes. The data mining algorithm uses specific search terms as indicators of
search activity for the new clothes type. A close relationship can be found between the number of
users who only search and the number of users who actually want to buy the new clothes. If all of the
search queries related to the new clothes type are aggregated, a pattern emerges. Using aggregated
search data, the developer can estimate their consumer activity
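The aggregation of search queries described above can be sketched as follows. Everything in this example is an assumption made for illustration: the search log, the purchase counts, the search term "denim jacket", and the helper names `searches_per_day` and `pearson` are all hypothetical, and a Pearson correlation is only one simple way to measure how closely searches and purchases move together.

```python
from collections import Counter
from math import sqrt

# Hypothetical search log: (day, query) pairs (illustrative data only).
search_log = [
    (1, "denim jacket"), (1, "red dress"), (1, "denim jacket sale"),
    (2, "denim jacket"), (2, "denim jacket men"), (2, "denim jacket"), (2, "socks"),
    (3, "denim jacket"), (3, "denim jacket"), (3, "denim jacket xl"),
    (3, "denim jacket"), (3, "scarf"),
]
# Hypothetical number of items of the new clothes type sold on each day.
purchases_per_day = {1: 1, 2: 2, 3: 3}


def searches_per_day(log, term):
    """Aggregate the log: count, per day, the queries that mention the term."""
    return Counter(day for day, query in log if term in query)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


counts = searches_per_day(search_log, "denim jacket")
days = sorted(purchases_per_day)
r = pearson([counts[d] for d in days], [purchases_per_day[d] for d in days])
print(f"Searches per day: {dict(counts)}, correlation with purchases: {r:.2f}")
```

No single query in the log says anything about demand, but the daily totals do, which is the pattern that only becomes visible after aggregation. With real data the correlation would be far noisier than in this tidy made-up log.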
Conclusion
Together with statistics, data mining methods can be used to extract information about a
person’s behavior. Data mining and statistics are mainly used for business processes that involve
large amounts of data. If most business developers already knew these two methods, there might
be some change in their business management. After all, the final point of using data mining in their
REFERENCES
Han, Jiawei; Micheline Kamber and Jian Pei. 2012. Data Mining: Concepts and Techniques (3rd edition).
Tufféry, Stéphane. 2011. Data Mining and Statistics for Decision Making. John Wiley & Sons, Ltd.,
Publication.
ADDITIONAL REFERENCES
https://www.nytimes.com/2015/06/05/technology/consumers-conflicted-over-data-mining-policies-
Turow, Joseph; Michael Hennessy and Nora Draper. 2015. The Tradeoff Fallacy. Report.