Data Mining 2018

NAME : PARDON MHIZHA
STUDENT I.D. : 129577
COURSE TITLE : DATA MINING AND WAREHOUSING
COURSE CODE : 406
LECTURER : MR. F. CHOGA

1. Explain five major issues in Data Mining. (25 marks)
Mining methodology and user interaction
This involves mining different kinds of knowledge from different sources of data or databases.
Since different users can be interested in different kinds of knowledge, data mining should cover
a wide spectrum of data analysis and knowledge discovery tasks, such as data characterization,
discrimination, association and correlation analysis, classification, prediction, clustering, outlier
analysis, and evolution analysis.
Interactive mining of knowledge at multiple levels of abstraction allows users to focus the
search for patterns, providing and refining data mining requests based on returned results.
Knowledge should therefore be mined by drilling down, rolling up, and pivoting through the data
space and knowledge space interactively.
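
The roll-up, drill-down, and pivot operations mentioned above can be illustrated with a minimal
sketch. The records, region names, and the aggregate helper are hypothetical and serve only to
show the idea of viewing the same data at different levels of abstraction.

```python
from collections import defaultdict

# Hypothetical records: (region, city, quarter, amount). Illustrative values only.
records = [
    ("North", "CityA", "Q1", 120),
    ("North", "CityA", "Q2", 80),
    ("North", "CityB", "Q1", 50),
    ("South", "CityC", "Q1", 200),
    ("South", "CityD", "Q2", 70),
]

def aggregate(rows, key_indexes):
    """Sum the amount column, grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)

# Roll up: summarise at the coarse region level.
by_region = aggregate(records, [0])
# Drill down: add the city dimension for more detail.
by_region_city = aggregate(records, [0, 1])
# Pivot: view the same data by quarter instead of by place.
by_quarter = aggregate(records, [2])
```

Each call answers a different user request over the same data, which is the essence of
interactive mining: the user refines the next request based on the previous result.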
Background knowledge may also be incorporated to guide the discovery process and to allow
discovered patterns to be expressed in concise terms at different levels of abstraction. Domain
knowledge helps to focus and speed up a data mining process, or to judge the interestingness of
discovered patterns.
Data mining query languages and ad hoc data mining are a further issue. Relational query
languages allow users to pose ad hoc queries for data retrieval. In the same way, high-level data
mining query languages need to be developed so that users can describe ad hoc data mining tasks.
Such a language should be integrated with a database or data warehouse query language and
optimized for efficient and flexible data mining.
Discovered knowledge should be expressed in high-level languages, visual representations, or
other expressive forms so that it can be easily understood and directly used by humans. The
system must therefore adopt expressive knowledge representation techniques, such as trees,
tables, rules, graphs, charts, crosstabs, matrices, or curves.
The data stored in a database may contain noise, exceptional cases, or incomplete data objects,
and these may confuse the mining process. As a result, the accuracy of the discovered patterns
can be poor. There is therefore a need for data cleaning methods and data analysis methods that
can handle noise. A data mining system can also uncover thousands of patterns, many of which may
be uninteresting to the given user, either because they represent common knowledge or because
they lack novelty.
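
As a rough illustration of data cleaning, the sketch below treats out-of-range values as noise
and fills missing values with the median of the remaining observations. The age values, the
valid range, and the choice of median imputation are all assumptions made for the example; real
cleaning rules depend on the data set.

```python
from statistics import median

# Hypothetical patient ages: None marks a missing value, 410 is a data-entry error.
ages = [34, None, 29, 410, 41, 38]

def clean(values, low=0, high=120):
    """Treat out-of-range values as noise, then impute all gaps with the median."""
    # Step 1: replace implausible values with None so they are handled as missing.
    checked = [v if v is not None and low <= v <= high else None for v in values]
    # Step 2: compute the median of the values that survived the range check.
    observed = [v for v in checked if v is not None]
    fill = median(observed)
    # Step 3: fill every gap with that median.
    return [v if v is not None else fill for v in checked]

cleaned_ages = clean(ages)
```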
Performance and scalability
Data mining algorithms must be efficient and scalable in order to extract information effectively
from the huge amounts of data held in databases. In other words, the running time of a data
mining algorithm must be predictable and acceptable in large databases. The huge size of many
databases, the wide distribution of data, the high cost of some data mining processes, and the
computational complexity of some data mining methods are factors motivating the development of
parallel and distributed data mining algorithms. Such algorithms divide the data into partitions,
which are processed in parallel, and the results from the partitions are then merged. Incremental
algorithms incorporate database updates without having to mine the entire data again “from
scratch”.
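
The partition, merge, and incremental-update ideas can be sketched as follows. This is only an
illustration under simple assumptions: the transactions are hypothetical, the mining task is
reduced to counting item support, and a thread pool stands in for a real parallel or distributed
system.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Count in how many transactions of one partition each item appears."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

# Hypothetical transaction data, already split into two partitions.
partitions = [
    [["bread", "milk"], ["bread", "eggs"]],
    [["milk", "eggs"], ["bread", "milk", "eggs"]],
]

# Process the partitions in parallel, then merge the partial results.
with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(count_items, partitions))
support = sum(partial_counts, Counter())

# Incremental update: only the new batch is scanned, not the whole database.
new_batch = [["bread"]]
updated_support = support + count_items(new_batch)
```

Counting is easy to merge because partial counts simply add up; many real mining algorithms
need more careful merging steps.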
Issues relating to the diversity of data types
This involves handling relational and complex types of data. It is unrealistic to expect one
system to mine all kinds of data, given the diversity of data types and the different goals of
data mining. Specific data mining systems should be constructed for mining specific kinds of
data, so one may expect to have different data mining systems for different kinds of data.
Data mining may help disclose high-level data regularities in multiple heterogeneous databases
that are unlikely to be discovered by simple query systems, and may improve information
exchange and interoperability in heterogeneous databases. Web mining uncovers interesting
knowledge about Web contents, Web structures, Web usage, and Web dynamics.
Issues related to applications and social impacts
In an ethical sense, database security is related to privacy. This is because database security
inhibits the unauthorized dissemination of personal data thus further enhancing, albeit indirectly,
an individual's capacity to regulate access to their data.
When data can be viewed from many different angles and at different abstraction levels, it
threatens the goal of protecting data security and guarding against the invasion of privacy. It is
important to study when knowledge discovery may lead to an invasion of privacy, and what
security measures can be developed for preventing the disclosure of sensitive information.
If access to the data is controlled so that users cannot obtain a sufficient amount of data,
subsequent mining will not produce patterns with high confidence levels. Such controls include
query restriction, which attempts to detect when statistical compromise might be possible
through the combination of queries.
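
One simple form of query restriction is query-set-size control, sketched below. The patient
records, the threshold of three, and the restricted_average function are hypothetical; the
sketch refuses a statistical query when too few records match it or too few are left out, and it
does not attempt the harder task of tracking overlapping combinations of queries.

```python
# Hypothetical minimum number of records a statistical query must cover.
MIN_QUERY_SET_SIZE = 3

# Hypothetical patient records (illustrative values only).
patients = [
    {"ward": "A", "cost": 1000}, {"ward": "A", "cost": 2000},
    {"ward": "A", "cost": 3000}, {"ward": "A", "cost": 2000},
    {"ward": "B", "cost": 1500}, {"ward": "B", "cost": 2500},
    {"ward": "B", "cost": 500},
    {"ward": "C", "cost": 9000},
]

def restricted_average(rows, predicate, field):
    """Return the average of a field, or None if the query is too revealing."""
    matched = [row[field] for row in rows if predicate(row)]
    excluded = len(rows) - len(matched)
    # Refuse queries that isolate a few individuals, directly or by complement.
    if len(matched) < MIN_QUERY_SET_SIZE or excluded < MIN_QUERY_SET_SIZE:
        return None
    return sum(matched) / len(matched)

ward_a = restricted_average(patients, lambda r: r["ward"] == "A", "cost")
ward_c = restricted_average(patients, lambda r: r["ward"] == "C", "cost")
not_c = restricted_average(patients, lambda r: r["ward"] != "C", "cost")
```

The query on ward C is refused because it covers a single patient, and the query on everyone
except ward C is refused because subtracting it from the overall total would reveal the same
patient's cost.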
2. Select one data mining application domain and explain how data mining methods
should be developed. (25 marks)
Data mining, the extraction of hidden predictive information from large databases, is a powerful
new technology with great potential to help companies focus on the most important information
in their data warehouses. Data mining tools predict future trends and behaviors, allowing
businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses
offered by data mining move beyond the analyses of past events provided by retrospective tools
typical of decision support systems. Data mining tools can answer business questions that
traditionally were time consuming to resolve. They scour databases for hidden patterns, finding
predictive information that experts may miss because it lies outside their expectations.
Data mining has been used intensively and extensively by many companies or organizations in
different parts of the world. In healthcare, data mining is becoming increasingly popular and
increasingly essential. Data mining applications can greatly benefit all parties involved in the
healthcare industry. For instance, data mining can help healthcare insurers detect fraud and
abuse, healthcare organizations make customer relationship management decisions, physicians
identify effective treatments and best practices, and patients receive better and more affordable
healthcare services.
Several factors have motivated the use of data mining applications in healthcare. The existence
of medical insurance fraud and abuse, for example, has led many healthcare insurers to adopt
data mining applications.
The huge amounts of data generated by healthcare transactions are too complex and voluminous
to be processed and analyzed by traditional methods. Data mining provides the methodology and
technology to transform these mounds of data into useful information for decision making.
In healthcare in particular, data mining supports the evaluation of treatment effectiveness, the
management of healthcare, customer relationship management, and the detection of fraud and
abuse. It can also be applied to identify risk factors associated with the onset of various
diseases.
Healthcare insurers reduce their losses by using data mining tools to help them find and track
offenders. Fraud detection using data mining applications is prevalent in the commercial world,
for example in the detection of fraudulent credit card transactions. Another factor is that the
huge amounts of data generated by healthcare transactions are too complex and voluminous to be
processed and analyzed by traditional methods. Data mining can improve decision-making by
discovering previously hidden patterns and trends in large amounts of complex data. Such
analysis has become more vital as financial pressures have heightened the need for healthcare
organizations to make decisions based on the analysis of clinical and financial data. Insights
gained from data mining can influence cost, revenue, and operating efficiency while maintaining
a high level of care.
Healthcare organizations that perform data mining are better positioned to meet their long-term
needs. Data can be a great asset to healthcare organizations, but it first has to be transformed
into information. Yet another factor motivating the use of data mining applications in healthcare
is the realization that data mining can generate information that is very useful to all parties
involved in the healthcare industry. For example, data mining applications can help healthcare
insurers detect fraud and abuse, and healthcare providers can gain assistance in making
decisions, for instance, in customer relationship management. Data mining applications can also
benefit healthcare providers, such as hospitals, clinics, and physicians, as well as patients,
by identifying effective treatments and best practices.
Data mining applications can also be developed to evaluate the effectiveness of medical
treatments. By comparing and contrasting causes, symptoms, and courses of treatment, data
mining can deliver an analysis of which courses of action prove effective. The outcomes of
patient groups treated with different drug regimens for the same disease or condition can be
compared to determine which treatments work best and are most cost-effective.
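
A comparison of this kind can be sketched as below. The regimens, outcomes, and costs are
invented for illustration, and the two measures (recovery rate and cost per recovered patient)
are only one possible way to express "works best" and "most cost-effective".

```python
from collections import defaultdict

# Hypothetical outcomes: (regimen, recovered, treatment cost). Illustrative only.
outcomes = [
    ("A", True, 100), ("A", True, 100), ("A", True, 100), ("A", False, 100),
    ("B", True, 50), ("B", True, 50), ("B", False, 50), ("B", False, 50),
]

def summarise(rows):
    """Compute recovery rate and cost per recovered patient for each regimen."""
    stats = defaultdict(lambda: {"patients": 0, "recovered": 0, "cost": 0})
    for regimen, recovered, cost in rows:
        entry = stats[regimen]
        entry["patients"] += 1
        entry["recovered"] += int(recovered)
        entry["cost"] += cost
    return {
        regimen: {
            "recovery_rate": s["recovered"] / s["patients"],
            # Undefined when nobody recovered, so report None in that case.
            "cost_per_recovery": s["cost"] / s["recovered"] if s["recovered"] else None,
        }
        for regimen, s in stats.items()
    }

summary = summarise(outcomes)
```

In this invented data, regimen A has the higher recovery rate while regimen B has the lower
cost per recovery, which shows why effectiveness and cost-effectiveness need to be assessed
separately.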
Healthcare providers can also mine their treatment record data closely to explore ways to cut
costs and deliver better medicine. From this data they can also develop clinical profiles that
give doctors and physicians information about their practice patterns and allow these to be
compared with those of other healthcare centers.
While customer relationship management is a core approach in managing interactions between
commercial organizations (typically banks and retailers) and their customers, it is no less
important in a healthcare context. Customer interactions may occur through call centers,
physicians’ offices, billing departments, inpatient settings, and ambulatory care settings. As in
the case of commercial organizations, data mining applications can be developed in the
healthcare industry to determine the preferences, usage patterns, and current and future needs of
individuals to improve their level of satisfaction. These applications also can be used to predict
other products that a healthcare customer is likely to purchase, whether a patient is likely to
comply with prescribed treatment or whether preventive care is likely to produce a significant
reduction in future utilization.
To aid healthcare management, data mining applications can be developed to better identify and
track chronic disease states and high-risk patients, design appropriate interventions, and reduce
the number of hospital admissions and claims. A healthcare organization can stratify or cluster
its patient populations by demographic characteristics and medical conditions to determine which
groups use the most resources. This enables it to develop programs that educate these
populations and prevent or manage their conditions, and to give better healthcare at lower
costs. Data mining can also be used to decrease patient length-of-stay, thus avoiding
clinical complications, develop best practices, improve patient outcomes, and provide
information to physicians to improve the quality of healthcare.
Data mining applications that attempt to detect fraud and abuse often establish norms and then
identify unusual or abnormal patterns of claims by physicians, laboratories, clinics, or others.
Among other things, these applications can highlight inappropriate prescriptions or referrals and
fraudulent insurance and medical claims.
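
The idea of establishing a norm and then flagging abnormal claims can be sketched as follows.
The provider names and claim amounts are hypothetical. The sketch uses the median as the norm
and the median absolute deviation (MAD) to measure distance from it; the constants 0.6745 and
3.5 are the conventional choices for this "modified z-score", not values taken from any
particular healthcare system.

```python
from statistics import median

# Hypothetical average claim amount per provider (illustrative values only).
claims = {"P1": 100, "P2": 110, "P3": 95, "P4": 105, "P5": 400}

def flag_unusual(values_by_key, threshold=3.5):
    """Flag keys whose value lies far from the norm, measured in MAD units."""
    values = list(values_by_key.values())
    norm = median(values)  # the established norm
    mad = median([abs(v - norm) for v in values])
    if mad == 0:
        # At least half the values equal the median; this method cannot rank them.
        return []
    return [
        key
        for key, value in values_by_key.items()
        if 0.6745 * abs(value - norm) / mad > threshold
    ]

suspicious = flag_unusual(claims)
```

A flagged provider is only a candidate for review, since an unusual claim pattern may have a
legitimate explanation.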
However, data mining in healthcare has its limitations. It can be limited by the accessibility
of data, because the raw inputs for data mining often exist in different settings and systems,
such as administration, clinics, laboratories, and more. Hence, the data have to be collected
and integrated before data mining can be done. For this reason it has been suggested that a data
warehouse be built before data mining is attempted, although that can be a costly and
time-consuming project.
Secondly, other data problems may arise which include missing, corrupted, inconsistent, or non-
standardized data, such as pieces of information recorded in different formats in different data
sources. In particular, the lack of a standard clinical vocabulary is a serious hindrance to data
mining. Data problems in healthcare are the result of the volume, complexity and heterogeneity
of medical data and their poor mathematical characterization and non-canonical form. Further,
there may be ethical, legal and social issues, such as data ownership and privacy issues, related
to healthcare data. The quality of data mining results and applications depends on the quality of
data. A sufficiently exhaustive mining of data will certainly yield patterns of some kind that are a
product of random fluctuations especially for large data sets with many variables.
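
The risk of chance patterns can be made concrete with a small calculation. Assume, for
illustration, that 100 candidate patterns are tested independently at a significance level of
0.05 and that none of them is real. Both numbers and the independence assumption are
hypothetical.

```python
# Hypothetical setting: number of patterns tested and significance level.
tests = 100
alpha = 0.05

# Expected number of patterns that look significant purely by chance.
expected_false_patterns = tests * alpha

# Probability that at least one chance pattern appears (independent tests assumed).
prob_at_least_one = 1 - (1 - alpha) ** tests

print(f"expected chance patterns: {expected_false_patterns:.1f}")
print(f"probability of at least one: {prob_at_least_one:.3f}")
```

Under these assumptions about five spurious patterns are expected, and it is almost certain
that at least one will appear, which is why discovered patterns need validation on separate
data.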
The successful application of data mining requires knowledge of the domain area as well as of
data mining methodology and tools, because without sufficient knowledge of data mining the user
may not be able to avoid its pitfalls. The data mining team should possess domain knowledge,
statistical and research expertise, and IT and data mining knowledge and skills.
Finally, healthcare organizations developing data mining applications also must make a
substantial investment of resources, particularly time, effort, and money. Data mining projects
can fail for a variety of reasons, such as lack of management support, unrealistic user
expectations, poor project management and inadequate data mining expertise. Data mining
requires intensive planning and technological preparation work. In addition, physicians and
executives have to be convinced of the usefulness of data mining and be willing to change work
processes.
