
International Journal of Advanced Engineering Research and Technology (IJAERT) 13

Volume 5 Issue 1, January 2017, ISSN No.: 2348 8190

Self-medical analysis using internet-based computing upon big data


Mrs. B. Meena Preethi1, Mr.R.RajKumar2, Ms. N. Sasi Revathi3, Ms. R. Akshaya4
Assistant Professor1,2, III M.Sc. SS Students 3,4
Department of BCA & M.Sc. SS
Sri Krishna Arts and Science College, Kuniamuthur, Coimbatore-8
meenapreethib@skasc.ac.in, sasirevathin14mss046@skasc.ac.in, akshayar14mss006@skasc.ac.in

ABSTRACT
Today it has become necessary for people to analyze their own health in the urgent situations of the modern world. People can perform a medical analysis themselves by comparing their condition with the records of similar patients. This can be done through big data, where past records of similar patients are available. A self-attend service using big data raises challenges including highly stable and flexible medical record retrieval, data study, and the protection of the patient's confidential information. In this paper, we propose an Internet-based computing structure for implementing a self-attend service named Self-Medical Analysis to address these challenges. To support highly stable and flexible medical record retrieval, data study and privacy protection, a Lucene-based distributed search array is implemented. Moreover, to speed up medical record retrieval, a Hadoop array is implemented for offline data storage and index building. The implementation of Self-Medical Analysis is discussed: similar previous medical records and the structure of the symptoms of the disease are obtained, so that users can identify the type of disease they are affected with. Finally, a prototype structure is implemented to illustrate the flexibility and effectiveness of our proposal.

Keywords: DCN, Hadoop, Internet-based computing, Lucene, self-attend service

1. Introduction

Why is Internet-based computing used upon big data?

A strategy published by the World Health Organization states that 75% of the world population[1] is in Suboptimal Health Status (SHS), also known as the third state, i.e., neither sick nor healthy. A large share of these people want to prevent disease and educate themselves about it using the data in previous medical records of patients[2][3][4]. To meet the demands of SHS groups, a self-attend service should be designed to create knowledge about disease precautions at an early stage. Such a service has also become easier to provide because of the large and daily increasing volume of medical and diagnosis data. A big data issue arises here because of the growing amount of data in various formats. Internet-based computing is therefore applied in the big data area because of the vast features it offers.

2. Preliminary definitions

2.1 Medical Record

A medical record that is produced electronically, i.e., an EMR (Electronic Medical Record)[6][7], is a three-tuple consisting of a patient record, a patient profile and clinical data.

2.1.1 Patient record

It contains general information about the person, such as name, age and gender.

2.1.2 Patient profile

It contains the patient's previous medical history, including previous surgery, disease, transfusion and allergy history.

2.1.3 Clinical data

It contains detailed information such as symptoms, patient complaints, present history, analysis results and treatments, related to each visit of a patient to a health-care analyzer.

2.2 Hadoop

Hadoop[8][9] is an open-source, Internet-based computing software framework used for running applications on commodity hardware. It provides a way of storing and processing information. It consists of two main components:

www.ijaert.org

2.2.1 HDFS

HDFS (Hadoop Distributed File System) stores files across an array (cluster) of nodes. Large files are split into blocks, and every block is written to multiple nodes.

2.2.2 MapReduce

MapReduce is a parallel data processing model that consists of two parts: Map and Reduce. The Map task processes input records that are independent of each other and produces key-value pairs as intermediate results. These intermediate results are stored on the local disk of the node running the Map task. When all Map tasks have finished, the Reduce phase starts, in which intermediate data with the same key is aggregated.

Fig.1. Hadoop Architecture (layers, top to bottom: MapReduce and other commands; Hadoop YARN; HDFS)

Hadoop YARN is a core element of the open-source Hadoop platform for big data analytics, maintained by the non-profit Apache Software Foundation.

Hadoop includes a central library system, the HDFS file handling system, and MapReduce; the latter two act as data handling components. In addition to these, there is YARN, which is characterized as an array platform that helps to manage resources and assign work. The Apache Software Foundation refers to YARN as 'next-generation MapReduce' or 'MapReduce 2.0'.

2.3 Lucene

Lucene is an open-source project licensed by the Apache Software Foundation[11][12]. It is a high-performance and efficient Information Retrieval (IR) library. Its main components are indexing and searching, which together provide fast online search; this method is used in this paper for self-medical analysis. First, the records are converted into documents with the standard APIs. The documents are then stored in HDFS as block files. MapReduce tasks are started to build index files for the documents, to facilitate concurrent online medical record searching. After the indexing phase completes, the index files are stored in HDFS.

2.4 Data centre network (DCN)

A data center is a pool of resources (computational, storage, network) interconnected using a communication network[10]. The Data Center Network (DCN) plays an important role in a data center, as it connects all of the data center resources together. DCNs need to be scalable and efficient to connect tens or even hundreds of thousands of servers and handle the growing demands of Internet computing. Today's data centers are constrained by the interconnection network.

3. Related Articles

To visualize whole big data sets together with information about their array structure, the CloudVista prototype system has been discussed. To enable fast, massive data storage over big data at minimum cost, a set of Internet-based hardware, software and structures has been adopted. To help data scientists perform evolutionary experiments on elementary data, an Internet-based model named Prism was planned. The emergence of a huge mass of data in the field of medicine is called big medical data, and it is created, checked and managed through the web. A public-oriented platform called Pharmacien inspecteur de santé publique (PHISP) has also been discussed. Expert systems are constructed to solve highly demanding problems. To protect patient data, flexibility issues in large-scale data mining have been addressed through a sub-tree scheme.

4. Proposed System

Inspired by these works, in this paper we address these challenges through the following contributions:

We propose an Internet-based framework for implementing a self-attend service named Self-analysis. Concretely, a distributed Lucene-based search array is constructed to provide highly concurrent and flexible online retrieval of previous medical records, together with data analysis and privacy protection functions. To speed up the retrieval of medical records, a Hadoop cluster is adopted for offline data storage and index building.

In particular, the Self-analysis service consists of four steps. First, a user submits a query describing their disease information. In Step 2, medical records that match the user's disease symptoms, gender and age are retrieved. In Step 3, data analysis is performed on the retrieved medical records to compute a disease-symptom structure, which captures the relations among diseases with common symptoms. Finally, private information in the medical records is filtered according to an access control policy. The disease-symptom structure and the medical records, with private information filtered, are then returned to the user, which gives users a detailed basis for making a primary analysis by themselves.

5. Self-attend service with structured array

In this paper, a structured array is put into use for the self-attend service. Users determine the disease from its signs of illness. Related medical records are provided as references for the self-attend service. In daily life, self-attend services have become more and more important, particularly given the pressing situation of global ageing.

5.1 Outline of self-attend service with structured array

The structured array consists of two main arrays: an online distributed search array and an offline Hadoop array.

The online distributed search array is designed to process end users' queries in a highly concurrent and scalable manner. It includes a load balancer, a messenger and search nodes that are used to obtain therapeutic records. The data study array is designed to create a sign-of-illness lattice. An access control array is used to filter private data from the results returned to the user. The load balancer and the messenger distribute the end users' queries.

The offline Hadoop array is used for data storage and index building: HDFS stores the data, and MapReduce builds the index.

5.2 Shared data storage illustration

Therapeutic records are mostly stored in XML file systems and in relational database management systems. Using the Lucene structure, the therapeutic records are converted into Lucene data files. By applying operations such as join and dump, the structured therapeutic records are transformed into Lucene records.

5.3 Online distributed search array

The major components of the online distributed search array are the load balancer, the messenger array, the search node array, the data study array and the access control array.

5.3.1 Load balancer

The load balancer is a hardware interface for the self-attend service. On receiving a query from the user, the load balancer forwards it to a messenger chosen according to its selection rules. The selection rule usually depends on the user hardware.

5.3.2 Messenger array

The messenger array coordinates both the data study array and the access control array. On receiving the user query from the load balancer, the messenger selects search nodes to search the therapeutic records. When the therapeutic records are returned, they are merged and passed to a data study node in the data study array to build a sign-of-illness-disease lattice. Before the results are returned, the lattice and the therapeutic records pass through access control so that the private data in the therapeutic records is filtered.

5.3.3 Search node array

The search node array is generally elastic and scalable.

5.3.4 Data study array

The set of signs of illness in an end user's query is usually incomplete, because the end user has imperfect knowledge of therapeutic data. The disease categories and the signs of illness are therefore extracted from the related therapeutic records.

5.3.5 Access control array

The access control array is used to filter the private data from the therapeutic records before they are returned to the end user.
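
The Map and Reduce phases described in Section 2.2.2, as used for the offline index building of Section 5.1, can be imitated in a single Python process as shown below. This is only a rough sketch: it does not use the Hadoop or Lucene APIs, and the record layout, the sample records and the function names (map_record, reduce_postings, build_inverted_index) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (record ID, list of symptoms).
RECORDS = [
    ("r1", ["fever", "hyperpnea"]),
    ("r2", ["hyperpnea", "tickle in throat"]),
    ("r3", ["fever"]),
]


def map_record(record_id, symptoms):
    """Map phase: emit one (symptom, record ID) pair per symptom."""
    for symptom in symptoms:
        yield symptom, record_id


def reduce_postings(symptom, record_ids):
    """Reduce phase: aggregate all record IDs that share one key."""
    return symptom, sorted(set(record_ids))


def build_inverted_index(records):
    # Shuffle step: group the intermediate pairs by key.
    grouped = defaultdict(list)
    for record_id, symptoms in records:
        for key, value in map_record(record_id, symptoms):
            grouped[key].append(value)
    # Apply the reducer once per key.
    return dict(reduce_postings(k, v) for k, v in grouped.items())


index = build_inverted_index(RECORDS)
print(index["hyperpnea"])
```

Each record is mapped independently of the others, which is what allows a real Hadoop array to run the Map tasks in parallel on different nodes; only the grouping by key requires the intermediate results to be brought together.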


6. Practice of self medical analysis

This section discusses the practice of self medical analysis through Internet-based computing. The procedure includes submission of the query, obtaining therapeutic records, data study and filtering the results of secrecy data.

6.1 Submission of query

The end user submits a query that describes his or her illness data. On receiving the query, with its set of signs of illness, the server forwards it from the end user to a secured messenger.

6.2 Obtaining therapeutic record

The messenger selects a set of search nodes to retrieve therapeutic records similar to the information submitted by the user.

6.3 Data study

On receiving the related therapeutic records from each search node, the messenger merges the search results and forwards them to the cloud data study array for data analysis[12][13][14]. The data study node generates a sign-of-illness-disease lattice from the retrieved therapeutic records.

6.4 Filtering the results of secrecy data

The private data of patients in the therapeutic records is filtered according to the access rights of the target user.

7. Estimation

To demonstrate the flexibility and adaptability of our approach, a prototype system is built and a real-time example is presented. The flexibility of the Lucene-based distributed search array is evaluated explicitly with some experiments. An example is given to show clearly how a self-attend service can be provided effectively.

7.1 Prototype Model

At present, we use an Internet-based structure for the self-attend service. A separate, private Hadoop array is adopted for offline data storage and index building, and twenty-one PCs are built into a Lucene-based distributed search array in the Internet-based structure to handle query processing for online users.

Fig.2. Framework of the prototype design

In the prototype shown in Fig.2, the Hadoop array consists of eighteen nodes. Each node has two Intel(R) Xeon(R) E5620 quad-core processors at 2.4 GHz, 24 GB of RAM and a 2 TB disk. The array runs Redhat Enterprise Linux Server 6.0, Java 1.6.0 and Hadoop-0.20.205.0[15]. In the online search array, 21 PCs running Ubuntu 9.0, Java 1.6.0 and Lucene-4.5.1 are set up to process users' queries. The search node array consists of 18 PCs arranged as a 3*6 search matrix. The other 3 PCs act as the messenger array, the data analysis array and the access control array. Each PC has two Intel E5400 2.69 GHz processors and 2 GB of RAM. Everything is implemented in Java; index building uses the standard Hadoop MapReduce API and online query processing uses the Lucene API.

7.2 A real-time example of self-attend service

A real-time example shows how a self-attend service is provided to the user. Suppose Jack is ill and has the symptoms hyperpnea and tickle in throat. He wants to learn about his disease so that he knows in advance in which hospital department to make an appointment. Specifically, the self-attend service proceeds through the following steps:

Step-1 Submission of query

First, he visits the homepage of the service to enter all the information about his disease. To help users provide the necessary information, common disease symptoms are listed. For his general information Jack adds "male" and "adult", and for the symptoms of the disease he chooses "tickle in throat" and "hyperpnea". The information given by Jack can thus be represented as ({male, adult}, {hyperpnea, tickle in throat}).

Step-2 Obtaining therapeutic record

To retrieve the medical records whose symptom information matches Jack's symptoms, the search nodes selected by the messenger node match the submitted information against the inverted index file and the profile index file. Note that, since Jack is male and adult, matching records of female, child and elderly patients are ignored in the retrieval result.

Step-3 Data study

The data analysis node performs data analysis to build the disease-symptom structure, which relates the diseases that share the symptoms hyperpnea and tickle in throat.

Step-4 Filtering the results of secrecy data

To maintain the privacy of patients' information such as name, age and address, these fields are filtered in accordance with Jack's access rights.

7.3 Monitoring disease-symptom structure

If Jack cannot identify his disease from this symptom structure, he looks for further information about the symptoms of the disease. To narrow it down, Jack goes through the medical records associated with the symptoms. With this information Jack can rule out impossible diseases and make a correct identification of his disease.

8. Comparative study

Nowadays big data is used in many areas, including medicine, aeronautics, biology and even commerce. However, major challenges for big data are data storage and effective knowledge mining. Since Internet-based computing offers advantages such as elastic storage, computing power and a pervasive, service-oriented nature, Internet-based computing techniques have been used in big data applications.

TABLE I. Data structure of inverted index

BF signature             {CR IDs}          {symptom}
0101110010 0101110010    {20120**, ...}    {}

TABLE II. Data structure of profile index

CR IDs     gender   age
20120**    male     26

TABLE III. Data structure of detail index

CR IDs   Gender   Age   Analysis result   Symptoms                             Treatment
2120*    Male     26    Pneumonia         Fever, hyperpnea, tickle in throat   Interferons injection, dextromethorphan

Of the tables presented above, one represents the previous medical records of other persons, and TABLE II contains the user's record. Using Internet-based computing techniques, the records of patients are stored and compared. When a record with a matching disease-symptom pattern is found, the correct disease for the given symptom lattice is obtained.

Thus, the Internet-based technique stores data in a more flexible and secure manner than using big data alone.

9. Conclusion

We have designed an Internet-based structure for a self-attend service called Self-analysis. Using this service, users can obtain an analysis based on previous data at home. Moreover, the disease-symptom structure, together with matching previous medical data, provides a detailed analysis and also gives users knowledge about the precautions to take for the disease.

At present, a prototype of the Internet-based structure has been introduced with a limited number of medical records from the department of Respiratory Medicine. In future, to increase the flexibility of the service, users along with physicians are to be included in the platform.
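
As a supplementary illustration, the four steps of the example in Section 7.2 can be sketched in Python as follows. This is a minimal sketch and not the implementation used in the prototype: the sample records, the field names, the age thresholds behind "adult", the set of private fields, the choice to match records that share at least one symptom, and all function names are hypothetical, and plain Python lists stand in for the Lucene index files.

```python
from collections import defaultdict

# Hypothetical detail records, loosely modelled on TABLE III.
RECORDS = [
    {"id": "c1", "name": "A", "gender": "male", "age": 26,
     "diagnosis": "pneumonia",
     "symptoms": {"fever", "hyperpnea", "tickle in throat"}},
    {"id": "c2", "name": "B", "gender": "female", "age": 31,
     "diagnosis": "bronchitis",
     "symptoms": {"hyperpnea", "tickle in throat"}},
    {"id": "c3", "name": "C", "gender": "male", "age": 40,
     "diagnosis": "asthma",
     "symptoms": {"hyperpnea"}},
]

PRIVATE_FIELDS = {"name", "age"}  # assumed access control policy


def age_group(age):
    # Assumed thresholds; the paper does not define them.
    if age < 18:
        return "child"
    return "adult" if age < 65 else "elderly"


def retrieve(records, profile, symptoms):
    """Step 2: keep records with the same gender and age group
    that share at least one symptom with the query."""
    gender, group = profile
    return [r for r in records
            if r["gender"] == gender
            and age_group(r["age"]) == group
            and r["symptoms"] & symptoms]


def disease_symptom_structure(records):
    """Step 3: map each diagnosis to the symptoms recorded with it."""
    structure = defaultdict(set)
    for r in records:
        structure[r["diagnosis"]] |= r["symptoms"]
    return dict(structure)


def filter_private(records, private_fields):
    """Step 4: drop private fields before returning the records."""
    return [{k: v for k, v in r.items() if k not in private_fields}
            for r in records]


# Step 1: Jack's query as (profile, symptoms).
profile, symptoms = ("male", "adult"), {"hyperpnea", "tickle in throat"}
matched = retrieve(RECORDS, profile, symptoms)
structure = disease_symptom_structure(matched)
visible = filter_private(matched, PRIVATE_FIELDS)
print(sorted(structure))
print([r["id"] for r in visible])
```

With these sample records the female patient's record is ignored, as in Step-2, and the remaining structure lets the user compare which candidate diseases cover both of his symptoms and which cover only one.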


References
[1] He, C., Fan, X., Li, Y., 2013. Toward ubiquitous healthcare services with a novel efficient cloud platform. IEEE Trans. Biomed. Eng. 60 (1), 230-234.

[2] Rashidi, P., Cook, 2009. Keeping the resident in the loop: adapting the smart home to the user. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 39 (5), 949-959.

[3] Cook, D.J., Youngblood, M., Heierman III, E.O., Gopalratnam, K., Rao, S., Litvin, A., Khawaja, F., 2003. Mavhome: an agent-based smart home. In: Proceedings of the First IEEE International Conference on Pervasive Computing and Communications (PerCom 2003). IEEE, pp. 521-524.

[4] Doctor, F., Hagras, H., Callaghan, V., 2005. A fuzzy embedded agent-based approach for realizing ambient intelligence in intelligent inhabited environments. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 35 (1), 55-65.

[5] Chaudhuri, S., 2012. What next? A half-dozen data management research goals for big data and the cloud. In: Proceedings of the 31st Symposium on Principles of Database Systems. ACM, pp. 1-4.

[6] Zhang, Z., Wang, B., Ahmed, F., Zhao, R., Viccellio, A., Mueller, K., 2013. The fiveWS for information visualization with application to healthcare informatics. IEEE Trans. Visual. Comput. Graph. 19 (11), 1895-1910.

[7] Li, W., Yan, J., Yan, Y., Zhang, J., 2010. Xbase cloud-enabled information appliance for healthcare. In: Proceedings of the 13th International Conference on Extending Database Technology (EDBT'10). ACM, pp. 675-680.

[8] Ekanayake, J., Gunarathne, T., Qiu, J., 2011. Cloud technologies for bio-informatics applications. IEEE Trans. Parallel Distributed Syst. 22 (6), 998-1011.

[9] Bahga, A., Madisetti, V.K., 2012. Analyzing massive machine maintenance data in a computing cloud. IEEE Trans. Parallel Distributed Syst. 23 (10), 1831-1843.

[10] Bilal, K., Khan, S.U., Zhang, L., Li, H., Hayat, K., Madani, S.A., Min-Allah, N., Wang, L., Chen, D., Iqbal, M., Xu, C.-Z., Zomaya, A.Y., 2013. Quantitative comparisons of the state of the art data center architectures. Concurrency and Computation: Practice and Experience 25 (12), 1771-1783.

[11] Ochoa, X., Duval, E., 2008. Relevance ranking metrics for learning objects. IEEE Trans. Learn. Technol. 1 (1), 34-48.

[12] Hatcher, E., Gospodnetic, O., McCandless, M., 2004. Lucene in Action. Manning Publications, Greenwich, CT.

[13] Belohlavek, R., Vychodil, V., 2009. Formal concept analysis with background knowledge: attribute priorities. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 39 (4), 399-409.

[14] Wu, W., Leung, Y., Mi, J., 2009. Granular computing and knowledge reduction in formal contexts. IEEE Trans. Knowledge Data Eng. 21 (10), 1461-1474.

[15] Crampes, M., Oliveira-Kumar, J.D., Ranwez, S., Villerd, J., 2009. Visualizing social photos on a Hasse diagram for eliciting relations and indexing new photos. IEEE Trans. Visual. Comput. Graph. 15 (6), 985-992.

[16] Apache Hadoop, http://hadoop.apache.org (accessed 03.03.14).
