HQR Framework Optimization For Predicting Patient Treatment Time in Big Data

IDL - International Digital Library Of
Technology & Research

Volume 1, Issue 6, June 2017 Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
PRATEEKSHA S KULKARNI¹
Co-Guide: Shanthi M B
¹Computer Science and Engineering, CMRIT Bengaluru
Email: 1kulkarniprateeksha51@gmail.com
Contact Number: +91-8553926003
Abstract: Today most hospitals are overcrowded, and patients face long queues for different tasks. Hospital management
finds it difficult to handle these patients and to provide an optimal treatment time for each patient waiting in a long queue.
Unnecessary and annoying waits for long periods waste substantial human resources and time and increase the
frustration endured by patients. It would be convenient and preferable if patients could receive the most efficient
treatment plan and see predicted waiting-time updates in real time. Because of the large-scale, realistic data set and the
requirement for real-time response, the Patient Treatment Time Prediction (PTTP) algorithm and the Hospital
Queuing-Recommendation (HQR) system must be efficient and respond with low latency.
Extensive experimentation and simulation results demonstrate the effectiveness and applicability of the proposed model in
recommending an effective and convenient treatment plan that minimizes patients' waiting times in hospitals.
Keywords: Apache Spark, Hospital queuing recommendation, Big Data, Cloud Computing, Patient treatment time
prediction, Classification and regression tree.
1. INTRODUCTION

Today most hospitals are overcrowded, with long queues of patients and ineffective queue management.
Managing patient queues and predicting waiting times is a complicated and difficult job, because each
patient who comes in may need to perform different tasks or operations, such as a checkup, various tests
(for example a blood test, X-rays, a CT scan, or an MR scan), and payment history. We refer to each of
these as a treatment task performed by an individual patient. A patient in a hospital is usually required
to undergo several examinations, inspections, or tests according to his condition, and these tasks may be
interdependent: some tasks are independent, whereas others must wait for the completion of the tasks
they depend on. Most people who go for a checkup must wait in queues for unpredictable but long periods
before their turn comes to complete their checkup and treatment tasks.

The main focus of this thesis is to help patients complete their treatment tasks in a predictable and
optimal time, and to help hospitals schedule each treatment task queue so as to avoid overcrowded and
ineffective queues of patients. We use training data from different hospitals to develop a patient
treatment time model that estimates the average and the maximum/optimal time required for their treatment.

To this end, we retrieve patient data gathered from different hospitals, considering a few important
parameters: the start time of a particular task, the end time of the same task, the patient's age, and
any other detailed treatment data for each task that is required for calculating the optimal time.

Considering the real-time requirements of the treatment, the huge volume of data, and the complexity of
the system, we implement a treatment time model algorithm and a hospital queuing system in a big data
environment. The algorithm is based on a treatment time model and the Random Forest (RF) method: for each
task performed during a patient's visit, the waiting time is analyzed and the average required time is
predicted. The hospital queuing recommendation defines a convenient treatment plan for each patient and
task. Patients can check their treatment plan and the predicted waiting time in real time using a mobile
application. Extensive experimental results show that the time prediction algorithm and the Random Forest
implementation are effective and efficient.
2. EXPERIMENTAL DETAILS

2.1. Problem Statement

Most of the data in hospitals are unstructured, massive, and high dimensional. Every day a hospital
produces a huge amount of business data, which contains a great deal of information about individual
patients, such as medicine data, doctor name, and other detailed information.

The time consumption of the treatment tasks in each department might not lie in the same range; it
varies with the content of the task, the circumstances, the time period, and the condition of the patient.
For example, a CT scan generally takes longer for an elderly patient than for a young one. There are also
strict time requirements for hospital queuing recommendation and management, so the execution speed of the
HQR model and the PTTP model is critical. The realistic patient data collected from various hospitals are
analyzed carefully and rigorously, based on important parameters such as treatment start time, end time,
patient age, and the detailed treatment content of each task. We identify and calculate the waiting times
of different patients based on the operations performed during their treatment.

We use the RF algorithm to train on the time consumption of patient treatment, based on both patient and
time characteristics, and then build the PTTP model. The overall logical structure of the project is
divided into processing modules, and the conceptual data structure is given by the architectural data
flow diagram shown in Figure 2.1.

Fig 2.1 Architecture of the HQR system

2.2. Data Pre-processing

In the preprocessing phase, hospital treatment data from different treatment tasks are gathered. A
substantial number of patients visit each hospital every day. We collect the data from different hospitals
to analyze the treatment time required for each task. Let S be the set of patients in a hospital, where a
registered patient, together with his information, is represented by s_i. Assume that there are N
patients in S:

S = {s_1, s_2, ..., s_N},

where each patient s_i has specific unchanging parameters, e.g., name, ID, gender, age, and address.
Some of these parameters are used in our analysis, whereas others are not. Each patient can visit
multiple treatment tasks according to his health condition. Let X|s_i be the set of treatment tasks for
patient s_i during a specific visit:

X|s_i = {x_1, x_2, ..., x_K},

where each task record x_i consists of multiple pieces of information Y, e.g., task name, task location,
department, start time, end time, doctor, and attending staff:

Y|x_i = {y_1, y_2, ..., y_M},

where y_j is a feature variable of the record of treatment task x_i. The collected records, as shown in
Table 1, are used for calculating the average treatment time.

Table 1: Example of treatment records
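The notation above maps naturally onto a small data model. The following Python sketch is only an illustration: the class names, the field names, and the two sample records are assumptions made here and are not taken from the hospital data. It represents a patient s_i, the task records X|s_i of one visit, and the average time consumption per task.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean
from typing import Dict, List


@dataclass
class TaskRecord:
    """One treatment task x_i; its fields play the role of the feature variables y_j."""
    task_name: str
    department: str
    doctor: str
    start_time: datetime
    end_time: datetime

    @property
    def minutes(self) -> float:
        """Time consumption of the task in minutes (end time minus start time)."""
        return (self.end_time - self.start_time).total_seconds() / 60


@dataclass
class Patient:
    """One patient s_i with fixed parameters and the task set X|s_i of a visit."""
    patient_id: str
    gender: str
    age: int
    tasks: List[TaskRecord] = field(default_factory=list)


def average_minutes(patients: List[Patient]) -> Dict[str, float]:
    """Average time consumption per task name over all patients in S."""
    by_task = defaultdict(list)
    for s in patients:
        for x in s.tasks:
            by_task[x.task_name].append(x.minutes)
    return {name: mean(values) for name, values in by_task.items()}


# Made-up records, for illustration only.
S = [
    Patient("p1", "F", 34, [TaskRecord("CT scan", "Radiology", "Dr. A",
                                       datetime(2017, 6, 1, 9, 0),
                                       datetime(2017, 6, 1, 9, 20))]),
    Patient("p2", "M", 71, [TaskRecord("CT scan", "Radiology", "Dr. A",
                                       datetime(2017, 6, 1, 9, 30),
                                       datetime(2017, 6, 1, 10, 0))]),
]
averages = average_minutes(S)
```

Keeping the time consumption as a derived property, rather than a stored field, means it cannot drift out of sync with the recorded start and end times.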
2.3. Workflow of the data pre-processing

The data pre-processing proceeds in the following steps:

a: Collect data from different treatment tasks

According to statistics, a medium-sized hospital sees between 8,000 and 12,000 patients per day, and the
number of treatment data records ranges from 120,000 to 200,000. These data are gathered from different
treatment tasks and include all the information related to each task.

b: Choose the same dimensions of the data

The hospital treatment data generated by different treatment tasks have different fields, contents,
formats, and dimensions. In order to train the time consumption model for each task, we choose the same
features from these data: the patient information (patient ID, gender, age, etc.), the treatment task
information (task name, department name, doctor name, etc.), and the time information (start time and end
time). Other dimensions of the treatment data, such as patient name and address, are ignored because they
are not useful for the PTTP algorithm.

c: Calculate new feature variables of the data

To train the PTTP model on these data, several new features have to be calculated, such as the time
consumption of each treatment record, the day of the week of the treatment, and the time range of the
treatment.

The workflow of the patient treatment and wait model is illustrated in Figure 2.2, which shows the task
flow between different patients. Consider three patients (Patient1, Patient2, and Patient3) and a set of
treatment tasks required for each patient. Some tasks depend on a previous one, e.g., surgery or
bandaging cannot be done before X-rays. Tasks {A, B, D} are required for Patient1, where task D must wait
for the completion of B. Tasks {E, B, C, A} are required for Patient2, and tasks {D, E, C} are required
for Patient3. Moreover, a different number of patients is waiting in the queue of each task, for example
7 patients in the queue of task A and 5 patients in the queue of task B.

Fig 2.2: Flow diagram of the patient wait and treatment model

In this paper, a Patient Treatment Time Prediction (PTTP) model is trained on hospitals' historical data.
The waiting time of each treatment task is predicted by PTTP as the sum of the predicted treatment times
of all patients currently in its queue. Then, according to each patient's requested treatment tasks, a
Hospital Queuing-Recommendation (HQR) system recommends an efficient and convenient treatment plan with
the least waiting time for the patient.

The treatment time consumption of each patient in the current waiting queue is estimated by the trained
PTTP model. The total waiting time of each task at the current time can then be predicted, such as
{T_A = 35 min, T_B = 30 min, T_C = 70 min, T_D = 24 min, T_E = 87 min}. Finally, the tasks of each patient
are sorted in ascending order of waiting time, except for the dependent tasks.
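The recommendation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HQR implementation itself: the function names `queue_wait` and `recommend` are hypothetical, each task is assumed to have at most one prerequisite, and the only dependency used is the one stated for Patient1 (D waits for B). The waiting times are the example values T_A to T_E given in this section.

```python
from typing import Callable, Dict, List, Sequence


def queue_wait(queue: Sequence[dict], predict: Callable[[dict], float]) -> float:
    """Waiting time of one task: sum of the predicted treatment times in its queue."""
    return sum(predict(patient) for patient in queue)


def recommend(tasks: Sequence[str], wait: Dict[str, float],
              depends_on: Dict[str, str]) -> List[str]:
    """Sort tasks by ascending waiting time, holding back a task until its
    prerequisite (if it is among the requested tasks) has been scheduled."""
    pending, plan = set(tasks), []
    while pending:
        # A task is ready when its prerequisite is not waiting to be scheduled.
        ready = [t for t in pending if depends_on.get(t) not in pending]
        if not ready:
            raise ValueError("cyclic dependency between tasks")
        nxt = min(ready, key=lambda t: (wait[t], t))
        plan.append(nxt)
        pending.remove(nxt)
    return plan


# Example waiting times in minutes, taken from the text.
wait = {"A": 35, "B": 30, "C": 70, "D": 24, "E": 87}

plan1 = recommend(["A", "B", "D"], wait, {"D": "B"})  # Patient1: D waits for B
plan2 = recommend(["E", "B", "C", "A"], wait, {})     # Patient2
plan3 = recommend(["D", "E", "C"], wait, {})          # Patient3
```

For Patient1 this yields the order B, D, A: task D has the shortest wait but is held back until B has been scheduled. A real system would also re-run the ordering as queue lengths change.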


2.3 PTTP based on RF model

In the preprocessing phase, the hospital treatment data from different treatment tasks are gathered,
since a substantial number of patients visit each hospital every day. After the new feature variables of
the treatment data have been calculated, erroneous data need to be removed. Treatment records with
missing values for critical features, such as patient gender, patient age, and task name, are removed as
incomplete data. Treatment records with a negative time consumption are removed as inconsistent data;
this happens, for instance, when the recorded end time of a treatment operation is before its start time,
which can occur when the start time is recorded by a human and the end time by a machine. These types of
data are considered noisy data.

2.4 PTTP based on the improved random forest model

Figure 2.3 represents the PTTP model based on the CART tree, which takes the training data from the
dataset as input and computes the divisions of the tasks by age group and task, as described in
Algorithm 1 below. Finally, it computes the average time of each task for a patient.

Algorithm 1: Process of the random forest based PTTP algorithm
Input:
  S_Train: the training dataset;
  k: the number of CART trees in the RF model.
Output:
  PTTP_RF: the PTTP model based on the RF algorithm.
for i = 1 to k do
  create training subset S_train_i ← sampling(S_Train);
  create OOB subset S_OOB_i ← (S_Train − S_train_i);
  create an empty CART tree h_i;
  for each independent variable a_i in S_train_i do
    calculate candidate split points;
    for each candidate split point v_p do
      calculate the best split point ← arg min (Left + Right);
    end for
    append node Node(a_i, v_p) to h_i;
    split data for left branch R_L(a_i, v_p) ← {x | a_i ≤ v_p};
    split data for right branch R_R(a_i, v_p) ← {x | a_i > v_p};
    for each data R in {R_L(a_i, v_p), R_R(a_i, v_p)} do
      calculate (v_p(L|R) | a_i) ← max (v_p, a_i);
      if (v_p(L|R) | a_i) ≥ (v_p, a_i) then
        append subnode Node(a_i, v_p(L|R)) to Node(a_i, v_p) as multi-branch;
        split data to two forks R_L and R_R;
      else
        collect cleaned data for leaf node D_leaf;
        calculate mean value of leaf node c ← (1/k) Σ D_leaf;
      end if
    end for
  end for
end for

3 RESULT AND DISCUSSION

The following snapshots and graphs show the outputs obtained after step-by-step execution of the proposed
service application when a new patient uses the service to check availability and book an appointment.
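For reference, the core of Algorithm 1 (bootstrap sampling, an out-of-bag subset, and CART regression trees whose leaves hold the mean treatment time) can be sketched in plain Python. This is a minimal illustration, not the implementation evaluated here: it uses ordinary binary splits instead of the multi-branch refinement, it omits the leaf-level data cleaning, and the function names, the hyperparameters, and the synthetic data are all assumptions.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return the (feature, threshold) pair minimising Left + Right error, or None."""
    best, best_err = None, float("inf")
    for j in range(len(rows[0])):
        for v in sorted({r[j] for r in rows})[:-1]:  # candidate split points
            left = [t for r, t in zip(rows, targets) if r[j] <= v]
            right = [t for r, t in zip(rows, targets) if r[j] > v]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (j, v), err
    return best


def build_tree(rows, targets, depth=0, max_depth=4, min_split=4):
    """Grow one CART regression tree; a leaf stores the mean of its targets."""
    can_split = depth < max_depth and len(rows) >= min_split
    split = best_split(rows, targets) if can_split else None
    if split is None:
        return mean(targets)
    j, v = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] <= v]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] > v]
    return (j, v,
            build_tree(*zip(*left), depth + 1, max_depth, min_split),
            build_tree(*zip(*right), depth + 1, max_depth, min_split))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        j, v, left, right = node
        node = left if row[j] <= v else right
    return node


def train_forest(rows, targets, k=10, seed=0):
    """Train k trees, each on a bootstrap sample; keep the out-of-bag indices."""
    rng = random.Random(seed)
    n, forest = len(rows), []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]   # training subset (with replacement)
        oob = sorted(set(range(n)) - set(idx))       # OOB subset, usable for validation
        tree = build_tree([rows[i] for i in idx], [targets[i] for i in idx])
        forest.append((tree, oob))
    return forest


def predict(forest, row):
    """Forest prediction: average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree, _ in forest)


# Synthetic data (assumption): features are [age, day of week], the target is the
# treatment time in minutes, with older patients taking longer.
rows = [[age, age % 7] for age in range(20, 80)]
targets = [10 if age < 50 else 20 for age in range(20, 80)]
forest = train_forest(rows, targets)
young, old = predict(forest, [30, 2]), predict(forest, [70, 0])
```

In this toy setting the forest predicts a shorter treatment time for the 30-year-old than for the 70-year-old, mirroring the CT scan example from the problem statement. At hospital scale, a distributed library would replace these loops.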


The result is displayed on the patient's output screen with the optimal time calculated by the above
procedure. Figure 3.1 shows the time details, which include the start time and end time of each task
together with the doctor's name. In the doctor's login, the doctor can view the list of patients who have
requested an appointment with that doctor.

Fig 3.1: The test result of the above model, displaying the time for each patient for each task.

The doctor can log in to the application and check the list of patients who have requested a visit, as
shown in Figure 3.2.

Fig 3.2: The appointment list in the doctor login

Figure 3.3 plots the average time against the age of the patient. From it we can analyze the minimum
average time required for each of the tasks that a patient requests when booking an appointment.

Fig 3.3: Graph showing the average time vs. patient age

CONCLUSIONS

A hospital queuing treatment plan based on the PTTP algorithm in a big data environment has been
presented in this project.

1. A random forest technique is used by the patient treatment time prediction algorithm to provide the
   optimal result.
2. The proposed system produces the optimal time for different tasks, giving patients a more efficient
   and convenient treatment plan.
