Demand Prediction Model for Rental Bicycle Services

Jan Raphael Floro | Yunong Louisa Gan | Dan Liu

Final Project Report

Final Version
Fall 2014 CS105 Professor David G. Sullivan, Ph.D.

Abstract
This project aims to create a numeric estimation model for the demand for rental bicycles given some key
conditions and attributes. A dataset containing past information on the demand for rental bicycles, along
with attributes such as weather, temperature, humidity, wind speed, season, month and day of the week,
was obtained to train and test a number of data mining algorithms and, ultimately, to develop the best
possible model for this dataset.
Before applying any data mining algorithm, the dataset needed to be processed and examined. The
team developed two Python-based programs to automate the key preprocessing procedures that the
team saw fit to conduct. The dataset was also examined initially through SQL queries to provide a
brief analysis of the historical trend of rental bike demand. Finally, a numeric estimation model was
developed and conclusions about the model were drawn.

Authors

Jan Raphael S. Floro


Undergraduate Finance, Management Information Systems
Boston University School of Management
Boston, MA
(617) 682-6579 | jrsfloro@bu.edu

Dan Liu
Ph.D. Student Department of Operations & Technology Management
Boston University Graduate School of Management
Boston, MA
(617) 631-6954 | danliu@bu.edu

Yunong Louisa Gan


Undergraduate
Boston University College of Arts & Science
Boston, MA
(xxx) xxx-xxxx | yunongg@bu.edu

Fall 2014 CS105 Floro / Liu / Gan

Dataset Description
The dataset to be used in this project originates from a single, consolidated comma-separated value
(CSV) table generated by a previous similar study by Hadi Fanaee-T of the Laboratory of Artificial
Intelligence and Decision Support at University of Porto, Portugal. Data presented in this CSV file is a
consolidation of three datasets:
1. Capital Bikeshare System Data
http://capitalbikeshare.com/system-data
2. I-Weather Weather Data
http://i-weather.com/weather
3. United States District of Columbia Department of Human Resources Holiday Schedule
http://dchr.dc.gov/page/holiday-schedule

The final, consolidated CSV file to be used in this project is available at:
http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset#
and consists of the following attributes, along with their meanings:
Table 1: Attribute Information
Attribute     Definition
instant       Record index (unique identifier)
dteday        date (MM/DD/YYYY)
season        season (1: spring, 2: summer, 3: fall, 4: winter)
yr            year (0: 2011, 1: 2012)
mnth          month (1: January, 2: February, 3: March, ..., 12: December)
holiday       holiday (0: no, 1: yes)
weekday       day of the week (0: Sunday, 1: Monday, ..., 6: Saturday)
workingday    working day (0: no, 1: yes)
weathersit    discretized weather (1: clear, 2: mist, 3: light snow/rain, 4: rain/ice)
temp          normalized temperature in Celsius (divided by 41)
atemp         normalized feeling temperature in Celsius (divided by 50)
hum           normalized humidity (divided by 100)
windspeed     normalized wind speed (divided by 67)
casual        count of non-registered users who rented bicycles
registered    count of registered users who rented bicycles
cnt           count of total (casual plus registered) users who rented bicycles

The dataset consists of a total of 731 records/instances gathered in the years 2011 and 2012, published in
two separate CSV files, one for each year. This dataset will be processed further by our team,
eliminating unwanted attributes and preparing the dataset to be used with Weka.
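
As a small illustration of the normalization listed in Table 1, the sketch below converts normalized values back to their original scale by multiplying by the stated divisors. The function name and the sample values are hypothetical and are not part of the project's programs.

```python
# Divisors taken from Table 1 of this report.
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalize(attribute, value):
    """Return the original-scale value for a normalized attribute."""
    return value * SCALE[attribute]


# Hypothetical normalized values, chosen only for illustration.
print(denormalize("temp", 0.5))  # temperature in Celsius
print(denormalize("hum", 0.8))   # humidity in percent
```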

Dataset Preparation
The first modification we made to the raw dataset was to combine the file containing the 2011 data
with the file containing the 2012 data. To do this, we simply appended all but the labels from the
2012 CSV file to the end of the 2011 CSV file. We did this because separating the data by year is
unnecessary, and our data mining methodology in Weka requires us to use only one CSV file.
Any splitting for testing and training will be done later on.
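
The merge described above was done manually; a minimal Python sketch of the same idea follows. The function name and the two-record excerpts are hypothetical stand-ins for the real files.

```python
def merge_csv_lines(first_lines, second_lines):
    """Append every line of the second file except its label row."""
    return list(first_lines) + list(second_lines)[1:]


# Hypothetical excerpts standing in for the 2011 and 2012 CSV files.
lines_2011 = ["instant,dteday,cnt", "1,01/01/2011,100"]
lines_2012 = ["instant,dteday,cnt", "366,01/01/2012,200"]

merged = merge_csv_lines(lines_2011, lines_2012)
for line in merged:
    print(line)
```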
Before we could run models on our dataset (as well as create an SQL relational table), we
first needed to remove the following problematic attributes:

Table 2: Problematic Attributes Removed
Attribute     Reason for Removal
instant       Unique identifier; each value is unique for each record
dteday        Unique identifier; data was taken only once per day
yr            Irrelevant; only has two values (0 for 2011 and 1 for 2012)
holiday       Irrelevant and redundant; only a few instances of holidays, and it
              repeats the same information: if weekday = 0 (Sunday) or
              weekday = 6 (Saturday), holiday will be 1
workingday    Redundant; if weekday = 0 (Sunday) or weekday = 6
              (Saturday), workingday is 0
casual        Redundant; captures the same information as the cnt attribute but only for
              those without registered accounts
registered    Redundant; captures the same information as the cnt attribute but only for
              those with registered accounts

These attributes were removed using a Python program named removeUnwantedRows.py (a text
copy is available in Exhibit 1), which is explained in the next subsection.

First Python Program


We wrote a Python program called removeUnwantedRows.py to remove the aforementioned
problematic attributes. This program takes the dataset that contains both 2011 and 2012 data (the
one we manually combined to form a consolidated, single CSV file). For each record after the labels
in that dataset, we extract only the attributes we want to keep. We did this by first reading the first line,
which contains the labels, to remove it from the file processing.
We then ran a loop that, for each record in the CSV file, initializes a new list with the right
number of empty fields, assigns the attributes we want to keep from the record to the new
list, joins them into a string and writes the result to a new CSV file. Comments are available
in the *.py file attached with this document.
Once the program has generated a new CSV file containing the attributes that we wish to keep, we
run another program that takes that new CSV file and generates a relational SQL database (SQLite).
The details of that program are discussed in the next subsection.
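
The attribute selection can also be sketched with Python's standard csv module. This is an alternative sketch, not the program in Exhibit 1: the column indices are the ones used there, while the function name and the single sample record are hypothetical.

```python
import csv
import io

# Column indices kept by the program in Exhibit 1
# (season, mnth, weekday, weathersit, temp, atemp, hum, windspeed, cnt).
KEEP = [2, 4, 6, 8, 9, 10, 11, 12, 15]


def keep_columns(src, dst, keep=KEEP):
    """Copy only the selected columns of every CSV record from src to dst."""
    writer = csv.writer(dst, lineterminator="\n")
    for record in csv.reader(src):
        writer.writerow([record[i] for i in keep])


# Hypothetical one-record input used only to exercise the function.
sample = io.StringIO(
    "instant,dteday,season,yr,mnth,holiday,weekday,workingday,"
    "weathersit,temp,atemp,hum,windspeed,casual,registered,cnt\n"
    "1,01/01/2011,1,0,1,0,6,0,2,0.34,0.36,0.81,0.16,300,700,1000\n"
)
out = io.StringIO()
keep_columns(sample, out)
print(out.getvalue())
```

Selecting by index with a list comprehension avoids pre-allocating a list of empty fields, at the cost of hiding the one-assignment-per-attribute structure used in the exhibit.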

Second Python Program


Our second Python program, called createTables.py, is to be run after we have generated a new
CSV file from removeUnwantedRows.py. This program creates a database table that is
formatted for the attributes we have kept and reads in the new CSV file generated by the first
program.
The program reads the first line, which contains the labels, to prevent the labels from being imported
into the SQL table. A loop is then used to read the succeeding lines, convert each line into a list of
values to be inserted, and pass that list along with a parameterized SQL query template that adds it to
the SQL table. Along with the values in each line, we use a counter to generate a unique identifier
(primary key) for the SQL table. We insert the value of the counter at the first index of the
list of fields.
Comments are available in the *.py file for further clarification.
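
The loop described above can be sketched as follows. This is a reduced illustration, not the program in Exhibit 2: the table name Demo, its two data columns and the sample lines are hypothetical, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE Demo (id INTEGER PRIMARY KEY, weekday INTEGER, rent INTEGER);"
)

# Hypothetical CSV lines; the first one holds the labels and is skipped.
lines = ["weekday,rent", "0,4000", "1,4500"]

# enumerate() plays the role of the counter that supplies the primary key.
for key, line in enumerate(lines[1:], start=1):
    fields = line.split(",")
    # The ? placeholders are filled from the list, in order.
    cursor.execute("INSERT INTO Demo VALUES (?, ?, ?);", [key] + fields)

db.commit()
rows = cursor.execute("SELECT id, weekday, rent FROM Demo ORDER BY id;").fetchall()
print(rows)
db.close()
```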
The program outputs an SQLite file that can be opened with SQLite Manager, a popular Mozilla
Firefox extension. We created this SQLite database in an effort to better explain our historical data
through queries and data visualization. This analysis is done before running any data mining
algorithm on our dataset.
The following sections first present our analysis of the queries we have run (depicting what
our dataset looks like in terms of historical facts) and then present the various data mining techniques we
used and the model that resulted from each.
Note: We will further process the data in Weka, but that processing is described in the Data Analysis:
Numeric Estimation section. The processing discussed in this current section was mainly to remove
the unwanted attributes and create the SQL database.

Data Analysis: SQL and Data Visualization


This section discusses our findings on the explicit, historical facts of the dataset we have just
processed in the last section. Our team ran two queries focusing on the key attributes that historically
affected the average rent in 2011 and 2012.

First Query
This first subsection on Data Analysis presents key facts about our dataset. These facts were derived
from historical data using SQL queries and presented following Edward Tufte's data visualization
techniques.
The first analysis we conducted on this dataset was to identify the average rent for each day of the
week. We ran a simple SQL query that takes the average rent and groups the result by each possible
weekday value. The raw result of this query (available in Exhibit 3) was imported into Microsoft Office
Excel and a graphical representation of the data was generated.
SQL Query:
SELECT weekday, AVG(rent)
FROM Hubway
GROUP BY weekday;

Visualization:
[Figure: Average Rent Per Day; average rent (vertical axis, 3900 to 4800) for each day of the week]

The result of this first query shows that, on average across 2011 and 2012, bike rental demand
increases as the week progresses from Sunday to Friday but decreases on Saturday.
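
To illustrate how the grouped average works, the sketch below runs the same query shape from Python against a small in-memory table. The table and column names follow the query above; the four inserted rows are hypothetical and are not taken from the dataset.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Hubway (weekday INTEGER, rent INTEGER);")

# Hypothetical records: two Sundays (0) and two Mondays (1).
db.executemany(
    "INSERT INTO Hubway VALUES (?, ?);",
    [(0, 4000), (0, 4200), (1, 4400), (1, 4600)],
)

# One output row per distinct weekday value, holding that group's average.
averages = db.execute(
    "SELECT weekday, AVG(rent) FROM Hubway GROUP BY weekday ORDER BY weekday;"
).fetchall()
print(averages)
db.close()
```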

Second Query
The second set of queries we ran identifies the relationship between weathersit and rent in
each season. We ran four SQL queries for this (actual results are in Exhibit 4):
SQL Queries:
SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 1
GROUP BY weathersit;

SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 2
GROUP BY weathersit;

SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 3
GROUP BY weathersit;

SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 4
GROUP BY weathersit;

Visualization:
[Figure: Rent Under Weather and Season Attributes; average rent (vertical axis, 0 to 7000) for each season (Spring, Summer, Fall, Winter), with one series per weather condition: clear, few clouds; mist+cloudy; light snow/rain]

The results show that the number of rented bikes fluctuates across seasons and weather
conditions. Fall (season = 3) has the highest average rent under every weather condition. As
expected, people tend to rent more bikes when the weather is clear than when it gets worse, such
as mist or light snow.
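
As an alternative sketch, the four per-season queries could be expressed as a single query that groups by both season and weathersit. This is not the approach used in the report; the rows inserted below are hypothetical and only show the shape of the result.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Hubway (season INTEGER, weathersit INTEGER, rent INTEGER);")

# Hypothetical records covering two seasons and two weather codes.
db.executemany(
    "INSERT INTO Hubway VALUES (?, ?, ?);",
    [(1, 1, 3000), (1, 1, 2600), (1, 2, 2000),
     (3, 1, 6000), (3, 2, 5000), (3, 2, 5400)],
)

# Grouping by both columns yields one average per (season, weathersit) pair.
result = db.execute(
    "SELECT season, weathersit, AVG(rent) FROM Hubway "
    "GROUP BY season, weathersit ORDER BY season, weathersit;"
).fetchall()
for row in result:
    print(row)
db.close()
```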

Data Analysis: Numeric Estimation Data Mining


This section presents our analysis of select data mining algorithms found in Weka (specifically for
numeric estimation). Before we present our analysis, the dataset produced by our first Python program
(recall that this is a CSV file that has the problematic attributes removed from the original, downloaded
dataset) must be further processed so that Weka interprets it correctly.

Weka-specific Preprocessing
The purpose of this subsection is to discuss how the data's attributes were formatted (whether
numeric or nominal) and how the dataset was divided for training and testing a model. As a
prerequisite for this procedure, the original, downloaded and consolidated dataset (the one
that contains the 2011 and 2012 data together) must have been processed by removeUnwantedRows.py,
producing a new CSV file without the problematic attributes.
1. Initially, Weka Explorer was opened and the CSV file without problematic attributes was
loaded.
2. The following attributes were converted to Nominal (or confirmed to be Nominal already) because
their values are numbers that really represent codes specified by the dataset
authors:
a. season
b. mnth
c. weekday
d. weathersit
These attributes are not continuous, and any mathematical operation done on them would not
be meaningful. The conversion was done using a built-in Weka filter called
NumericToNominal.
3. The other attributes (temp, atemp, hum, windspeed and cnt) are all numeric and
should be kept numeric. However, in the rare case that Weka treats them as
nominal, one could simply use another built-in Weka filter called NominalToNumeric.
4. The dataset now needs to be randomized in order to eliminate any bias found in the order of
the data. To do this, a built-in Weka filter called Randomize is applied and the dataset is
saved as an *.arff file.
5. From this randomized ARFF file, the data must now be split for training and testing models,
with 80% of the total dataset for training and 20% for testing. To do this, a built-in Weka filter
called RemovePercentage is used with the following parameters:
a. invertSelection = False
b. percentage = 20.0
After applying this filter, 20% of the randomized dataset is removed, and the remaining sub-dataset is saved as a new ARFF file for training.
6. After saving the training ARFF dataset, the Undo button is pressed, which restores the
removed data to where it was before the RemovePercentage filter was
applied.

7. At this point, the original RemovePercentage filter is still active in the Filter section of
Weka Explorer, even though the Undo button was pressed; to obtain our dataset for testing,
the parameters for the filter are adjusted to the following specifications:
a. invertSelection = True
b. percentage = 20.0
After applying this filter, 80% of the randomized dataset is removed, which ensures
that there is no overlap between the training ARFF file and this newly created testing dataset.
This new dataset is saved as a new ARFF file for testing.
8. Back in the Weka Explorer window, the ARFF file for training is reopened and the data mining
can now commence.
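
The randomize-then-split procedure above can be sketched in Python. This is only an analogue of the Weka filters, not their implementation: it assumes the removed 20% is taken from the front of the shuffled data and that the cut point is rounded to the nearest record, and Python's shuffle with seed 42 will not reproduce Weka's ordering.

```python
import random


def split_dataset(records, remove_percentage=20.0, seed=42):
    """Shuffle the records, then split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the cut point is rounded to the nearest whole record.
    cut = round(len(shuffled) * remove_percentage / 100.0)
    testing = shuffled[:cut]    # analogue of invertSelection = True
    training = shuffled[cut:]   # analogue of invertSelection = False
    return training, testing


# 731 placeholder record numbers stand in for the real instances.
training, testing = split_dataset(range(731))
print(len(training), len(testing))
```

Because both parts are cut from the same shuffled list at the same point, no record can appear in both files.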

Numeric Estimation Models Tested


This next subsection explains how the team derived a suitable model for estimating our output
variable. A number of built-in Weka models were used, and the best one was taken to be the
one with the highest correlation coefficient.
Three numeric estimation models were chosen to be tested against the training data through
10-fold cross-validation. This technique allows us to estimate how well the model will work with our
training set and, ultimately, on unseen data.
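
A minimal sketch of how 10-fold cross-validation partitions the training data follows. It only shows the bookkeeping of folds; the way records are assigned to folds here (every tenth index) is an assumption for illustration and is not necessarily how Weka assigns them.

```python
def k_fold_indices(n_records, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_records) if i not in held]
        yield train, held_out


# 585 is the number of training instances reported in the Exhibits.
sizes = [(len(train), len(test)) for train, test in k_fold_indices(585)]
for train_size, test_size in sizes:
    print(train_size, test_size)
```

Each record is held out exactly once, so every prediction used in the evaluation comes from a model that did not see that record during training.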
1. Linear Regression: a mathematical, linear function that predicts the numeric output variable
through a weighted sum of the input attributes
Sullivan, David. Data Mining III: Numeric Estimation. Boston University Computer Science 105. Fall 2014.
http://cs-people.bu.edu/dgs/courses/cs105/lectures/data_mining_estimation.pdf

2. Regression Tree: a decision tree model specifically designed to handle both numeric and
nominal input variables; each leaf holds an average output value for the training records that reach it
Sullivan, David. Data Mining III: Numeric Estimation. Boston University Computer Science 105. Fall 2014.
http://cs-people.bu.edu/dgs/courses/cs105/lectures/data_mining_estimation.pdf

3. Neural Networks: a mathematical model based on a network of links and nodes that
changes its structure based on the information that flows through the network during the
learning (or training) phase
Singh, Yashpal and Alok Chauhan. Neural Networks In Data Mining. Journal of Theoretical and Applied
Information Technology. 2009. http://www.jatit.org/volumes/research-papers/Vol5No1/1Vol5No6.pdf

The following table is a summary of the models tested and their correlation coefficients along with
their mean absolute error. The raw result for each model is available in the Exhibits section of this
document.

Table 3: Key Statistics for Each Model Used
Model                       Correlation Coefficient    Mean Absolute Error
Linear Regression           0.7356                     1124.1967
Regression Tree             0.7099                     1149.8182
MultilayerPerceptron [1]    0.4835                     1657.8337

Values are presented as Weka reported them; the test method used was 10-fold cross-validation.
[1] MultilayerPerceptron is also called Neural Networks.

The following are our criteria for selecting which model to use. We want the model that:
1. has the highest correlation coefficient (a measure of the correlation between actual and
predicted output values) and;
2. has the lowest mean absolute error (the average of the absolute differences between actual and
predicted output values)
Linear Regression is the best model to use because it has the highest correlation
coefficient and the lowest mean absolute error among the three. The next subsection provides an
in-depth analysis of the linear regression model developed and applied to the test ARFF dataset.
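
To make the two selection criteria concrete, the sketch below computes both statistics from lists of actual and predicted values. The function names and the four sample values are hypothetical; the correlation shown is the Pearson correlation, which is assumed here to be what Weka reports.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical actual and predicted demand values.
actual = [1000, 2000, 3000, 4000]
predicted = [1200, 1900, 3300, 3800]

print(mean_absolute_error(actual, predicted))
print(correlation_coefficient(actual, predicted))
```

A model can have a high correlation coefficient and still a large mean absolute error, which is why both criteria are considered together.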

Linear Regression: Test Results Against Training and Supplied Testing Sets
The Linear Regression algorithm was run again, this time using a supplied test set: the testing ARFF
data that was generated in the earlier section on preprocessing. The following key statistics were
observed:
Table 4: Key Statistics for Linear Regression on Different Testing Datasets
Test Set                Correlation Coefficient    Mean Absolute Error
Training Set            0.7564                     1089.4426
Supplied Testing Set    0.7741                     1082.3166

The Linear Regression model does in fact generalize well, as the correlation coefficients obtained when
testing against the training set and against the testing ARFF dataset are close to each other.
We also want to emphasize that even though the team considers the model good, the model
has a Mean Absolute Error greater than 1,000. This means that, on average, the model's
prediction is off by more than 1,000 in either direction. The cost of this inaccuracy falls on rental
bike companies and is beyond the scope of this project.

Variable Analysis
The following equation was generated by the linear regression algorithm built in Weka when used
against our training set:
Figure 1: Linear Regression Model Generated by Weka
Linear Regression Output (Test Method: Supplied Training Set)
cnt =
   1596.993  * season=4,2,3 +
   -702.5904 * season=2,3 +
   -417.2612 * mnth=12,11,4,10,5,8,7,9,6 +
    326.5611 * mnth=4,10,5,8,7,9,6 +
   -592.5343 * mnth=8,7,9,6 +
   -429.6212 * mnth=7,9,6 +
   1333.8152 * mnth=9,6 +
   -711.6712 * mnth=6 +
    385.7063 * weekday=2,5,4,6,3 +
   1771.9796 * weathersit=2,1 +
    308.1646 * weathersit=1 +
   7413.8362 * atemp +
  -2700.4483 * hum +
  -3459.3138 * windspeed +
    622.8475

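This equation follows the standard linear regression format:

$$y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

where $y$ is the output variable (cnt), each $x_i$ is an input attribute, each $w_i$ is its coefficient (weight) and $w_0$ is the constant term.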

For the nominal input attributes that we specified in the preprocessing (season, mnth, weekday and
weathersit), Weka grouped the values of each attribute into binary indicator terms. For instance, the attribute
season appears twice in the model above, as season=4,2,3 and as season=2,3, even though the
original dataset only had one discretized attribute.
For numeric inputs (temp, atemp, hum and windspeed), however, Weka did not follow the same
procedure as it did with nominal ones. These attributes were kept in their original form. However, Weka
removed the temp attribute, suggesting that temp contributes little to the prediction once the other
attributes are included.
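
The model in Figure 1 can be written as a small Python function, which makes the role of each term explicit. This sketch assumes that a term such as season=4,2,3 equals 1 when the season is one of the listed values and 0 otherwise; the function name and the sample inputs are hypothetical.

```python
def predict_cnt(season, mnth, weekday, weathersit, atemp, hum, windspeed):
    """Evaluate the linear regression model from Figure 1."""

    def ind(value, group):
        # Indicator term: 1 if the value is in the listed group, else 0.
        return 1 if value in group else 0

    return (
        1596.993 * ind(season, {4, 2, 3})
        - 702.5904 * ind(season, {2, 3})
        - 417.2612 * ind(mnth, {12, 11, 4, 10, 5, 8, 7, 9, 6})
        + 326.5611 * ind(mnth, {4, 10, 5, 8, 7, 9, 6})
        - 592.5343 * ind(mnth, {8, 7, 9, 6})
        - 429.6212 * ind(mnth, {7, 9, 6})
        + 1333.8152 * ind(mnth, {9, 6})
        - 711.6712 * ind(mnth, {6})
        + 385.7063 * ind(weekday, {2, 5, 4, 6, 3})
        + 1771.9796 * ind(weathersit, {2, 1})
        + 308.1646 * ind(weathersit, {1})
        + 7413.8362 * atemp
        - 2700.4483 * hum
        - 3459.3138 * windspeed
        + 622.8475
    )


# Hypothetical day: fall, September, Friday, clear, with mid-range
# normalized feeling temperature, humidity and wind speed.
print(predict_cnt(season=3, mnth=9, weekday=5, weathersit=1,
                  atemp=0.5, hum=0.6, windspeed=0.2))
```

Holding all other inputs fixed and changing one nominal input shows that input's net influence on the prediction.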
The next section will explain what each of these variables mean and provide our insight and
conclusions to the factors that ultimately determine the increase or decrease in demand for rental
bicycles.

Insights and Conclusions about Generated Model


For our analysis, we have grouped the coefficients that refer to the same input attribute.

Table 5: Analysis on Coefficients

Coefficient Pairs:
1596.993 * season=4,2,3
+
-702.5904 * season=2,3

Net Demand Influence:
season    Influence
1         0
2 or 3    1596.993 - 702.5904 = 894.4026
4         1596.993

Explanation:
This model suggests that winter (season = 4) has the
greatest positive influence on bike rentals. Also, spring
(season = 1) does not influence the bike demand,
which our team found surprising.
Summer and fall (season = 2 and 3 respectively)
positively influence the demand but not as much as
winter.
The team believes that this is counterintuitive, as biking
and other outdoor, physical, recreational endeavors are
usually done during warmer seasons. Summer, as
opposed to winter, should have the greatest demand
influence.

Coefficient Pairs:
-417.2612 * mnth=12,11,4,10,5,8,7,9,6
+
326.5611 * mnth=4,10,5,8,7,9,6
+
-592.5343 * mnth=8,7,9,6
+
-429.6212 * mnth=7,9,6
+
1333.8152 * mnth=9,6
+
-711.6712 * mnth=6

Net Demand Influence:
mnth        Influence
1 to 3      0
4 to 5      -90.7001
6           -490.7116
7           -1112.8556
8           -683.2344
9           220.9596
10          -90.7001
11 to 12    -417.2612

Explanation:
Month (mnth) 7 gives the most negative contribution to the
total count of rented bikes as this model predicts. From
month 1 to 3, there is no contribution.
The team believes that the demand influence from this
attribute does not align with the prediction for season.
Winter months, as predicted above, should give a
positive net demand influence but gave zero or negative
in this case. Summer months should also give a positive
influence but gave negative in this case.

Coefficient Pairs:
385.7063 * weekday=2,5,4,6,3

Net Demand Influence:
weekday    Influence
0 to 1     0
2 to 6     385.7063

Explanation:
The model predicts that there is no
influence on demand on Sundays and Mondays
(weekday = 0 and 1 respectively). For the rest of the
week (Tuesday through Saturday), however, the model predicts
a positive influence on demand of 385.7063.
The team believes that this broadly aligns with
our historical data analysis presented in the First SQL
Query, in which Sunday and Monday have the lowest
average rent of the week.

Coefficient Pairs:
1771.9796 * weathersit=2,1
+
308.1646 * weathersit=1

Net Demand Influence:
weathersit    Influence
1             1771.9796 + 308.1646 = 2080.1442
2             1771.9796
3             0

Explanation:
During clear and misty weather (weathersit = 1 and
2 respectively), the model predicts a positive influence
on demand, which is even greater when it's clear. In
light snow or rain (weathersit = 3), the model
expects no positive or negative influence on demand.
The team believes that this coefficient agrees
with our intuition that non-rainy or non-snowy days have
a positive influence on the demand for rental bikes.

Coefficient Pairs:
7413.8362 * atemp

Explanation:
atemp has a linear relationship with the demand for bikes,
and the model predicts that for every unit increase in the
normalized feeling temperature, the demand will be
influenced positively by 7413.8362.
The team believes that this is intuitive, at least for this
dataset, since we expect warmer days to positively
influence the demand for rental bikes. There is, however,
a tipping point between "warm enough for biking" and
"too hot to go outside"; that borderline is well
beyond the scope of our dataset.

Coefficient Pairs:
-2700.4483 * hum

Explanation:
hum has a linear relationship with the demand for bikes, as the
model predicts. The coefficient, -2700.4483, means
that for each unit increase in normalized humidity, the predicted demand for
bikes is reduced by 2700.4483. As the weather becomes more humid, fewer
people want to rent a bike.
The team believes that this generally makes sense,
as higher humidity values may increase the chances of
rain. The prediction that an increase in humidity leads
to lower demand somewhat aligns with our
weathersit attribute, since during light snow or rain
(weathersit = 3) the contribution is zero.

Coefficient Pairs:
-3459.3138 * windspeed

Explanation:
windspeed also has a linear relationship with the
demand for bikes. The negative coefficient shows that for each
unit increase in the normalized wind speed, there is a
3459.3138 negative influence on the demand. That is,
as the wind becomes stronger, people are less likely to
rent a bike.
The team believes that this makes sense, as higher
wind speeds will deter people from renting bikes.

Summary and Conclusion


In developing a model for predicting rental bike demand, our output variable, we obtained a
dataset from an external source that contains historical data on weather conditions, temperature,
humidity, wind speed, month and day of the week along with our desired output variable. This dataset
was divided into two files: one for 2011 and one for 2012. Before running any data mining algorithm,
the raw datasets were merged so both 2011 and 2012 data were in one file, and this consolidated
dataset was further processed by a Python-based program that eliminates problematic attributes. The
result of this was a new, single dataset without the problematic attributes.
In examining the historical data, we passed the newly generated dataset to another Python-based
program that creates an SQL relational database. Queries were executed on this database to
determine 1.) the average rent per day of the week and; 2.) the trend of bike demand against
season and weather. Visualizations were reported.
Finally, the dataset was further processed in Weka to prepare it for data mining; this involved
randomizing the data and splitting it into testing and training data. Several numeric estimation algorithms
were tested against the training data, evaluating each model using 10-fold cross-validation as a
rough examination. Linear Regression proved to be the best model to use and was tested against
the training data and testing data.
We conclude that the Linear Regression model developed in this analysis generalizes well with the
dataset used. However, the model may not be good enough to use in a real-world business setting
due to a relatively low correlation coefficient.
Exhibits

Exhibit 1: First Python Program


removeUnwantedRows.py
#
# File: removeUnwantedRows.py
# Authors: Yunong Louisa Gan, Jan Raphael Floro, Dan Liu
# Assignment: Fall 2014 CS105 Final Project
# Description: This program takes the raw dataset CSV file and eliminates unwanted
# columns (i.e. those attributes that correspond to names and ids or those that will
# not be useful in the data mining application).
#

# Ask the user what the names of the input and output files are
infile_name = input('Please enter the file name of the raw CSV dataset (with extensions): ')
outfile_name = input('Please enter the file name of the new CSV dataset (with extensions): ')

# Create file handles for both the input and output files
infile = open(infile_name, mode='r')
outfile = open(outfile_name, mode='w')

# For each line in the input file, keep only the attributes that we want to keep
for line in infile:
    # Split the line into a list of fields (dropping the trailing newline, if any)
    list_of_fields = line.rstrip('\n').split(',')

    # Initialize new list that will contain the attributes that we want to keep
    new_fields = ['', '', '', '', '', '', '', '', '']

    # Insert those selected attributes in the new list in order
    new_fields[0] = list_of_fields[2]    # season
    new_fields[1] = list_of_fields[4]    # mnth
    new_fields[2] = list_of_fields[6]    # weekday
    new_fields[3] = list_of_fields[8]    # weathersit
    new_fields[4] = list_of_fields[9]    # temp
    new_fields[5] = list_of_fields[10]   # atemp
    new_fields[6] = list_of_fields[11]   # hum
    new_fields[7] = list_of_fields[12]   # windspeed
    new_fields[8] = list_of_fields[15]   # cnt

    # Join new list into a string and store it in the output file
    new_line = ','.join(new_fields)
    print(new_line, file=outfile)

# Close the file handles
infile.close()
outfile.close()

Exhibit 2: Second Python Program


createTables.py
#
# File: createTables.py
# Authors: Yunong Louisa Gan, Jan Raphael Floro, Dan Liu
# Assignment: Fall 2014 CS105 Final Project
# Description: This program takes the newly formatted CSV file generated by
# removeUnwantedRows.py, creates an SQLite database file and imports the
# records from the CSV file to the new database file.
#

# Import sqlite3 module
import sqlite3

# Ask the user for an output file name
db_filename = input('Please enter an output file name for the SQLite 3 Database '
                    '(include extension, *.sqlite preferred): ')

# Create the database file with the user's file name, connect to it and make a cursor
db = sqlite3.connect(db_filename)
cursor = db.cursor()

# Create SQL table framework by creating a query and executing it against the cursor
query = ''' CREATE TABLE Hubway (
    id INTEGER PRIMARY KEY,
    season INTEGER,
    month INTEGER,
    weekday INTEGER,
    weathersit INTEGER,
    temp REAL,
    feeltemp REAL,
    humidity REAL,
    windspeed REAL,
    rent INTEGER); '''
cursor.execute(query)

# Open the newly created CSV file and eliminate the first line that contains the labels
infile_name = input('Please enter the file name of the new CSV dataset (with extensions): ')
infile = open(infile_name, mode='r')
temporary = infile.readline()

# Initialize a counter to be used as the primary key for the SQL database
count = 1

# For each line in the CSV file opened, insert the counter and insert the record
# into the database file
for line in infile:
    # Split line into a list of fields and insert counter into index 0
    list_of_fields = line.rstrip('\n').split(',')
    list_of_fields.insert(0, count)

    # Construct an SQL query to insert values in the SQL file and execute
    query = ''' INSERT INTO Hubway
                VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?);'''
    cursor.execute(query, list_of_fields)

    # Add to the counter
    count = count + 1

# Close the file handle, commit the database and close it
infile.close()
db.commit()
db.close()

Exhibit 3: Raw Results from First SQL Query


SQL Query:
SELECT weekday, AVG(rent)
FROM Hubway
GROUP BY weekday;

Result:
weekday    AVG(rent)
0          4228.829
1          4338.124
2          4510.663
3          4548.538
4          4667.26
5          4690.288
6          4550.543

Exhibit 4: Raw Results from Second SQL Query


SQL Query:
SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 1
GROUP BY weathersit;

Result:
weathersit    AVG(rent)
1             2811.135
2             2357.167
3             934.75

SQL Query:
SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 2
GROUP BY weathersit;

Result:
weathersit    AVG(rent)
1             5548.549
2             4236.706
3             1169

SQL Query:
SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 3
GROUP BY weathersit;

Result:
weathersit    AVG(rent)
1             5878.257
2             5222.479
3             2751.75

SQL Query:
SELECT weathersit, AVG(rent)
FROM Hubway
WHERE season = 4
GROUP BY weathersit;

Result:
weathersit    AVG(rent)
1             5043.563
2             4654
3             1961.6

Exhibit 5: Raw Results from Linear Regression (Supplied Test Set)

Linear Regression; Test Method: Supplied Test Set
=== Run information ===
Scheme:       weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation:     hubway_new-weka.filters.unsupervised.attribute.NumericToNominal-R1-4weka.filters.unsupervised.instance.Randomize-S42weka.filters.unsupervised.instance.RemovePercentage-P20.0
Instances:    585
Attributes:   9
              season
              mnth
              weekday
              weathersit
              temp
              atemp
              hum
              windspeed
              cnt
Test mode:    user supplied test set: size unknown (reading incrementally)
=== Classifier model (full training set) ===

Linear Regression Model

cnt =
   1596.993  * season=4,2,3 +
   -702.5904 * season=2,3 +
   -417.2612 * mnth=12,11,4,10,5,8,7,9,6 +
    326.5611 * mnth=4,10,5,8,7,9,6 +
   -592.5343 * mnth=8,7,9,6 +
   -429.6212 * mnth=7,9,6 +
   1333.8152 * mnth=9,6 +
   -711.6712 * mnth=6 +
    385.7063 * weekday=2,5,4,6,3 +
   1771.9796 * weathersit=2,1 +
    308.1646 * weathersit=1 +
   7413.8362 * atemp +
  -2700.4483 * hum +
  -3459.3138 * windspeed +
    622.8475

Time taken to build model: 0.12 seconds

=== Evaluation on test set ===
Time taken to test model on supplied test set: 0.01 seconds
=== Summary ===
Correlation coefficient          0.7741
Mean absolute error              1082.3166
Root mean squared error          1274.6821
Relative absolute error          67.1989 %
Root relative squared error      63.5683 %
Total Number of Instances        146

Exhibit 6: Raw Results from MultilayerPerceptron (10-fold Cross-validation)


MultilayerPerceptron; Test Method: Cross-validation (10 Folds)
=== Run information ===
Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a
Relation:     hubway_new-weka.filters.unsupervised.attribute.NumericToNominal-R1-4weka.filters.unsupervised.instance.Randomize-S42weka.filters.unsupervised.instance.RemovePercentage-P20.0
Instances:    585
Attributes:   9
              season
              mnth
              weekday
              weathersit
              temp
              atemp
              hum
              windspeed
              cnt
Test mode:    10-fold cross-validation
=== Classifier model (full training set) ===
Linear Node 0
Inputs
Weights
Threshold
0.4034591624250485
Node 1
1.073001771389342
Node 2
0.7997763940718912
Node 3
-0.7810987098482175
Node 4
-1.1441930460472007
Node 5
0.7583796211670438
Node 6
1.5138917522206743
Node 7
-0.44528535259435686
Node 8
-0.7695221250216051
Node 9
-0.8728329024813739
Node 10
-0.3971037710754488
Node 11
-0.7387643450840765
Node 12
0.6697937540799863
Node 13
-0.30447458472747896
Node 14
-0.6657354929961227
Node 15
1.5878296789150248
Sigmoid Node 1
Inputs
Weights
Threshold
-0.835105600103694
Attrib season=1
-0.26923387083229156
Attrib season=2
0.7433281789244546
Attrib season=3
2.485726909249855
Attrib season=4
-1.3134921021697237
Attrib mnth=1
1.1332560183231664
Attrib mnth=2
0.04118700402885844
Attrib mnth=3
-0.7637181963924573
Attrib mnth=4
0.9792164723621126
Attrib mnth=5
1.547153061007384
Attrib mnth=6
-1.1788581193832168
Attrib mnth=7
-1.4667373709969456
Attrib mnth=8
1.793281659552228
Attrib mnth=9
4.1277562348886905
Attrib mnth=10
1.5116610726829658
Attrib mnth=11
-0.11545346200184586
Attrib mnth=12
0.2973099321825597


Attrib weekday=0
0.0735713150735382
Attrib weekday=1
-0.13531496279818347
Attrib weekday=2
0.18891798030488438
Attrib weekday=3
-0.15302105370984156
Attrib weekday=4
0.756820806490527
Attrib weekday=5
3.2669990717655377
Attrib weekday=6
-0.013196966833346382
Attrib weathersit=1
1.8987162613074986
Attrib weathersit=2
-0.9220489355100241
Attrib weathersit=3
-0.16574050012262911
Attrib temp
-0.3462691238736456
Attrib atemp
-0.7232156351891944
Attrib hum
-7.555827566193684
Attrib windspeed
1.2149137740743714
Sigmoid Node 2
Inputs
Weights
Threshold
-0.6272845810275781
Attrib season=1
0.4511309813560522
Attrib season=2
0.3243673306399936
Attrib season=3
-0.14462217443822908
Attrib season=4
0.5717810187426616
Attrib mnth=1
-1.0620271502105918
Attrib mnth=2
1.5029576238066138
Attrib mnth=3
0.7370720015247065
Attrib mnth=4
-1.4484679734569135
Attrib mnth=5
1.2193167796387365
Attrib mnth=6
2.232955502400696
Attrib mnth=7
0.06715219650782216
Attrib mnth=8
0.38798295516628284
Attrib mnth=9
0.29551875297827473
Attrib mnth=10
1.3849939708300956
Attrib mnth=11
0.9129130515295194
Attrib mnth=12
-0.14512949985777754
Attrib weekday=0
1.9931321860448894
Attrib weekday=1
3.1967624812276476
Attrib weekday=2
-1.0879978182772783
Attrib weekday=3
0.457659508255081
Attrib weekday=4
1.4975764986214952
Attrib weekday=5
-0.9953808172895143
Attrib weekday=6
-2.075406651189462
Attrib weathersit=1
1.6415054200085994
Attrib weathersit=2
0.010771694877575056
Attrib weathersit=3
-1.145614687211436
Attrib temp
0.3062507318500584
Attrib atemp
1.4272500834524942
Attrib hum
-3.7023979230445248
Attrib windspeed
3.43243162409721
Sigmoid Node 3
Inputs
Weights
Threshold
-1.099251439861587
Attrib season=1
-0.2983352128311242
Attrib season=2
1.2266648481842768
Attrib season=3
1.3776057739283585
Attrib season=4
-0.16892801822017262
Attrib mnth=1
0.32915348891007523
Attrib mnth=2
3.2794585387928885
Attrib mnth=3
0.5737609948893269
Attrib mnth=4
0.945881493134643
Attrib mnth=5
-1.0281547724703388
Attrib mnth=6
-0.20111870519034677
Attrib mnth=7
2.426206310650778
Attrib mnth=8
1.3134003894101203


Attrib mnth=9
0.8637602321512693
Attrib mnth=10
3.7722260235020193
Attrib mnth=11
-1.7180240810799472
Attrib mnth=12
0.34500229426844337
Attrib weekday=0
-1.2359496617461008
Attrib weekday=1
-1.2523303956510252
Attrib weekday=2
-0.0512640219334471
Attrib weekday=3
5.0344463244969555
Attrib weekday=4
3.1705493865198813
Attrib weekday=5
-0.7969667993073348
Attrib weekday=6
0.5012732532805586
Attrib weathersit=1
-1.3808865602742402
Attrib weathersit=2
3.687279205414965
Attrib weathersit=3
-1.263950856900867
Attrib temp
-1.6025642890452991
Attrib atemp
-1.2460303406397897
Attrib hum
3.806611025524226
Attrib windspeed
2.1413694085068093
Sigmoid Node 4
Inputs
Weights
Threshold
-0.6125614054374625
Attrib season=1
2.301339192069271
Attrib season=2
0.21595465624979585
Attrib season=3
-2.002381472625347
Attrib season=4
0.5776248990007989
Attrib mnth=1
1.4147139027031674
Attrib mnth=2
-0.7873093719742582
Attrib mnth=3
-1.4647000862547463
Attrib mnth=4
0.30894778422015495
Attrib mnth=5
0.11744997371496953
Attrib mnth=6
1.678371884010929
Attrib mnth=7
0.2601669996084536
Attrib mnth=8
0.7165572253867979
Attrib mnth=9
-0.9759034972248922
Attrib mnth=10
-1.2035074076468777
Attrib mnth=11
3.881580115131238
Attrib mnth=12
1.6673594234003442
Attrib weekday=0
2.430737585641643
Attrib weekday=1
-0.029305707349405366
Attrib weekday=2
-0.2809770028027914
Attrib weekday=3
-1.002602704706082
Attrib weekday=4
0.29385881269950076
Attrib weekday=5
-0.4171895892417452
Attrib weekday=6
1.831364169728795
Attrib weathersit=1
-1.702663907847466
Attrib weathersit=2
0.8307789973732239
Attrib weathersit=3
1.4086185714457284
Attrib temp
4.765107492570161
Attrib atemp
5.397309969510869
Attrib hum
2.6743033792236504
Attrib windspeed
4.843385409840476
Sigmoid Node 5
Inputs
Weights
Threshold
-0.7702271035603074
Attrib season=1
-0.7912698399661623
Attrib season=2
1.3342816535525144
Attrib season=3
-2.06004550322802
Attrib season=4
3.0002838609962175
Attrib mnth=1
0.6274233404933738
Attrib mnth=2
0.004961995450475025
Attrib mnth=3
2.1208962407896137
Attrib mnth=4
-1.5587774103063154


Attrib mnth=5
0.7756856754052758
Attrib mnth=6
1.6368281383100256
Attrib mnth=7
-0.3001713973913252
Attrib mnth=8
0.8685151927718178
Attrib mnth=9
3.178730593716785
Attrib mnth=10
2.1142799632718488
Attrib mnth=11
0.3236981666904239
Attrib mnth=12
-2.336499036978622
Attrib weekday=0
-0.5993469100502076
Attrib weekday=1
-0.650182502557702
Attrib weekday=2
1.899648613421824
Attrib weekday=3
1.511829159936197
Attrib weekday=4
0.8699142634040725
Attrib weekday=5
-0.9403406416835125
Attrib weekday=6
1.6589672748963804
Attrib weathersit=1
-0.41529560006794974
Attrib weathersit=2
2.5393108952031054
Attrib weathersit=3
-1.3495279722872315
Attrib temp
1.6766375355576937
Attrib atemp
1.3960864802250765
Attrib hum
-8.51696696650129
Attrib windspeed
-3.7266593200778106
Sigmoid Node 6
Inputs
Weights
Threshold
-0.9771847570286205
Attrib season=1
-0.2613288541630906
Attrib season=2
1.8662783168120032
Attrib season=3
-0.005021092531550224
Attrib season=4
0.20377775514061652
Attrib mnth=1
0.7870300635035192
Attrib mnth=2
0.6277338920383292
Attrib mnth=3
2.661395650848618
Attrib mnth=4
2.515850786190174
Attrib mnth=5
-0.7248311343538938
Attrib mnth=6
-1.0813811680170495
Attrib mnth=7
-1.3427485472166332
Attrib mnth=8
2.372020219049747
Attrib mnth=9
0.7357792316702811
Attrib mnth=10
2.341063329857603
Attrib mnth=11
0.19807007779987235
Attrib mnth=12
0.5071342076845808
Attrib weekday=0
-0.3763085296193364
Attrib weekday=1
-0.22802440531751533
Attrib weekday=2
1.5205637874092304
Attrib weekday=3
0.6334056582596763
Attrib weekday=4
0.43738828897372356
Attrib weekday=5
2.9876502082369476
Attrib weekday=6
-0.1924315838130855
Attrib weathersit=1
-0.018308474406878796
Attrib weathersit=2
1.1611455705184723
Attrib weathersit=3
-0.13424346857657693
Attrib temp
3.1192446832459173
Attrib atemp
3.475714046525946
Attrib hum
1.5964812593854052
Attrib windspeed
-5.884628344985574
Sigmoid Node 7
Inputs
Weights
Threshold
-0.4529682814682877
Attrib season=1
1.5791493238974204
Attrib season=2
-0.7650458514198675
Attrib season=3
1.1605783776784193
Attrib season=4
-1.1050749606551218


Attrib mnth=1
-0.6800936430177967
Attrib mnth=2
0.5228815598915845
Attrib mnth=3
-1.4596114288748192
Attrib mnth=4
-0.0629507087293041
Attrib mnth=5
-0.15856096316406074
Attrib mnth=6
-0.12643229845822276
Attrib mnth=7
-0.7501222794824005
Attrib mnth=8
1.8806943678287777
Attrib mnth=9
1.0431481435782375
Attrib mnth=10
-1.0252917554411787
Attrib mnth=11
2.998289898040177
Attrib mnth=12
2.0677968167979586
Attrib weekday=0
-0.20999667653014462
Attrib weekday=1
0.9165956291035388
Attrib weekday=2
-1.1355798902899121
Attrib weekday=3
0.08361781774622752
Attrib weekday=4
3.791807676578451
Attrib weekday=5
-0.9836242390932878
Attrib weekday=6
-0.4618703533590345
Attrib weathersit=1
0.7073144066159917
Attrib weathersit=2
0.1769490726847709
Attrib weathersit=3
-0.5452392680473542
Attrib temp
-5.065772204251191
Attrib atemp
-5.065930362456648
Attrib hum
-3.2268786450595424
Attrib windspeed
3.919757799960186
Sigmoid Node 8
Inputs
Weights
Threshold
-0.25878906034398474
Attrib season=1
1.7195677853518927
Attrib season=2
2.0374723594236994
Attrib season=3
-1.9227497260970419
Attrib season=4
-1.2990966183886983
Attrib mnth=1
-0.837753747481051
Attrib mnth=2
-0.4154871428835184
Attrib mnth=3
0.8172276279367406
Attrib mnth=4
-0.48671923009366624
Attrib mnth=5
1.539351924365433
Attrib mnth=6
-0.9481718102829366
Attrib mnth=7
-0.49512907156929203
Attrib mnth=8
0.1335648882958463
Attrib mnth=9
-1.2787962869069402
Attrib mnth=10
2.3393422831585617
Attrib mnth=11
0.20630981405885632
Attrib mnth=12
2.1039274828964065
Attrib weekday=0
1.716873073195706
Attrib weekday=1
1.4866518766242534
Attrib weekday=2
0.9701241501370421
Attrib weekday=3
-0.5568611041376056
Attrib weekday=4
-2.0245716033063266
Attrib weekday=5
-1.8547284192457834
Attrib weekday=6
1.733637225009825
Attrib weathersit=1
-1.1282716351883253
Attrib weathersit=2
-2.403690737565057
Attrib weathersit=3
3.7058078798903726
Attrib temp
-7.8141675089957765
Attrib atemp
-6.298342893235259
Attrib hum
3.960761207056994
Attrib windspeed
2.368358539013378
Sigmoid Node 9
Inputs
Weights
Threshold
-0.6785067655659063


Attrib season=1
1.3460885774230125
Attrib season=2
-0.47406869808971397
Attrib season=3
-0.8114498905176144
Attrib season=4
1.315392714792884
Attrib mnth=1
1.2526638838012385
Attrib mnth=2
-1.0810763554356557
Attrib mnth=3
-0.8921779858833379
Attrib mnth=4
1.029140952259634
Attrib mnth=5
-0.30660989832682434
Attrib mnth=6
1.4810997523081029
Attrib mnth=7
-0.14191200596308662
Attrib mnth=8
3.553757450799666
Attrib mnth=9
2.57134670561843
Attrib mnth=10
-0.7265894520873476
Attrib mnth=11
-0.13122524184480922
Attrib mnth=12
0.6214471575145861
Attrib weekday=0
1.4625597450668038
Attrib weekday=1
-0.6864927297376036
Attrib weekday=2
-1.7754798847193733
Attrib weekday=3
1.1978235318988002
Attrib weekday=4
1.4027032881101136
Attrib weekday=5
0.7947010837289776
Attrib weekday=6
1.1214777401031946
Attrib weathersit=1
0.1962750017466593
Attrib weathersit=2
-1.3262974218714676
Attrib weathersit=3
1.8514121804587593
Attrib temp
-1.3973218858029957
Attrib atemp
-1.5614428287532187
Attrib hum
5.554579028303288
Attrib windspeed
-4.089173072177312
Sigmoid Node 10
Inputs
Weights
Threshold
-0.511050609252634
Attrib season=1
0.6282810932143105
Attrib season=2
-0.7985809100493843
Attrib season=3
0.9870076768960008
Attrib season=4
0.19591056205734153
Attrib mnth=1
0.4181607292354497
Attrib mnth=2
1.1475352468345232
Attrib mnth=3
-0.9941174283444357
Attrib mnth=4
0.7793443380351899
Attrib mnth=5
-0.636849978484071
Attrib mnth=6
-0.11165092999036118
Attrib mnth=7
0.46523244217991044
Attrib mnth=8
2.8582610666913673
Attrib mnth=9
-1.0641371135852415
Attrib mnth=10
0.07456934042987022
Attrib mnth=11
1.560616192998869
Attrib mnth=12
0.3391244816248364
Attrib weekday=0
2.046953917544937
Attrib weekday=1
0.4510434186629579
Attrib weekday=2
-0.8562427579673721
Attrib weekday=3
-1.3400539948371752
Attrib weekday=4
-0.8572933745052744
Attrib weekday=5
1.31137401222742
Attrib weekday=6
1.6840054669310538
Attrib weathersit=1
1.2484558327666806
Attrib weathersit=2
-0.8843309549485495
Attrib weathersit=3
0.11101783035178547
Attrib temp
1.5864969041512569
Attrib atemp
3.210202550903
Attrib hum
-4.8149542453379075


Attrib windspeed
3.8170413666245637
Sigmoid Node 11
Inputs
Weights
Threshold
-0.5686619247340974
Attrib season=1
-1.169883856108531
Attrib season=2
1.93486716903313
Attrib season=3
0.08957146189122249
Attrib season=4
0.41242253740593826
Attrib mnth=1
0.20588372045946285
Attrib mnth=2
0.9381244153067557
Attrib mnth=3
0.6282435043542954
Attrib mnth=4
0.5405919350387205
Attrib mnth=5
-0.2854674131379757
Attrib mnth=6
0.07862482687185307
Attrib mnth=7
-1.576714287515947
Attrib mnth=8
-0.3796318617730249
Attrib mnth=9
1.0664481762347369
Attrib mnth=10
1.6650183674389791
Attrib mnth=11
2.1424992550563853
Attrib mnth=12
1.125859636915239
Attrib weekday=0
-0.17848770649517126
Attrib weekday=1
3.2545747754224044
Attrib weekday=2
-0.11149424459254079
Attrib weekday=3
-0.32758348735031745
Attrib weekday=4
0.2350520111891305
Attrib weekday=5
0.5606604028445626
Attrib weekday=6
-0.3106111045316933
Attrib weathersit=1
0.25104705574039266
Attrib weathersit=2
-0.5188577045188077
Attrib weathersit=3
0.821822151613731
Attrib temp
4.69745130744083
Attrib atemp
3.5464574734503356
Attrib hum
2.2146954305043827
Attrib windspeed
2.6107793523486618
Sigmoid Node 12
Inputs
Weights
Threshold
-1.0001077194118586
Attrib season=1
0.007660215255616815
Attrib season=2
0.18633559280489118
Attrib season=3
1.3140801498060022
Attrib season=4
0.4378448030221447
Attrib mnth=1
0.7617580092569444
Attrib mnth=2
1.1165384128916294
Attrib mnth=3
2.4360821036211773
Attrib mnth=4
3.7256009920423656
Attrib mnth=5
0.06509588338742371
Attrib mnth=6
0.5199676204003436
Attrib mnth=7
-2.021964309239575
Attrib mnth=8
2.6654400260033175
Attrib mnth=9
-1.5209283055101914
Attrib mnth=10
-1.4724045552178067
Attrib mnth=11
0.5200076247219011
Attrib mnth=12
2.52393376866104
Attrib weekday=0
0.8006660297704346
Attrib weekday=1
0.729795789863744
Attrib weekday=2
-0.7039718980406569
Attrib weekday=3
1.481103438453933
Attrib weekday=4
0.9334176603143836
Attrib weekday=5
-1.3517083435771586
Attrib weekday=6
2.8898065502309116
Attrib weathersit=1
0.6554960098199091
Attrib weathersit=2
-0.30071483501561563


Attrib weathersit=3
0.582891319859671
Attrib temp
1.8019054797069152
Attrib atemp
0.8623595532966457
Attrib hum
1.7945550277269056
Attrib windspeed
-8.239394333260778
Sigmoid Node 13
Inputs
Weights
Threshold
-0.611177431376937
Attrib season=1
0.07247744149706539
Attrib season=2
0.9485054476665893
Attrib season=3
0.9904030953238708
Attrib season=4
-0.7440939578224179
Attrib mnth=1
0.5129110354730324
Attrib mnth=2
2.3827817981536823
Attrib mnth=3
1.8647340744377003
Attrib mnth=4
2.2207723996432396
Attrib mnth=5
-0.8639604462346794
Attrib mnth=6
-0.6741561272738268
Attrib mnth=7
-0.47992338910712007
Attrib mnth=8
3.0811980898465228
Attrib mnth=9
0.14149268967847414
Attrib mnth=10
0.055527699388914636
Attrib mnth=11
-0.18068581941821074
Attrib mnth=12
-1.5051768576768847
Attrib weekday=0
0.2923544769017968
Attrib weekday=1
-1.4647384104991006
Attrib weekday=2
-0.7476791015307748
Attrib weekday=3
1.5899132385399868
Attrib weekday=4
-0.13745335459908556
Attrib weekday=5
4.356899784647938
Attrib weekday=6
-0.5644131602251132
Attrib weathersit=1
1.1830806908975657
Attrib weathersit=2
-0.10021541277483181
Attrib weathersit=3
-0.4151347872892149
Attrib temp
-3.452829145761675
Attrib atemp
-2.3092272032481893
Attrib hum
0.7428443847660146
Attrib windspeed
-2.5539286222844284
Sigmoid Node 14
Inputs
Weights
Threshold
-0.6930524111884078
Attrib season=1
-0.31974095339856545
Attrib season=2
-0.5429448211105121
Attrib season=3
1.5704976244416733
Attrib season=4
0.5189035636240676
Attrib mnth=1
1.341121349790957
Attrib mnth=2
-0.9942460163565202
Attrib mnth=3
-0.19205479611143306
Attrib mnth=4
3.1592456033903615
Attrib mnth=5
0.9586230723765657
Attrib mnth=6
0.27784818793116617
Attrib mnth=7
-0.29299875442486006
Attrib mnth=8
0.8435521256370252
Attrib mnth=9
1.9257553891966983
Attrib mnth=10
3.292014809574857
Attrib mnth=11
-1.4498249073067841
Attrib mnth=12
-1.8992134384770225
Attrib weekday=0
-0.6756458899252727
Attrib weekday=1
-1.619254054866607
Attrib weekday=2
3.848458826993125
Attrib weekday=3
-2.0173379376758134
Attrib weekday=4
-0.4353739524495851


Attrib weekday=5
3.109339403564944
Attrib weekday=6
1.1310531803497101
Attrib weathersit=1
-1.458550824527361
Attrib weathersit=2
1.8993351516539208
Attrib weathersit=3
0.31732623539007226
Attrib temp
-3.4720114633912518
Attrib atemp
-3.1876645151808765
Attrib hum
2.914371295388255
Attrib windspeed
-3.2890457801385313
Sigmoid Node 15
Inputs
Weights
Threshold
-0.7548126052587976
Attrib season=1
1.0328911767957796
Attrib season=2
1.0274902581506329
Attrib season=3
-1.0977062529521218
Attrib season=4
0.45723474362362704
Attrib mnth=1
0.1543625166353079
Attrib mnth=2
1.0612106019061183
Attrib mnth=3
0.4006174146235065
Attrib mnth=4
-0.5100762437052182
Attrib mnth=5
1.9345616156832235
Attrib mnth=6
1.2327250798860243
Attrib mnth=7
1.0084025910057925
Attrib mnth=8
0.27628951911412547
Attrib mnth=9
0.22073765851209307
Attrib mnth=10
1.51624719263242
Attrib mnth=11
0.1930615744913734
Attrib mnth=12
-0.033773664857861686
Attrib weekday=0
1.6416286991951565
Attrib weekday=1
-0.8778335838960318
Attrib weekday=2
-0.6325070026820876
Attrib weekday=3
-0.4336270523704613
Attrib weekday=4
-0.2194426369831905
Attrib weekday=5
2.6293494747329498
Attrib weekday=6
1.5617350643734553
Attrib weathersit=1
-0.40909854703646276
Attrib weathersit=2
1.0038181265119062
Attrib weathersit=3
0.17035593212749028
Attrib temp
2.064711382203545
Attrib atemp
1.3930822849883093
Attrib hum
-4.300926083544343
Attrib windspeed
-4.666388282278193
Class
Input
Node 0

Time taken to build model: 7.64 seconds


=== Cross-validation ===
=== Summary ===
Correlation coefficient                  0.4835
Mean absolute error                   1657.8337
Root mean squared error               2192.345
Relative absolute error                105.1584 %
Root relative squared error            114.03   %
Total Number of Instances              585
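The network listing in this exhibit gives, for every node, a threshold followed by one weight per input. A minimal sketch of how such a listing is commonly evaluated is shown below: each sigmoid node applies the logistic function to its threshold plus the weighted sum of its inputs, and the linear output node (Node 0) returns its threshold plus the weighted sum of the 15 hidden activations. The output weights are copied from Linear Node 0 above; the helper names and the hidden activations are hypothetical. It is also an assumption here that Weka rescales the inputs and the class value before training, so the number produced is on that rescaled scale rather than in rentals.

```python
import math


def sigmoid_node(threshold, weights, inputs):
    """Logistic activation of threshold + weighted sum of inputs."""
    z = threshold + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def linear_node(threshold, weights, inputs):
    """Output node: threshold + weighted sum, with no squashing."""
    return threshold + sum(w * x for w, x in zip(weights, inputs))


# Threshold and weights of Linear Node 0, copied from the listing above.
OUTPUT_THRESHOLD = 0.4034591624250485
OUTPUT_WEIGHTS = [
    1.073001771389342, 0.7997763940718912, -0.7810987098482175,
    -1.1441930460472007, 0.7583796211670438, 1.5138917522206743,
    -0.44528535259435686, -0.7695221250216051, -0.8728329024813739,
    -0.3971037710754488, -0.7387643450840765, 0.6697937540799863,
    -0.30447458472747896, -0.6657354929961227, 1.5878296789150248,
]

# Hypothetical hidden activations; in a real forward pass each value would
# come from sigmoid_node() applied to the 30 encoded input attributes.
hidden_activations = [0.5] * 15
scaled_output = linear_node(OUTPUT_THRESHOLD, OUTPUT_WEIGHTS, hidden_activations)
print(scaled_output)
```

The 15 hidden nodes are consistent with the -H a setting, which sizes the hidden layer from the number of encoded inputs (4 season, 12 month, 7 weekday and 3 weather indicators plus 4 numeric attributes) and the single output.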

Exhibit 7: Raw Results from Regression Tree M5P (10-fold Cross-validation)


M5P; Test Method: Cross-validation (10 Folds)
=== Run information ===
Scheme:       weka.classifiers.trees.M5P -R -M 6.0
Relation:     hubway_new-weka.filters.unsupervised.attribute.NumericToNominal-R1-4weka.filters.unsupervised.instance.Randomize-S42weka.filters.unsupervised.instance.RemovePercentage-P20.0
Instances:    585
Attributes:   9
              season
              mnth
              weekday
              weathersit
              temp
              atemp
              hum
              windspeed
              cnt
Test mode:    10-fold cross-validation
=== Classifier model (full training set) ===
M5 pruned regression tree:
(using smoothed linear models)
atemp <= 0.431 :
|   atemp <= 0.258 :
|   |   temp <= 0.218 : LM1 (29/26.416%)
|   |   temp > 0.218 : LM2 (31/36.862%)
|   atemp > 0.258 :
|   |   season=4,2,3 <= 0.5 : LM3 (73/60.224%)
|   |   season=4,2,3 > 0.5 :
|   |   |   season=2,3 <= 0.5 : LM4 (81/60.55%)
|   |   |   season=2,3 > 0.5 : LM5 (25/84.942%)
atemp > 0.431 :
|   hum <= 0.738 :
|   |   windspeed <= 0.13 : LM6 (56/60.146%)
|   |   windspeed > 0.13 :
|   |   |   mnth=9,6 <= 0.5 : LM7 (150/70.537%)
|   |   |   mnth=9,6 > 0.5 :
|   |   |   |   hum <= 0.679 :
|   |   |   |   |   atemp <= 0.655 : LM8 (30/57.66%)
|   |   |   |   |   atemp > 0.655 : LM9 (11/50.063%)
|   |   |   |   hum > 0.679 : LM10 (15/64.081%)
|   hum > 0.738 :
|   |   hum <= 0.849 :
|   |   |   windspeed <= 0.155 : LM11 (25/64.076%)
|   |   |   windspeed > 0.155 : LM12 (34/57.154%)
|   |   hum > 0.849 : LM13 (25/67.506%)
LM num: 1
cnt =
+ 2060.0689
LM num: 2
cnt =
+ 2378.9839
LM num: 3
cnt =


+ 3154.0922
LM num: 4
cnt =
+ 4242.4465
LM num: 5
cnt =
+ 3653.44
LM num: 6
cnt =
+ 6198.3408
LM num: 7
cnt =
+ 5488.4579
LM num: 8
cnt =
+ 6382.4125
LM num: 9
cnt =
+ 5990.584
LM num: 10
cnt =
+ 5684.426
LM num: 11
cnt =
+ 5408.337
LM num: 12
cnt =
+ 4785.3039
LM num: 13
cnt =
+ 4108.4235
Number of Rules : 13
Time taken to build model: 0.5 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient                  0.7099
Mean absolute error                   1149.8182
Root mean squared error               1357.5547
Relative absolute error                 72.9343 %
Root relative squared error             70.6102 %
Total Number of Instances              585
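The pruned regression tree in this exhibit can be read as a chain of threshold tests ending in one of 13 leaves, each of which predicts a constant. The sketch below transcribes those tests and leaf values into a plain Python function. It is a reading aid under the assumption that a test such as season=2,3 > 0.5 means the day's season is 2 or 3; the function names and the example days are hypothetical.

```python
# Constant prediction of each leaf (LM1..LM13), copied from the exhibit.
LEAF_VALUES = {
    1: 2060.0689, 2: 2378.9839, 3: 3154.0922, 4: 4242.4465, 5: 3653.44,
    6: 6198.3408, 7: 5488.4579, 8: 6382.4125, 9: 5990.584, 10: 5684.426,
    11: 5408.337, 12: 4785.3039, 13: 4108.4235,
}


def m5p_leaf(day):
    """Return the leaf number (1..13) that a day falls into."""
    if day["atemp"] <= 0.431:
        if day["atemp"] <= 0.258:
            return 1 if day["temp"] <= 0.218 else 2
        if day["season"] not in {4, 2, 3}:
            return 3
        return 5 if day["season"] in {2, 3} else 4
    if day["hum"] <= 0.738:
        if day["windspeed"] <= 0.13:
            return 6
        if day["mnth"] not in {9, 6}:
            return 7
        if day["hum"] <= 0.679:
            return 8 if day["atemp"] <= 0.655 else 9
        return 10
    if day["hum"] <= 0.849:
        return 11 if day["windspeed"] <= 0.155 else 12
    return 13


def predict_cnt_tree(day):
    """Constant prediction of the leaf the day is routed to."""
    return LEAF_VALUES[m5p_leaf(day)]


# Hypothetical example days (not rows from the dataset).
cold_day = {"season": 1, "mnth": 1, "temp": 0.2, "atemp": 0.2,
            "hum": 0.5, "windspeed": 0.2}
warm_june_day = {"season": 3, "mnth": 6, "temp": 0.6, "atemp": 0.6,
                 "hum": 0.5, "windspeed": 0.2}
print(predict_cnt_tree(cold_day), predict_cnt_tree(warm_june_day))
```

Only atemp, temp, season, hum, windspeed and mnth appear in the tree, so weekday and weathersit do not affect this model's predictions.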

Exhibit 8: Regression Tree M5P (Tree View)


M5P; Tree View


Exhibit 9: Raw Results from Linear Regression (Supplied Test Set)


Linear Regression; Test Method: Supplied Test Set
=== Run information ===
Scheme:       weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation:     hubway_new-weka.filters.unsupervised.attribute.NumericToNominal-R1-4weka.filters.unsupervised.instance.Randomize-S42weka.filters.unsupervised.instance.RemovePercentage-P20.0
Instances:    585
Attributes:   9
              season
              mnth
              weekday
              weathersit
              temp
              atemp
              hum
              windspeed
              cnt
Test mode:    user supplied test set: size unknown (reading incrementally)
=== Classifier model (full training set) ===

Linear Regression Model


cnt =

   1596.993  * season=4,2,3 +
   -702.5904 * season=2,3 +
   -417.2612 * mnth=12,11,4,10,5,8,7,9,6 +
    326.5611 * mnth=4,10,5,8,7,9,6 +
   -592.5343 * mnth=8,7,9,6 +
   -429.6212 * mnth=7,9,6 +
   1333.8152 * mnth=9,6 +
   -711.6712 * mnth=6 +
    385.7063 * weekday=2,5,4,6,3 +
   1771.9796 * weathersit=2,1 +
    308.1646 * weathersit=1 +
   7413.8362 * atemp +
  -2700.4483 * hum +
  -3459.3138 * windspeed +
    622.8475

Time taken to build model: 0.12 seconds


=== Evaluation on test set ===
Time taken to test model on supplied test set: 0.01 seconds
=== Summary ===
Correlation coefficient                  0.7741
Mean absolute error                   1082.3166
Root mean squared error               1274.6821
Relative absolute error                 67.1989 %
Root relative squared error             63.5683 %
Total Number of Instances              146
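The five figures in each summary block can be computed from paired lists of actual and predicted values. The sketch below uses the usual textbook definitions: the two relative errors compare the model against a baseline that always predicts the mean. As a simplifying assumption, the baseline here is the mean of the supplied actual values, which may differ slightly from the baseline Weka uses; the function name and the toy numbers are hypothetical.

```python
import math


def regression_summary(actual, predicted):
    """Correlation, MAE, RMSE, RAE (%) and RRSE (%) for paired values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actuals.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    covariance = sum((a - mean_a) * (p - mean_p)
                     for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": covariance / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Toy values for illustration only; these are NOT results from the report.
toy_actual = [4000.0, 5000.0, 6000.0]
toy_predicted = [4500.0, 5500.0, 6500.0]
summary = regression_summary(toy_actual, toy_predicted)
for name, value in summary.items():
    print(name, round(value, 4))
```

Under these definitions a relative error above 100 % means the model does worse than the mean baseline, which is the case for the MultilayerPerceptron in Exhibit 6 (105.1584 % and 114.03 %).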
