Robert Coyle
X13109278
robert.coyle@student.ncirl.ie
Table of Contents
ABSTRACT
DEFINITIONS, ACRONYMS, AND ABBREVIATIONS
INTRODUCTION
RELATED WORK
SYSTEMS AND DATASETS
DESIGN AND ARCHITECTURE
Brief description of work carried out
DATASETS
Gathering of Twitter Data
Gathering of Stock Price Data
Data Preparation
REQUIREMENTS
Data requirements
User requirements
Usability requirements
Functional Requirements
TESTING AND EVALUATION
SYSTEMS TESTING
Apple Stock
Microsoft Stock
Tesla Stock
FORMULA FOR PREDICTING STOCK MOVEMENT
Formula Used
Apple Stock Prediction
Microsoft Stock Prediction
Tesla Stock Prediction
CONCLUSION
FURTHER DEVELOPMENT
BIBLIOGRAPHY
APPENDIX
Project Materials
PROJECT PROPOSAL
INTRODUCTION
BACKGROUND
TECHNICAL APPROACH
SPECIAL RESOURCES REQUIRED
PROJECT PLAN
TECHNICAL DETAILS
SYSTEMS/DATASETS
EVALUATION/TEST AND ANALYSIS
CONSULTATION WITH SPECIALIZATION PERSONS
REQUIREMENTS SPECIFICATION
RAID LOG
Risks
Assumptions
Issues
Dependency
PRODUCTS DUE FOR COMPLETION
CONCLUSION
PROGRESS MANAGEMENT REPORT 3
DOCUMENT LOCATION
REVISION HISTORY
APPROVALS
DISTRIBUTION
PURPOSE OF DOCUMENT
DATE OF REPORT
PERIOD COVERED
SCHEDULE STATUS
Updated Gantt chart
DEFINITIONS, ACRONYMS, AND ABBREVIATIONS
PRODUCTS COMPLETED DURING THIS PERIOD
PROBLEMS
ACTUAL
POTENTIAL
RAID LOG
Risks
Assumptions
Issues
Dependency
PRODUCTS DUE FOR COMPLETION
CONCLUSION
REFERENCES
Abstract
This thesis investigates the possibility of predicting stock market movement using Twitter activity. The analysis uses data mining applications, data analysis techniques, and correlation and regression modelling.
Twitter feeds were mined using the Twitter API and Java code to search for and download tweets containing the words Apple, Microsoft and Tesla. These files were then processed using Amazon Web Services and Text Wrangler. The analysis was carried out using software such as RStudio and Microsoft Excel: correlation and regression models were built, and the Granger causality test was performed in RStudio. Visualisations produced in Microsoft Excel and RStudio show some trends in the data.
A formula for stock market prediction for commercial use was created. However, the data set gathered from Twitter was not large enough, and the tweets were not specifically about the companies' stock, so noisy data may have corrupted the analysis. A sentiment analysis was not carried out on the tweets.
Definitions, Acronyms, and Abbreviations
API: Application Programming Interface.
AWS: Amazon Web Services.
Causality: A form that indicates that a subject causes something else to do something, or causes a change in state of a non-volitional event.
GPOMS: Google Profile of Mood States, an algorithm that classifies public sentiment into six categories {Calm, Alert, Sure, Vital, Kind, Happy}.
Granger causality: A statistical hypothesis test for determining whether one time series is useful in predicting another.
NASDAQ: National Association of Securities Dealers Automated Quotations.
Noise: Meaningless data.
POMS: Profile of Mood States.
Sentiment analysis: The use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
Text Wrangler: A text editor for Mac OS X.
Tweet: A message posted on the Twitter website.
Introduction
The stock market is an essential way for companies to raise money. By being publicly traded, companies can raise additional financial capital to expand their business by selling shares of ownership.
Historically, share prices are known to have a major influence on economic activity and can be an indicator of social mood.
Stock market movement has always been a rich and interesting subject, with so many factors to analyse that for a long time it was considered unpredictable.
Over the past few decades, new computerised mathematical methods developed by companies such as Merrill Lynch and other financial management firms have produced models that maximise returns while minimising risk.
Stock market prediction has been around for years, but the rise of social media has given it a new method of prediction.
The objective of this project is to analyse Twitter feeds for activity and trends associated with a brand, and to see how the brand's stock market shares relate to, and whether they are affected by, that Twitter activity.
This analysis looks at the relationship between the number of tweets and the stock price for three brands listed on the NASDAQ: Apple, Microsoft and Tesla. As an additional exploration of stock conversation on Twitter, the returned tweets were also searched for each company's NASDAQ symbol.
These brands were chosen because they are innovative technology companies listed on the same stock exchange, so the gathering of the Twitter data was not time-zone dependent.
Stock market data was collected from the Yahoo Finance website, which provides historical data for the NASDAQ.
Java scripts were used to acquire the tweets through Twitter's API service. The tweets for each brand were then counted using Amazon Web Services and Text Wrangler.
The counted tweets were subsequently analysed in RStudio, where correlation and regression models were built and the Granger causality test was performed.
The data was then visualised in Excel and RStudio, and the creation of a formula for commercial use was attempted.
Related Work
In the previous study, "Stock Market Prediction Using Twitter", I researched papers relating to sentiment analysis of social media for predicting stock market movement. The social media platform in question was Twitter.
The investigation looked at the correlation between public mood and stock market movement, and at how this correlation can be used to predict stock market prices. Sentiment analysis was used to translate the tweets into moods using algorithms such as Google Profile of Mood States (GPOMS).
Applying sentiment analysis to the tweets proved to give an accurate analysis of the data.
Analysing Twitter activity alone does not capture the behavioural attitudes of investors, so an accurate prediction of stock movement cannot be ascertained from it. Sentiment analysis gives the investigation an insight into public attitude: the more detailed the sentiment analysis of the Twitter data, and the more reliable the stock data, the more accurate the results. Twitter activity alone might not give a stockbroker the insight needed to make challenging decisions about buying or selling shares.
Datasets
There were two datasets.
The first dataset acquired was the Twitter feeds.
Historical tweets proved difficult to obtain, because Twitter had sold access to its data to external parties. These companies, such as DataSift, offer analysis of historical data. While this would have benefited the original project proposal, the project had no budget.
Twitter launched a Historical Data Grant scheme, which allowed academic students to submit proposals to gain access to Twitter's historical data.
The Use of Twitter Activity as a Stock Market Predictor
A proposal on behalf of this project was submitted to the Data Grant scheme, but Twitter's reply arrived far too late in the project.
Historical stock market data for the same dates as the collected tweets was subsequently gathered from Yahoo Finance.
Gathering of Twitter Data.
The Java script was acquired with the approval of Dr. Brian Mac Namee, a Principal Investigator with CeADAR and a lecturer in the School of Computing at the Dublin Institute of Technology.
The Java script was used in conjunction with the Twitter API.
To use the Twitter API, a user must first sign up for a developer account and create an application; the user can then obtain the API keys needed to run the script.
The script was run on my behalf at a friend's home, because my own internet connection was not suitable and a disconnection would have produced an unreliable time series.
Figure 1.2: Example of the Java code used for downloading the Twitter feeds.
Figure 1.3: Demonstrates where the unique keys were entered into the Java script.
Figure 1.4: Demonstrates where the keywords were entered into the Java script.
Figure 1.5: Example of the acquired Twitter feeds from the Java script in a text file.
Because the script stopped running on one of the days, there was a gap with no tweets from 3am until 8am on that day. For this reason, only tweets published during NASDAQ trading hours were used.
NASDAQ trading hours are 09:30 to 16:00 US Eastern Time, Monday to Friday.
In Irish time (IST, GMT+1, in April 2014) that is 14:30 to 21:00.
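A minimal Python sketch of the trading-hours filter described above follows. It is illustrative only, not the Java code used in the project: it assumes each tweet carries Twitter's `created_at` timestamp in the API v1.1 form (for example `Mon Apr 07 14:05:12 +0000 2014`) and that New York was on daylight saving time (UTC-4) for the whole sample week. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: all collection dates (7-11 April 2014) fall inside US daylight
# saving time, so New York time is a fixed UTC-4 offset for this sample.
NEW_YORK_EDT = timezone(timedelta(hours=-4), "EDT")
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)

# Assumption: timestamps use the Twitter API v1.1 created_at layout.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(text: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(text, TWITTER_TIME_FORMAT)


def in_trading_hours(ts: datetime) -> bool:
    """True if ts falls within NASDAQ regular hours (Mon-Fri, 09:30-16:00 New York time)."""
    local = ts.astimezone(NEW_YORK_EDT)
    if local.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return False
    # The closing bell itself (16:00) is treated as outside the session.
    return MARKET_OPEN <= local.time() < MARKET_CLOSE


if __name__ == "__main__":
    for stamp in ("Mon Apr 07 13:30:00 +0000 2014",   # 09:30 in New York: open
                  "Mon Apr 07 20:00:00 +0000 2014",   # 16:00 in New York: closed
                  "Sat Apr 12 15:00:00 +0000 2014"):  # weekend
        print(stamp, in_trading_hours(parse_created_at(stamp)))
```

Using fixed offsets keeps the sketch free of time-zone database dependencies; a longer collection period spanning a daylight saving change would need a proper time-zone library instead.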
Counting the Tweets
Next, the tweets had to be counted.
I initially proposed using Amazon Web Services for this because of the size of the data sets. A word count script from the AWS website was used to count every occurrence of the specific words in each tweet.
Figure 1.6: Example of the acquired Python script file from the AWS website.
(Aws.amazon.com, 2014)
A folder named project 2014 was created in the S3 bucket, and all necessary files, such as the Python scripts and tweet files, were uploaded to it.
An Elastic MapReduce cluster was created.
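The Elastic MapReduce word count follows the usual map/reduce pattern. The sketch below simulates both phases locally in plain Python so the logic can be checked; it is not the AWS sample script, the keyword set and function names are hypothetical, and in a real streaming job each phase would read lines from standard input and write tab-separated key/value pairs. Note that the mapper emits one pair per occurrence, so a tweet that repeats a keyword is counted more than once.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical keyword set; the project searched for these three brands.
KEYWORDS = {"apple", "microsoft", "tesla"}


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (keyword, 1) for every keyword occurrence in every line."""
    for line in lines:
        for word in re.findall(r"[a-z]+", line.lower()):
            if word in KEYWORDS:
                yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: sum the emitted counts for each keyword."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    tweets = ["Apple event today, apple fans excited", "Tesla and Microsoft shares up"]
    print(reducer(mapper(tweets)))  # {'apple': 2, 'tesla': 1, 'microsoft': 1}
```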
Figure 1.9: Example of a tweet with Apple mentioned twice in Text Wrangler.
(Mac App Store, 2014)
What was needed was a way to count the number of tweets that mentioned each keyword, rather than the number of times the keyword appeared. These tweets could contain all three keywords (Apple, Microsoft and Tesla) together, or each keyword on its own.
Text Wrangler was used to search the individual text files for tweets containing each keyword separately, but it had the same problem: it counted the number of times the word occurred, not the number of tweets.
Figure 1.10: Example of tweets from Monday with Tesla mentioned, 3866 occurrences in Text Wrangler. (Mac App Store, 2014)
For this reason, the analysis results will contain some inaccuracy, because tweets that mention a keyword twice are counted twice.
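One way to avoid this double counting is to reduce each tweet to its set of distinct words before counting, so that a tweet adds at most one to each keyword's total. The Python sketch below illustrates the idea; it was not part of the project, and its word-splitting rule (runs of letters, case-insensitive) and term list are assumptions. It also counts the NASDAQ symbols in the same pass.

```python
import re
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical search terms: the three brand names plus their NASDAQ symbols.
TERMS = ("apple", "aapl", "microsoft", "msft", "tesla", "tsla")


def tweets_mentioning(tweets: Iterable[str], terms: Sequence[str] = TERMS) -> Counter:
    """Count the tweets that mention each term at least once.

    Each tweet is reduced to its set of distinct lower-case words first, so a
    tweet that repeats a keyword still adds only 1 to that keyword's count.
    """
    counts: Counter = Counter()
    for tweet in tweets:
        words = set(re.findall(r"[a-z]+", tweet.lower()))
        counts.update(term for term in terms if term in words)
    return counts


if __name__ == "__main__":
    sample = ["Apple apple APPLE $AAPL", "Tesla beats estimates"]
    print(tweets_mentioning(sample))
```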
Date        Apple    AAPL   Microsoft  MSFT   Tesla   TSLA
07/04/2014  71913    1001   36417      521    3866    281
08/04/2014  118077   950    47925      613    4600    395
09/04/2014  81840    1100   24084      437    3113    301
10/04/2014  63983    1483   19521      435    3204    447
11/04/2014  62755    1145   18146      343    2140    347
Figure 1.11: Displays the key words and their occurrences per day.
The original keywords were Apple, Microsoft and Tesla. I decided to also search for their NASDAQ symbols. In previous research into Twitter mining and stock prediction, researchers searched for the company symbols, as this returns a more accurate count of tweets in which people are discussing the company's actual stock.
Gathering of Stock Price Data
Once the Twitter feeds had been gathered, the financial data could be downloaded. The historical stock prices had to cover the same dates as the Twitter feeds. The data was downloaded in Excel format and then saved as a CSV file for analysis in R.
From Yahoo Finance, historical stock prices can only be obtained at daily granularity; anything finer would have had to be streamed directly from the NASDAQ website, which I did not have access to.
Ideally, hourly stock prices would have been used so that the time series matched the Twitter feeds.
Data sets of stock prices were collected from the Yahoo Finance website for all three companies.
Each set had seven columns: Date, Open, High, Low, Close, Volume and Adjusted Close.
Date is the day of trading.
Open is the price of the stock at the start of the day's trading.
High is the highest price of the stock during that day.
Low is the lowest price of the stock during that day.
Close is the price of the stock at the end of the day's trading.
Volume is the number of shares traded that day.
Adjusted Close is the closing price adjusted for corporate actions such as dividends and stock splits.
Figure 1.6: Demonstrates the acquired historical Apple stock prices for the month of April 2014 from the Yahoo Finance website. (Finance.yahoo.com, 2014)
The closing price is the data on which this analysis focused.
Data Preparation
The cleaned Twitter counts were combined with the cleaned financial data in Excel.
Date        Open     High     Low      Close    Volume     Adj Close  Apple    AAPL
2014-04-11  519      522.83   517.14   519.61   9704200    516.72     62755    1145
2014-04-10  530.68   532.24   523.17   523.48   8559000    520.57     63983    1483
2014-04-09  522.64   530.49   522.02   530.32   7363200    527.37     81840    1100
2014-04-08  525.19   526.12   518.7    523.44   8710300    520.53     118077   950
2014-04-07  528.02   530.9    521.89   523.47   10351800   520.56     71913    1001
Figure 4.2: Displays the key words and their occurrences per day with the stock prices for Apple.
This was repeated for all three companies.
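The merge was done by hand in Excel. As an illustration only, an equivalent join in Python could look like the sketch below. It assumes the downloaded CSV uses the column layout listed above with ISO dates (YYYY-MM-DD) and that the tweet counts are held in a dictionary keyed by the same date strings; `merge_counts` is a hypothetical name. The example rows are taken from the Apple table above.

```python
import csv
import io
from typing import Dict, List


def merge_counts(price_csv: str, counts: Dict[str, int]) -> List[dict]:
    """Join daily tweet counts onto daily price rows by date.

    Assumes the column layout Date,Open,High,Low,Close,Volume,Adj Close with
    ISO dates, and a counts dictionary keyed by the same date strings.
    Dates that appear in only one of the two sources are dropped.
    """
    merged = []
    for row in csv.DictReader(io.StringIO(price_csv)):
        if row["Date"] in counts:
            merged.append({
                "Date": row["Date"],
                "Close": float(row["Close"]),
                "Count": counts[row["Date"]],
            })
    # The downloaded data lists the newest day first; sort oldest-first.
    merged.sort(key=lambda r: r["Date"])
    return merged


if __name__ == "__main__":
    prices = (
        "Date,Open,High,Low,Close,Volume,Adj Close\n"
        "2014-04-08,525.19,526.12,518.7,523.44,8710300,520.53\n"
        "2014-04-07,528.02,530.9,521.89,523.47,10351800,520.56\n"
    )
    apple_counts = {"2014-04-07": 71913, "2014-04-08": 118077}
    for merged_row in merge_counts(prices, apple_counts):
        print(merged_row)
```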
Requirements
The requirements have remained mostly the same as in the original Requirements Specification, except that live Twitter data was used instead of historical Twitter data. Historical Twitter data proved impracticable because the project had no budget and the historical data had to be purchased.
Data requirements
DR#   Category             Description   MoSCoW
DR1   Use of Information
DR2   Availability                       M
DR3   Access
User requirements
UR#   Category             Description   MoSCoW
UR1   Analysis outcome                   M
UR2   User outcome                       H
Usability requirements
Functional Requirements
FR1   Acquire Data 1: The project will gather and store all necessary data from live Twitter feeds using Java scripts in conjunction with the Twitter API.
FR2   Acquire Data 2: The project will gather and store all necessary historical stock market data for the brand from the Yahoo Finance website, for the dates corresponding to the acquired Twitter data.
FR3   Clean Data 1: The correct programs will be acquired and used to clean and retrieve Twitter data relating to keywords and hashtags of the brand on certain dates.
FR4   Clean Data 2: The correct programs will be acquired and used to clean and retrieve historical stock market share prices for the brand over the same time series as the Twitter data.
FR5   Analyse 1: The cleaned Twitter data is then analysed and compared.
FR6   Analyse 2: The cleaned stock market data is then analysed and compared.
FR7   Publish Data: The analysis will then be published and made available to the customer.
Figure 4.7 displays the Granger causality test output in R for the Close price and Apple word count.
From the result above, with a one-day lag the p-value is 0.7057.
This is greater than the 5% significance level, so the null hypothesis cannot be rejected: the Apple word count does not help predict the closing price one day later.
Figure 4.8 displays the Granger causality test output in R for the Close price and AAPL word count.
A similar test was performed using the keyword AAPL as the independent variable and Close as the dependent variable. The results were slightly better but did not show Granger causality: the p-value of 24% is greater than 5%.
Because the data set was small, a lag of two days could not be tested.
The image above shows the unsuccessful output of the Granger causality test with a lag of more than one day. The error occurs because the data set was too small.
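To see why a longer lag fails, it helps to write the test out. A Granger causality test compares a restricted regression of the closing price on its own lagged values with an unrestricted one that also includes lagged word counts, using an F test on the drop in residual sum of squares. With five trading days and a lag of k, only 5 - k observations remain, while the unrestricted model has 1 + 2k parameters: k = 1 leaves one residual degree of freedom, and k = 2 has five parameters for three observations. The plain-Python sketch below illustrates this; it is not the R code used in the project (R's lmtest package provides `grangertest()` for the same nested-model comparison), the function names are hypothetical, and no particular output is claimed for the Apple data, since the result depends on how the series were ordered and aligned.

```python
import math
from typing import List, Sequence, Tuple


def _rss(columns: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of a least-squares fit of y on the given columns.

    Uses modified Gram-Schmidt to project y off the column space, which is
    better conditioned than solving the normal equations directly.
    """
    basis: List[List[float]] = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12 * math.sqrt(sum(a * a for a in col)):
            basis.append([a / norm for a in v])  # skip linearly dependent columns
    resid = list(y)
    for q in basis:
        dot = sum(a * b for a, b in zip(q, resid))
        resid = [r - dot * b for r, b in zip(resid, q)]
    return sum(r * r for r in resid)


def granger_f(y: Sequence[float], x: Sequence[float], lag: int) -> Tuple[float, int, int]:
    """F statistic for the null hypothesis "x does not Granger-cause y".

    Restricted:   y_t = a + b_1*y_{t-1} + ... + b_k*y_{t-k}
    Unrestricted: the same plus c_1*x_{t-1} + ... + c_k*x_{t-k}
    Returns (F, numerator df, denominator df).
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n_obs = len(y) - lag
    n_params = 1 + 2 * lag  # intercept + lags of y + lags of x
    df_den = n_obs - n_params
    if lag < 1 or df_den <= 0:
        raise ValueError(f"lag {lag}: {n_obs} usable observations but {n_params} parameters")
    target = [float(v) for v in y[lag:]]
    ones = [1.0] * n_obs
    y_lags = [[float(y[t - i]) for t in range(lag, len(y))] for i in range(1, lag + 1)]
    x_lags = [[float(x[t - i]) for t in range(lag, len(y))] for i in range(1, lag + 1)]
    rss_restricted = _rss([ones] + y_lags, target)
    rss_unrestricted = _rss([ones] + y_lags + x_lags, target)
    if rss_unrestricted == 0.0:
        raise ValueError("unrestricted model fits exactly; F is undefined")
    f_stat = ((rss_restricted - rss_unrestricted) / lag) / (rss_unrestricted / df_den)
    return f_stat, lag, df_den


def p_value_f_1_1(f_stat: float) -> float:
    """Upper-tail p-value for F with (1, 1) degrees of freedom.

    An F(1, 1) variable is the square of a Student t variable with 1 df
    (a standard Cauchy), so P(F > f) = 1 - (2 / pi) * atan(sqrt(f)).
    """
    return 1.0 - (2.0 / math.pi) * math.atan(math.sqrt(max(f_stat, 0.0)))


if __name__ == "__main__":
    # Apple closing prices and "Apple" tweet counts, 7-11 April 2014, oldest first.
    close = [523.47, 523.44, 530.32, 523.48, 519.61]
    apple = [71913, 118077, 81840, 63983, 62755]
    f, d1, d2 = granger_f(close, apple, lag=1)
    print(f"lag 1: F({d1}, {d2}) = {f:.4f}, p = {p_value_f_1_1(f):.4f}")
    try:
        granger_f(close, apple, lag=2)
    except ValueError as err:
        print(err)  # too few observations for a two-day lag
```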
4. Visualization.
Figure 4.1.1 demonstrates the relationship between the Apple count and Close price.
From the graph above, a positive relationship can be seen between the keyword Apple and the Close price of Apple stock: as the Apple count rises, the closing stock price also rises.
Figure 4.1.2 demonstrates the relationship between the AAPL count and Close price.
From the graph above, a negative relationship can be seen between the keyword AAPL and the Close price of Apple stock: as the AAPL count rises, the closing stock price declines. This is consistent with the negative results from the correlation and regression models. AAPL was not a keyword in the Java script; it was searched for within the tweets returned for the keyword Apple.
Figure 4.1.3 demonstrates the relationship between the Apple count and Close price.
As the chart above shows, the Close price line follows a similar trend to the Apple count line about a day later.
Figure 4.1.4 demonstrates the relationship between the AAPL count and Close price.
Unfortunately, the chart above shows that the Close price did not follow a similar trend to AAPL; instead, the AAPL word count appears to follow the Close price.
This is probably why the correlation between the two was so low. It may also be that the investor community using the keyword AAPL (Apple's stock symbol) was discussing the rise in Apple stock.
Microsoft Stock
The process was repeated, this time using the Microsoft data set.
1. Check for correlation
Figure 4.1.5 demonstrates the correlation between Microsoft and MSFT word count and
Close price.
This time the correlation model is much better, with both keywords returning a moderate correlation with the Close price.
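For reference, the Pearson correlation coefficient that such a correlation model reports can be computed directly from the two series. The sketch below is a plain-Python illustration, not the R code used in the project; the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation and variations around the means (the 1/n factors cancel).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Microsoft and MSFT tweet counts, 7-11 April 2014 (Figure 1.11).
    microsoft = [36417, 47925, 24084, 19521, 18146]
    msft = [521, 613, 437, 435, 343]
    print(round(pearson(microsoft, msft), 3))
```

In practice the word-count series would be paired with the closing prices for the same dates.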
2. Regression Model
Figure 4.1.6 displays the regression model with Microsoft word count as the
independent variable.
Figure 4.1.7 displays the regression model with MSFT word count as the independent
variable.
Figures 4.1.6 and 4.1.7 show the two regression outputs from R, with the Close stock price as the dependent variable.
Figure 4.1.6 displays a Multiple R-squared value of 0.96%, i.e. the share of the variation in Close price explained by the Microsoft word count.
Figure 4.1.7 displays a Multiple R-squared value of 12.6% for the MSFT word count.
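For a model with a single predictor, the Multiple R-squared reported by R's `summary()` is the proportion of variance in the dependent variable explained by the fitted line, and equals the square of the Pearson correlation. A minimal Python sketch of that calculation, with a hypothetical function name, is shown below; it is not the project's R code.

```python
from typing import Sequence, Tuple


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(simple_regression([0, 1, 2, 3], [1, 3, 2, 4]))  # R^2 is 0.64 here
```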
Figure 4.1.8 demonstrates the normality plot for the Microsoft word count and Close price; the normality condition is met.
Figure 4.1.9 demonstrates the normality plot for the MSFT word count and Close price; the normality condition is met.
Again, the Granger causality test could not be run with a lag greater than one day. Both keywords returned p-values greater than the 5% significance level.
4. Visualization
Figure 4.2.2 demonstrates the relationship between the Microsoft count and Close price.
Figure 4.2.3 demonstrates the relationship between the MSFT count and Close price.
Figure 4.2.4 demonstrates the relationship between the Microsoft count and Close price on a line chart.
As the chart above shows, the Close price line follows a similar trend to the Microsoft count line about a day later.
Figure 4.2.5 demonstrates the relationship between the MSFT count and Close price on a line chart.
Figure 4.2.6 demonstrates the relationship between the Microsoft count and Close price on a line chart with a one-day lag.
Figure 4.2.7 demonstrates the relationship between the MSFT count and Close price on a line chart with a one-day lag.
The decision was made to apply a manual lag in Excel by moving the dates of the Microsoft count forward by one day, to see whether the lines in the chart match up.
This lag pairs each day's tweet count about Microsoft with the closing price of the following trading day.
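The manual shift made in Excel amounts to pairing each day's count with a later day's closing price. A minimal Python sketch of that alignment follows; the function name is hypothetical, and it assumes both series are in chronological order and cover the same trading days.

```python
from typing import List, Sequence, Tuple


def lag_pairs(counts: Sequence[float], closes: Sequence[float],
              lag: int = 1) -> List[Tuple[float, float]]:
    """Pair each day's tweet count with the closing price `lag` trading days later.

    The last `lag` counts and the first `lag` closes have no partner and are dropped.
    """
    if len(counts) != len(closes):
        raise ValueError("series must cover the same days")
    if lag < 0 or lag >= len(counts):
        raise ValueError("lag must be between 0 and len(series) - 1")
    return list(zip(counts[: len(counts) - lag], closes[lag:]))


if __name__ == "__main__":
    # Microsoft tweet counts, 7-11 April 2014 (Figure 1.11); with five days,
    # a one-day lag leaves four (count, next-day close) pairs.
    microsoft = [36417, 47925, 24084, 19521, 18146]
    placeholder_closes = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
    print(lag_pairs(microsoft, placeholder_closes))
```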
The two graphs suggest that, visually, there is a relationship between the word counts and the Close stock price.
Correlation and regression models were built again using the lagged data.
1. Correlation
Figure 4.2.8 demonstrates the correlation between Microsoft and MSFT word count and
Close price with a lag of one day.
The correlation model in Figure 4.2.8 shows a strong correlation for both word counts, so a regression model was produced.
2. Regression Model
Figure 4.2.9 displays the regression model with Microsoft word count as the
independent variable using data with a one-day lag.
Figure 4.3.1 displays the regression model with MSFT word count as the independent
variable with data of one-day lag.
Tesla Stock
The process was repeated, this time using the Tesla data set.
Correlation and regression were performed, with results similar to those from the previous data sets.
Figure 4.3.2 demonstrates the correlation between Tesla and TSLA word count and Close price.
Figure 4.3.2 demonstrates the correlation between Tesla and TSLA word count and Close price with a one-day lag.
The keyword Tesla showed a strong correlation with the Tesla closing stock price in the lagged data set, while TSLA still displayed a moderate correlation.
Figure 4.3.3 displays the regression model with Tesla word count as the independent
variable using data with a one-day lag.
Again, the regression with the lagged data set showed a large improvement over the non-lagged Tesla data.
Figure 4.3.4 demonstrates the relationship between the Tesla word count and Close price on a line chart.
Figure 4.3.5 demonstrates the relationship between the Tesla word count and Close price on a line chart with a one-day lag.
Figures 4.3.4 and 4.3.5 show the difference between the non-lagged and lagged data sets. Figure 4.3.5 shows that the one-day lag does make a difference to the results, with the Tesla count following the Close price closely.
Difference in Apple stock Close price and tweet activity between days
                 Day One        Day Two        Day Three      Day Four
Close price                     0.013143818    -0.012897873   -0.007392833
                 0.005%         1.31%          1.29%          0.73%
Tweet activity   0.019568162    0.279089758    0.442778592    -0.390965218
                 1.96%          27.91%         44.28%         39.09%
Figure 4.3.6 demonstrates difference in Stock Close price and Tweet activity between
days.
If the movements were not identical in percentage increase or decrease, the
formula would need to be adjusted. The movement in Tweet activity was not
proportionate to the stock movement (not a pro rata movement).
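The day-to-day differences in Figures 4.3.6, 4.4.3 and 4.4.8 can be reproduced with a short Python sketch, assuming each difference is the relative change from the previous day, (v[t] - v[t-1]) / v[t-1]. The per-day gap between the stock series and the tweet series is what would be zero on every day if the movement were pro rata. The function names and sample values are hypothetical.

```python
def pct_changes(values):
    """Relative change between consecutive days: (v[t] - v[t-1]) / v[t-1]."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def movement_gaps(stock, tweets):
    """Per-day gap between the stock's and the tweets' relative change.

    A gap of zero on every day would mean the two series moved pro rata.
    """
    return [s - t for s, t in zip(pct_changes(stock), pct_changes(tweets))]


# Illustrative values only (not the project's data).
close = [100.0, 102.0, 100.98]
tweets = [1000, 1020, 1100]
print([round(c, 4) for c in pct_changes(close)])
print([round(g, 4) for g in movement_gaps(close, tweets)])
```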
Figure 4.3.7 demonstrates the formula for predicting the third day using Close stock
values.
The Close price is multiplied by 1.9568%, which projects an increase of $10.29.
Figure 4.3.8 demonstrates the formula for predicting the fourth day using Close stock
values.
The process was repeated, this time using values to predict the fourth day.
Unfortunately, an error of 27.904% was returned.
Figure 4.3.9 demonstrates the formula for predicting the fifth day using Close stock
values.
The process was repeated, this time using values to predict the fifth day.
Unfortunately, an error of 47.25% was returned. The formula did not apply to the
days after the third.
Figure 4.4.1 demonstrates the formula for predicting the third, fourth and fifth days using
Low stock values.
The formula was also applied to the Low stock price to see whether there
was a relationship.
The formula performed best when predicting the third day, with an error of
1.89%.
Figure 4.4.2 demonstrates the formula for predicting the third day using the volume
values.
Difference between days
Day One        0.316006261 (31.60%)
Day Two        -0.497464789 (-49.74%)
Day Three      -0.189461883 (-18.94%)
Day Four       -0.070436965 (-7.04%)
Figure 4.4.3 demonstrates the difference in Stock Close price and Tweet activity between
days.
Figure 4.4.4 demonstrates the formula for predicting the third, fourth and fifth days using
the Close stock values.
Formula: stock Low price (day 1) + stock Low price (day 1) * difference of tweets between day 1 and day 2
Values: 11508; 12.5580888; 52.2980888; -12.5580888; 0.237448234 (23.74%)
Figure 4.4.7 demonstrates the formula for predicting the third day using the Low stock
values.
Again the formula showed that it did not apply to the Low Stock price.
Calculate the percentage difference of Microsoft Tweets And Volume
Figure 4.4.7 demonstrates the formula for predicting the third day using the Volume
values.
The Volume data was placed into the formula, but the result shown above has a
high error rate of 44.5%.
               Difference in Stock             Difference in Tweet Activity
Day One        0.002007934 (0.200793379%)      0.189860321 (18.98603207%)
Day Two        0.027922269 (2.792226911%)      -0.32326087 (-32.32608696%)
Day Three      -0.02110152 (-2.110151951%)     0.029232252 (2.923225185%)
Day Four       0.026816564 (2.681656439%)      0.332084894 (33.20848939%)
Figure 4.4.8 demonstrates the difference in Stock Close price and Tweet activity between
days.
Figure 4.4.9 demonstrates the formula for predicting the third, fourth and fifth days using
the Close stock values.
The formula had high percentage errors, except for the prediction for the fifth
day, which had an error of 2.33%.
Formula: stock Low price (day 1) + stock Low price (day 1) * difference of tweets between day 1 and day 2
Values: 38.69163476; 242.4816348; 48.07163476; 0.198248559 (19.82485594%)
Figure 4.5.1 demonstrates the formula for predicting the day using the Low stock values.
Values: -734; 1369177.703; 8580677.703; 877677.7031; 0.102285359 (10.22853594%)
Figure 4.4.9 demonstrates the formula for predicting the third day using the Volume
values.
When the Low Stock and Volume values were placed into the formula they also
displayed high errors. Low Stock had an error of over 19% and the Volume
values had an error over 10%.
Conclusion
This analysis investigated the relationship between Twitter activity and the share
prices of three NASDAQ-listed companies over a period of one week. Tweets
mentioning the keywords Apple, Microsoft and Tesla were collected using a Java
script and Twitter's API. Once the tweets were collected, a Python script was
used in conjunction with Amazon Web Services (AWS) to count the frequency of
words. AWS was used because of the size of the tweet files, which were text
files ranging from 60 to 130 megabytes.
TextWrangler was also used to count the frequency of tweets containing the
keywords. Because one of the data sets was missing over five hours of data due
to a program failure, it was decided to use only tweets posted during NASDAQ
trading hours.
Stock data for the three companies was acquired from the Yahoo Finance website.
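The counting step can be sketched on a single machine in Python. This is a hypothetical equivalent of the word count run on AWS, not the project's script: it keeps only tweets posted during NASDAQ's regular session (9:30 to 16:00 US Eastern time, weekdays) and counts every occurrence of each keyword. It assumes tweets arrive as (timestamp, text) pairs with timestamps already converted to US Eastern time; the keyword set is illustrative and market holidays are not handled.

```python
import re
from collections import Counter
from datetime import datetime, time

# NASDAQ regular session, in US Eastern time.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)
# Hypothetical keyword list: company names plus NASDAQ symbols, lower-cased.
KEYWORDS = {"apple", "microsoft", "tesla", "aapl", "msft", "tsla"}


def in_trading_hours(ts):
    """True if ts (a naive datetime assumed to be US Eastern time) is in a weekday session.

    Market holidays are not handled.
    """
    return ts.weekday() < 5 and MARKET_OPEN <= ts.time() < MARKET_CLOSE


def keyword_frequency(tweets):
    """Count every keyword occurrence in tweets posted during trading hours.

    tweets: iterable of (timestamp, text) pairs.
    """
    counts = Counter()
    for ts, text in tweets:
        if not in_trading_hours(ts):
            continue
        # Splitting on non-letters turns "#Tesla" or "$AAPL" into plain words.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in KEYWORDS:
                counts[word] += 1
    return counts


sample = [
    (datetime(2014, 4, 8, 10, 0), "Apple and apple"),
    (datetime(2014, 4, 8, 17, 0), "Apple after the close"),
]
print(keyword_frequency(sample))
```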
Similarly, the number of times each company's NASDAQ symbol appeared was counted
and used as an additional analysis. The symbols gave the opportunity to
investigate conversations directed at the company's actual stock on the
NASDAQ.
Analysis was performed in R studio, first using a correlation model to see how
strong a relationship the tweet data had with the stock data of each company.
A linear regression algorithm was then used to see the effect that the Twitter
data had on the stock data.
Granger causality was performed to discover whether one of the time series
affected the other, giving a result in the form of a lag per day. Because the
data set was so small, only a one-day lag could be tested. It returned a
significance level above 5%, so the alternative hypothesis could not be accepted.
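Granger causality at a single lag compares two nested regressions: y regressed on its own previous value, and the same model with the previous value of x added. The Python sketch below computes the resulting F statistic using only the standard library. It is a textbook formulation rather than the R code used in the project, and it stops at the F statistic: the p-value still has to be read from an F distribution with 1 and n - 3 degrees of freedom, where n is the number of usable observations (packages such as R's lmtest report it directly through `grangertest`). Function names and test data are hypothetical.

```python
def ols_rss(X, y):
    """Residual sum of squares of an ordinary least squares fit of y on X.

    Each row of X must already include a leading 1.0 for the intercept.
    Solves the normal equations (X^T X) beta = X^T y by Gaussian elimination;
    a singular system (e.g. a constant regressor column) is not handled.
    """
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            b[r] -= factor * b[col]
    # Back-substitution on the upper-triangular system.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f_lag1(x, y):
    """F statistic for the hypothesis that x Granger-causes y at a lag of one day.

    Restricted model:   y[t] = a + b * y[t-1]
    Unrestricted model: y[t] = a + b * y[t-1] + c * x[t-1]
    """
    target = y[1:]
    n = len(target)
    if n <= 3:
        raise ValueError("need at least five observations for a lag-1 test")
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    if rss_u == 0:
        return float("inf")
    # One restriction (c = 0); the unrestricted model has three parameters.
    return (rss_r - rss_u) / (rss_u / (n - 3))
```

With five trading days there are only four usable observations and a single residual degree of freedom, which illustrates why a one-day test on one week of data has very little power.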
During visualization of the data using line graphs, it was noted that there
seemed to be a relationship in which the stock data followed a similar trend one
day after the tweet data. A manual lag was applied in Excel by moving the tweet
data time series forward by one day. This indicated that a trend did exist.
Subsequently, a correlation model was created in R studio, and the results
exhibited a strong correlation of 0.9 and over.
The creation of a formula for commercial use was attempted. The first formula
was used to find the percentage difference between the stock movement and the
tweet movement. On average, the movement of the stocks differed from the
movement of the tweets.
Another formula was created to predict the close share price. Given the Twitter
volumes of a company for two consecutive days, the percentage movement in tweets
between those two days should in turn allow the movement in the company's share
price to be predicted three days later.
The formula assumes a straight-line (1:1) relationship between the two movements.
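The rule above can be written as predicted price = price(day 1) + price(day 1) × tweet change(day 1 to day 2), which is also how it is written out for the Low price figures. A minimal Python sketch follows. The relative tweet change and the percentage error are assumed definitions (the error is taken as the absolute difference divided by the actual price), the function names are hypothetical, and the numbers are illustrative only.

```python
def predict_price(price_day1, tweets_day1, tweets_day2):
    """Apply the day-1 to day-2 relative tweet change to the day-1 price, 1:1.

    Follows the written rule: price day 1 + price day 1 * tweet change.
    """
    tweet_change = (tweets_day2 - tweets_day1) / tweets_day1
    return price_day1 + price_day1 * tweet_change


def pct_error(predicted, actual):
    """Absolute error as a percentage of the actual price (assumed definition)."""
    return abs(predicted - actual) / actual * 100


# Illustrative values only (not the project's data).
predicted = predict_price(40.0, 10000, 11000)
print(round(predicted, 2), round(pct_error(predicted, 40.5), 2))
```

Because the rule transfers the tweet movement to the price one for one, a 10% rise in tweets always predicts a 10% rise in price, which is why large swings in tweet volume produce large prediction errors.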
When predicting the third day of the Apple share price, an error of just 0.639%
was returned.
This meant that the close share price increased at the same rate as the Twitter
feeds for the keyword Apple, within an error level of 0.639%.
Disappointingly, the predictions for the other days of the Apple Close stock price
were less accurate, returning error rates of 27.9% and 47.25%. This trend continued
throughout the analysis of the closing prices of the Microsoft and Tesla stock.
The formula was slightly altered to accommodate other variables, such as the Low
stock price and Volume. Again, the errors were high for each one.
The Use of Twitter Activity as a Stock Market Predictor
The main issue here is that the data set is not developed enough for this form
of analysis. When acquiring the data, only tweets specifically about the
company's stock should have been collected. A company on Twitter is competing
for public interest, while on the stock exchange it is competing for capital
interest. In that respect, some of the tweets gathered in this analysis are noisy data.
Further Development
Further development of the project would include extracting tweets and stock
data over a longer period of time. This would provide the analysis
with a more reliable result from the Granger causality test.
The tweets need to be selected from a niche community, preferably the
investor community who communicate through Twitter about the
stocks of companies. Tweets that mention the company symbols and the
word stock should be gathered using those
keywords.
Narrowing down the selection of companies and focusing on one would
help to reduce the number of discrepancies in the tweet count.
Developing a program script to count the lines that a word appears in,
without recounting the word if it has been mentioned more than
once in a tweet.
Developing a formula that could take account of other
variables that cause movement in stock, such as events like the
release of company financial reports, takeover rumours, mergers or bad
publicity.
Using sentiment analysis on the tweets would provide a
more accurate result from the data. Analysing Twitter activity alone
will not provide the analysis with any information about the behavioural
attitudes of investors.
Sentiment analysis would also provide a better insight into public
attitude.
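The line-counting script proposed above could look roughly like the Python sketch below, which assumes one tweet per line of the text file. It contrasts a plain word-frequency count, where a tweet repeating the keyword counts several times, with a per-tweet count, where each tweet counts at most once. Matching is whole-word and case-insensitive, so "apples" does not count as "apple". The function names are hypothetical.

```python
import re


def _keyword_pattern(keyword):
    # Whole-word, case-insensitive match, so "apples" does not count as "apple".
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def word_frequency(lines, keyword):
    """Every occurrence counts, so a tweet repeating the keyword counts more than once."""
    pattern = _keyword_pattern(keyword)
    return sum(len(pattern.findall(line)) for line in lines)


def tweets_mentioning(lines, keyword):
    """Each tweet (one per line) counts at most once, however often it repeats the keyword."""
    pattern = _keyword_pattern(keyword)
    return sum(1 for line in lines if pattern.search(line))


tweets = ["Apple apple apple", "I like apples", "#Apple news", "MSFT up"]
print(word_frequency(tweets, "apple"), tweets_mentioning(tweets, "apple"))
```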
Bibliography
Aws.amazon.com, (2014). Word Count Example : Articles & Tutorials : Amazon
Web Services. [online] Available at: http://aws.amazon.com/articles/2273
(Accessed 22 May 2014).
Bollen, J. and Mao, H. (2011) 'Twitter mood as a stock market predictor',
Computer.
Datasift.com, (2014). Power Decisions With Social Data | DataSift. [online]
Available at: http://datasift.com (Accessed 24 May 2014).
Dev.twitter.com, (2014). Twitter Developers. [online] Available at:
https://dev.twitter.com (Accessed 22 May 2014).
Finance.yahoo.com, (2014). AAPL Historical Prices | Apple Inc. Stock - Yahoo!
Finance. [online] Available at:
http://finance.yahoo.com/q/hp?s=AAPL&a=03&b=01&c=2014&d=03&e=30&f=2014&g=d
(Accessed 22 May 2014).
Mac App Store, (2014). TextWrangler. [online] Available at:
https://itunes.apple.com/ie/app/textwrangler/id404010395?mt=12 (Accessed
22 May 2014).
Mittal, A. and Goel, A. (2012) 'Stock prediction using Twitter sentiment analysis',
Stanford University, CS229 (2011). Available at:
http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf
Simsek, M. and Ozdemir, S. (2012) 'Analysis of the relation between Turkish
twitter messages and stock market index'.
Ucd.ie, (2014). CeADAR. [online] Available at: http://www.ucd.ie/ceadar/
(Accessed 26 May 2014).
Ucd.ie, (2014). Brian Mac Namee | CeADAR. [online] Available at:
http://www.ucd.ie/ceadar/people/principalinvestigators/brianmacnamee/
(Accessed 26 May 2014).
Appendix
Project Materials:
https://drive.google.com/folderview?id=0B4pkBIaL1W7CQzVVakgwQ3psNFk&usp=sharing
References
Project Proposal
Introduction
The purpose of this project is to study and analyse the activities and trends
associated with the Mobile World Congress 2014, which is being held from the 24th
to the 27th of February 2014.
The Mobile World Congress is the world's largest exhibition of the mobile
industry. Mobile operators, device manufacturers and technology providers are
all represented at the exhibition.
With a large number of manufacturers attending and several product launches, the
subject can be quite broad.
The objective of this project is to analyse Twitter feeds for activity and trends
associated with the top mobile manufacturers before, during and after the event
and to see how their stock market shares are connected and affected by the
Twitter feeds.
Background
As Twitter matures, top brands have realized just how relevant Twitter can be as
a marketing and engagement platform.
According to Useful Social Media, 98% of the top brands are on Twitter and 92%
of top brands tweet daily. There are 230 million active users on Twitter, which
provides brands with a global presence. (USM) 92% of top brands Tweet at
least once daily as audiences grow. Study shows Twitter's maturity as a
marketing and engagement platform. 98% of all top brands are active on Twitter.
The social network has matured into a valuable and necessary channel for
marketing organizations. (Usefulsocialmedia.com, 2014)i
Releases such as the Samsung Galaxy S5 will hopefully see a surge of Twitter
activity in relation to Samsung during the event. According to Trusted Reviews,
the release of the Samsung Galaxy S5 will take place during the event. (Trusted
Reviews) The Samsung Galaxy S5 release date looks set to be held in a matter of
days as the Korean manufacturer issues invites to a February 24 launch event,
kicking Samsung Galaxy S5 rumours into overdrive. (Trusted Reviews, 2014)ii
Using the data from the Twitter feeds, I can then analyse it against stock
market share prices.
According to Mac Rumours, Samsung has the biggest phone market share with
Apple in second place. (Mac Rumours) Apple Continues to Lose Smartphone
Share, Gain Mobile Phone Share in 4Q 2013 (Mac Rumours, 2014)iii
Similar research has been done in relation to Twitter feeds influencing market
shares, but this project will focus mainly on the Mobile World Congress in
relation to the market shares of the top five mobile device manufacturers.
Technical Approach
This objective will be achieved by:
Creating the necessary Python code to use with the Twitter API for
retrieving the data.
Gathering all data created on Twitter related to the mobile device brands
before, during and after the event.
Gathering stock market share prices of the mobile device brands before,
during and after the event.
Cleaning all data gathered for analysis.
Analysing the gathered Twitter activity data against the stock market
share prices.
Returning the results of the analysis.
Makice, K. (2009) Twitter API: Up and Running: Learn How to Build Applications
with the Twitter API.
Software to be used:
Python
R studio
MYSQL
Microsoft Excel
Microsoft Project
Twitter API
Project Plan
Technical Details
The data will be retrieved using Python.
R coding and Microsoft Excel will then be used to do the analysis of the data.
Systems/Datasets
The datasets used will all be collected by myself, using the online Twitter API
with Python code to collect specific words and hashtags from tweets over
the duration of the event's operating hours each day.
Classification
Regression (value estimation)
Similarity matching
Clustering
Requirements Specification
Document Control
Revision History
Date          Version   Scope of Activity   Prepared   Reviewed   Approved
20/02/2014    1         Create              RC         X          X
23/02/2014    2         Update              RC         X          X
24/02/2014    3         Update              RC         X          X
Distribution List
Name            Title                                  Version
Oisin Creaner   Lecturer
Samsung         Customer
Robert Coyle    BA
Robert Coyle    System Developer
Robert Coyle    Statistician
Robert Coyle    Tester
Robert Coyle    Advertising and Marketing Division
Related Documents
Title               Comments
Proposal Document
1 Introduction
1.1 Purpose
The purpose of this project is to study and analyze the activities and trends
associated with a brand's advertising campaign. The objective of this project is to
analyze Twitter feeds for activities and trends associated with the brand before,
during and after its advertising campaign and to see how its stock market
share prices are connected to and affected by the Twitter feeds.
The intended customers are the brands themselves and their marketing and PR teams.
As Twitter matures, top brands have realized just how relevant Twitter can be as
a marketing and engagement platform.
According to Useful Social Media, 98% of the top brands are on Twitter and 92%
of top brands tweet daily. There are 230 million active users on Twitter, which
provides brands with a global presence. (USM) 92% of top brands Tweet at
least once daily as audiences grow. Study shows Twitter's maturity as a
marketing and engagement platform. 98% of all top brands are active on Twitter.
The social network has matured into a valuable and necessary channel for
marketing organizations. (Usefulsocialmedia.com, 2014)v
The project will not provide Samsung with outside analysis of other
brands' data.
Definition
A series of messages to promote a product.
Backed-up
Cloud
Data
Excel
GUI
Moscow
Business Analyst
Python
Programming Language
campaign more effectively and efficiently by directing the style and approach of
the campaign towards their specific products.
3 Requirements Specification
3.1 Functional Requirements
FR#
Category
Description
MoSCoW
FR1
Acquire Data 1
FR2
Acquire Data 2
FR3
Clean Data 2
FR4
Clean Data 2
FR5
FR6
Analyse 1
Analyse 2
M
M
H
H
FR7
Publish Data
The project will gather and store all necessary data from
historical Twitter feeds.
The project will gather and store all necessary historical stock
market data regarding the brand, corresponding to the dates
of the Twitter data that was acquired.
The correct programs will be acquired and used to clean and
retrieve historical Twitter data regarding key words and
hashtags of the brand on certain dates.
The correct programs will be acquired and used to clean and
retrieve historical stock market share prices regarding
the brand for the same times and dates as the historical Twitter
feed data.
The cleaned Twitter data is then analysed and compared.
The cleaned stock market data is then analysed and
compared.
The analysis will then be published and made available to the
customer.
Status
H
Flow Description
Precondition
The Data must be online. The data system must be operational at all times.
Activation
Use case is activated when the programmer connects to the system online.
Main Flow
1. Step: 1A. Programmer and System Developer source data.
2. Step: 2A. Programmer and Business Analyst validate data with the
Customer.
3. Step: 3A. Programmer accesses the data.
4. Step: 4A. Programmer notifies data availability to the System
Developer.
5. Step: 5A. Programmer downloads data for cleaning.
Alternate Flow
1. Step: 1A. Programmer and System Developer source data.
2. Step: 2A. Programmer and Business Analyst validate data with the
Customer.
3. Step: 2A. Customer does not validate data. Step 1A is set to
recommence.
4. Step: 1A. Programmer and System Developer source data.
5. Step: 2A. Programmer and Business Analyst validate data with the
Customer.
6. Step: 3A. Programmer accesses the data.
7. Step: 4A. Programmer notifies data availability to the System
Developer.
8. Step: 5A. Programmer downloads data for cleaning.
Exceptional Flow
1. Step: 1A. Programmer and System Developer source data.
2. Step: 2A. Programmer and Business Analyst validate data with the
Customer.
3. Step: 2A. Customer does not validate data. Data is unavailable.
4. Use case ends
Termination
The system has gathered all necessary data. The data is then exported to the
cloud storage system. This process has now been terminated.
Post Condition
All data gathered; move on to the next step.
Flow Description
Precondition
The Data must be stored and available for cleaning at all times.
Activation
Use case is activated when the programmer connects to the cloud storage system
and retrieves the data.
Main Flow
1. Step: 1B. Programmer and System Developer retrieve data from the
cloud storage system.
2. Step: 2B. Programmer and Tester identify errors in the data set.
3. Step: 3B. Programmer receives recommendations from System
Developer.
4. Step: 4B. Programmer with the help of the Tester fixes errors and
notifies the System Developer.
5. Step: 5B. Programmer exports the data for analysis.
Alternate Flow
1. Step: 1B. Programmer and System Developer retrieve data from the
cloud storage system.
2. Step: 2B. Programmer and Tester identify errors in the data set.
3. Step: 3B. Programmer receives recommendations from System
Developer.
4. Step: 4B. Programmer with the help of the Tester fixes errors and
notifies the System Developer.
5. Step: 2B. Programmer and Tester test system again and identify more
errors in the data set.
6. Step: 3B. Programmer receives recommendations from System
Developer.
7. Step: 4B. Programmer with the help of the Tester fixes errors and
notifies the System Developer.
8. Step: 5B. Programmer exports the data for analysis.
Exceptional Flow
1. Step: 1B. Programmer and System Developer retrieve data from the
cloud storage system.
2. Step: 2B. Programmer and Tester identify errors in the data set.
3. Step: 3B. Programmer receives recommendations from System
Developer.
4. Step: 4B. Programmer, with the help of the Tester, cannot fix the
errors. Data is corrupt.
5. Use case ends.
Termination
The system cleaned all acquired data. The data is then saved onto the cloud
storage system and exported for analysis. This process has now been
terminated.
Post Condition
All data cleaned; move on to the next step.
Flow Description
Precondition
The Data must be available for analysis at all times.
Activation
Use case is activated when the BA and the Statistician connect to the cloud
storage system and retrieve the data.
Main Flow
1. Step: 1C. BA and Statistician retrieve data from the cloud storage
system.
2. Step: 2C. The Statistician and BA explore and understand the data set.
3. Step: 3C. Statistician begins the calculations.
4. Step: 4C. Statistician and BA begin to visualize the data.
5. Step: 5C. Programmer backs up and stores findings with the approval
of the BA.
Alternate Flow
1. Step: 1C. BA and Statistician retrieve data from the cloud storage
system.
2. Step: 2C. The Statistician and BA explore and understand the data set.
3. Step: 3C. Statistician begins the calculations.
4. Step: 4C. Statistician and BA begin to visualize the data. BA requests
the data to be recalculated with a different approach.
5. Step: 3C. Statistician begins the new calculations.
6. Step: 4C. Statistician and BA begin to visualize the data.
7. Step: 5C. Programmer backs up and stores findings with the approval
of the BA.
Exceptional Flow
1. Step: 1C. BA and Statistician retrieve data from the cloud storage
system.
2. Step: 2C. The Statistician and BA explore and understand the data set.
Statistician and BA are unable to understand the data set. BA requests a
new data set.
3. Use case ends
Termination
The analysis is completed. The data is then saved onto the cloud storage system
and exported for publishing. This process has now been terminated.
Post Condition
All data analyzed; move on to the next step.
Flow Description
Precondition
The Data must be available for analysis at all times.
Customer/Client must be available for analysis at all times.
Activation
Use case is activated when the findings are presented to the BA, Customer and
Advertising/Publication Division and all three are engaged in communication.
Main Flow
1. Step: 1D. BA, Customer and Advertising/Publication Division retrieve
analysis findings. Findings have acquired the owner's approval.
2. Step: 2D. BA and Customer discuss the objective of the findings
release.
3. Step: 3D. BA and Customer begin to agree on the target audience/data
consumer.
4. Step: 4D. Customer decides the medium type/the style and method of
publicizing the data, e.g. websites, newspaper, with the BA's approval
and the assistance of the Advertising/Publication Division.
5. Step: 5D. BA notifies Advertising/Publication Division to publish the
data.
Alternate Flow
1. Step: 1D. BA, Customer and Advertising/Publication Division retrieve
analysis findings. Findings have acquired the owner's approval.
2. Step: 2D. BA and Customer discuss the objective of the findings
release.
3. Step: 3D. BA and Customer begin to agree on the target audience/data
consumer.
4. Step: 4D. Customer decides the medium type/the style and method of
publicizing the data, e.g. websites, newspaper, with the BA's approval
and the assistance of the Advertising/Publication Division. Customer
decides to recommence Step 3D to change the publication
approach.
5. Step: 3D. BA and Customer begin to agree on a new target
audience/data consumer.
6. Step: 4D. Customer decides the medium type/the style and method of
publicizing the data, e.g. websites, newspaper, with the BA's approval
and the assistance of the Advertising/Publication Division.
7. Step: 5D. BA notifies Advertising/Publication Division to publish the
data.
Exceptional Flow
1. Step: 1D. BA, Customer and Advertising/Publication Division retrieve
analysis findings. Findings have not acquired the owner's approval.
Customer decides not to publicize the data findings due to the high
importance and confidentiality of the findings.
2. Use case ends
Termination
The publication of the data is completed. This process has now been terminated.
Post Condition
All data publicized; all steps completed.
5 Interface Requirements
5.1 GUI
An example of an analysis of tweets.
vii comprendia. 2014
viii powerpivotblog. 2013
ix evolutionanalytics. 2013
skilledup. 2013
xi datamachines. 2012
6 Analysis Evolution
The analysis will evolve over time to produce a much more focused outcome,
differentiating itself through the analysis of a specific product in the Samsung
product range. This can be done by changing the key words mined from the Twitter
data to focus on a product line such as the Galaxy products in the Samsung range,
which include the smartphone, tablet and watch.
If the customer, Samsung, required an analysis focused on the release of a
specific product such as the Galaxy S4, which was released in April 2013, this
could be done by narrowing down the search key words, using hashtags and words
such as (#samsungS4, #SamsungGalaxyS4, #GalaxyS4, #S4), and narrowing the
timeline to the release date of the phone.
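The narrowing described above, hashtags plus a date window, can be sketched in Python as a simple filter. The hashtag set is taken from the list above; the window dates in the example are hypothetical, and tweets are assumed to arrive as plain text with a posting date. The function name is also hypothetical.

```python
from datetime import date

# Hashtags named in the plan above, lower-cased for case-insensitive matching.
GALAXY_S4_TAGS = {"#samsungs4", "#samsunggalaxys4", "#galaxys4", "#s4"}


def matches_campaign(text, posted, tags, start, end):
    """True if the tweet uses any of tags and was posted between start and end inclusive."""
    if not (start <= posted <= end):
        return False
    # Lower-case each token and drop trailing punctuation such as "#GalaxyS4!".
    tokens = {tok.lower().rstrip(".,!?:;") for tok in text.split()}
    return not tokens.isdisjoint(tags)


# Hypothetical window around the April 2013 release.
start, end = date(2013, 4, 20), date(2013, 5, 4)
print(matches_campaign("Loving my new #GalaxyS4!", date(2013, 4, 27), GALAXY_S4_TAGS, start, end))
```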
Revision History
Date of this revision: 9/03/14
Revision
date
Previous
revision
date
9/03/14
Summary of changes
Changes
marked
First Issue
Approvals
This project requires the following approvals.
Name
Robert Coyle
Signature
Title
Project Manager
Date of issue
10/03/14
Version
1
Distribution
Name
Oisin Creaner
Title
Project Lecturer
Date of issue
10/03/14
Version
1
Purpose of Document
To provide Oisin Creaner, the project lecturer, with a summary of the status of
the project.
Date of report
09/03/14
Period covered
10/02/14 to 9/03/14
Schedule Status
This project is still on schedule at this interval.
Updated Gantt chart
[Gantt chart. Task: Project Proposal. Timeline: 03-Feb to 24-Apr]
Definitions
API: Application programming interface
JSON: JavaScript Object Notation
AMEX: American Stock Exchange
RSS: Rich Site Summary
Problems
Actual: Accessing the Twitter API.
Potential: The quality and quantity of the Twitter data.
RAID Log:
Risks
Assumptions
Issues
Dependency
Conclusion
This project, even with the setbacks, is still capable of finishing within the
original target dates. Gathering all the data in the next week is paramount to
the success of the project. Any more delays will compromise the quality of the
project.
Currently I am waiting on a response from Twitter regarding their developer
data grant scheme. If this is approved, all the historic data from January
2013 to March 2014 will be available and can be gathered in JSON format
(see Dependencies, Ref: D02).
All necessary information, such as dates, key words and hashtags, has been
submitted to the Twitter developer data grant scheme.
Alternatives:
If this grant is not approved, the project can revert to streaming the
data live from Twitter in JSON format.
If the grant approval takes too long, the project can revert to
streaming the data live from Twitter in JSON format.
Revision History
Date of this revision: 30/03/14
Revision
date
Previous
revision
date
30/03/14
Summary of changes
Changes
marked
First Issue
Approvals
This project requires the following approvals.
Name
Robert Coyle
Signature
Title
Project Manager
Date of issue
30/03/14
Version
1
Distribution
Name
Oisin Creaner
Title
Project Lecturer
Date of issue
30/03/14
Version
1
Purpose of Document
To provide Oisin Creaner, the project lecturer, with a summary of the status of
the project.
Date of report
30/03/14
Period covered
10/03/14 to 30/03/14
Schedule Status
This project is still on schedule at this interval.
Updated Gantt chart
[Gantt chart. Tasks: Project Proposal; Create Python codes; Data retrieval from Twitter API and; Management Progress Report 1. Timeline: 03-Feb to 14-May]
Definitions
API: Application programming interface
JSON: JavaScript Object Notation
AMEX: American Stock Exchange
RSS: Rich Site Summary
Problems
Actual: Accessing the Twitter API.
Potential: The quality and quantity of the Twitter data provided
by Twilert.
RAID Log:
Risks
Open Risks (date last reviewed: 30/03/2014)
R01 (technology): No data backup available. Raised by R.Coyle; date identified 10Feb14. Mitigation (prevention): Source online storage for data. Owner: RC; updated 10Feb14.
R02 (cost): Acquiring data for free. Raised by R.Coyle; date identified 10Feb14. Mitigation (acceptance): Source free historic Twitter feeds. Owner: RC; updated 10Feb14.
R03 (time): Acquiring data on time. Raised by R.Coyle; date identified 10Feb14. Mitigation (prevention): Source the data on time. Owner: RC; updated 10Feb14.
Closed Risks
R01 (technology): No data backup available. Raised by R.Coyle; date identified 17Feb14. Mitigation (prevention): Source hard drive for storage. Owner: RC; updated 10Jun14.
R02 (cost): No costs needed for use of data. Raised by R.Coyle; date identified 24Mar14. Mitigation (acceptance): Using different data. Owner: RC; updated 24Mar14.
R03 (time): Data will be acquired on time. Raised by R.Coyle; date identified 24Mar14. Mitigation (contingency): Source the data on time. Owner: RC; updated 24Mar14.
Assumptions
The purpose of this document is to surface, document, analyse and monitor the key assumptions
upon which the plan is based. Planning parameters, design parameters, issues and risks will be generated from these assumptions.
Ref #
Assumption
Lecturers will provide
prompt feedback and
guidance
Twitter will reply to my
grant request for the use
of their historic data.
RSS feeds gathered from
twitter not missing data.
Skills developed for
analysis of data.
A01
A02
A03
A04
Test
Date
Importance
Certainty
Influence
Test
4 - critical
3 - Probable
2 - somewhat
1 - unknown
3 - important
4 - Fact
4 - critical
4 - Fact
Unknown as of yet.
Continue arriving to
lectures.
10-Feb14
03-Mar14
30-Mar14
03-Mar14
Issues
I01: Unexpected issue in accessing Twitter feeds. Raised by RC, 17-Feb-14. Action plan: Identify different means of accessing the Twitter feeds. Status: open. Owner: RC. Target resolution date: 10-Feb-14.
I02: Twitter API access more complex than anticipated. Raised by RC, 03-Mar-14. Action plan: This issue has been brought up with the project lecturers; awaiting response. Status: closed. Owner: RC. Target resolution date: 03-Mar-14; actual resolution date: 24-Mar-14.
I03: No response from the Twitter developer data grant scheme. Raised by RC, 24-Mar-14. Action plan: This issue has been brought up with the project lecturers; an alternative solution has been provided. Status: closed. Owner: RC. Target resolution date: 24-Mar-14; actual resolution date: 30-Mar-14.
Dependency
D01 (NCI Facilities): IT facilities available for running the Twitter API. Raised by RC, 10Feb14; period affected: Feb to Mar. Action plan: Confirm availability with IT. Owner: RC. Target resolution date: Mar14; actual resolution date: Mar14.
D02 (External Expert): Twitter historical data grant approval. Raised by RC, 03Mar14; period affected: Mar to Apr. Action plan: Awaiting response from Twitter for historical data grant approval. Owner: RC. Target resolution date: Mar14; actual resolution date: Mar14.
D03 (External Expert): Raised by RC, 30Mar14; period affected: Mar to Apr. Action plan: Awaiting response from external client. Owner: RC. Target resolution date: Apr14.
Conclusion
This project is still on course for completion within the requested timeline.
The project's data source has changed, since there has been no reply from the
Twitter research data grant scheme regarding access to their historical data.
Twilert will now provide the data for the project.
It has proven to be a reliable source but can only provide access to 100 RSS feeds
per day; however, this data will be duplicated, providing enough data to complete
the project.
Yahoo Finance will provide the historical stock market prices.
Alternatives:
If the Twitter developer grant is approved within the next 2 weeks, the
project can revert to using the correct historical data.
Revision History
Date of this revision: 20/04/14
Revision
date
Previous
revision
date
20/04/14
Summary of changes
Changes
marked
First Issue
Approvals
This project requires the following approvals.
Name
Robert Coyle
Signature
Title
Project Manager
Date of issue
20/04/14
Version
1
Distribution
Name
Oisin Creaner
Title
Project Lecturer
Date of issue
20/04/14
Version
1
Purpose of Document
The purpose of this document is to provide the project lecturer, Oisin Creaner,
with a summary of the status of the project.
Date of report
20/04/14
Period covered
1/04/14 to 20/04/14
Schedule Status
This project is still on schedule at this interval.
Updated Gantt chart
[Figure: updated Gantt chart covering 03-Feb to 03-Jun, with tasks including Project Proposal and Create Python codes]
Definitions
API: Application programming interface
JSON: JavaScript Object Notation
AMEX: American Stock Exchange
RSS: Rich Site Summary
Problems
Actual: Analysis of Data
Potential: Cleaning Twitter Data
RAID Log:
Risks
Open Risks (20/04/2014)
Risk Ref   Risk Category   Risk Description   Raised by   Date Raised   Mitigation Category   Owner   Date Updated
R01        technology                         R.Coyle     10-Feb-14     prevention            RC      10-Feb-14
R02        cost                               R.Coyle     10-Feb-14     acceptance            RC      10-Feb-14
R03        time                               R.Coyle     10-Feb-14     prevention            RC      10-Feb-14
R04        time            Data analysis.     R.Coyle     20-Apr-14     prevention            RC      21-Apr-14
Closed Risks
Risk Ref   Risk Category   Risk Description    Raised by   Date Raised   Mitigation Category   Owner   Date Updated   End Date
R01        technology                          R.Coyle     17-Feb-14     prevention            RC      10-Jun-14
R02        cost                                R.Coyle     24-Mar-14     acceptance            RC      24-Mar-14
R03        time            Data is acquired.   R.Coyle     24-Mar-14     contingency           RC      20-Apr-14      20-Apr-14
Assumptions
The purpose of this document is to surface, document, analyze and monitor the key
assumptions upon which the plan is based. Planning parameters, design parameters, issues and risks will be generated from
these assumptions.
A01: Lecturers will provide prompt feedback and guidance.
Importance: 3 - important; Certainty: 3 - Probable; Test Date: 10-Feb-14
A04: Skills developed for analysis of data.
Importance: 4 - critical; Certainty: 4 - Fact; Test: Continue attending lectures; Test Date: 03-Mar-14
A05: Data can be cleaned and prepared for analysis.
Importance: 4 - critical; Certainty: 4 - Fact; Test Date: 20-Apr-14
A05: Cleaned data is adequate and can be analyzed.
Importance: 4 - critical; Certainty: 4 - Fact; Test Date: 20-Apr-14
Issues
Issue Ref   Raised by   Date Raised   Status   Owner   Target Resolution Date   Actual Resolution Date
I01         RC          17-Feb-14     closed   RC      10-Feb-14                20-Apr-14
I02         RC          03-Mar-14     closed   RC      03-Mar-14                24-Mar-14
I03         RC          24-Mar-14     closed   RC      24-Mar-14                20-Apr-14
Dependency
D01 (Project: NCI Facilities)
Dependency Description: IT facilities available for running the Twitter API
Raised by: RC; Date Raised: 10-Feb-14; Period Affected: Feb-Mar
Action Plan: Confirm availability with IT
Owner: RC; Target Resolution Date: Mar-14; Actual Resolution Date: Mar-14
Conclusion
This project is still on course for completion within the requested timeline.
The project data source has changed since the Twitter Historical Data grant was
denied. I have now gathered a week's worth of Twitter data associated with three
companies that are listed on the same stock exchange.
I will now focus on Apple Inc., Tesla Motors, Inc. and Microsoft Corporation.
Because these technology companies are all listed on the same stock exchange
(NASDAQ), the analysis will be more straightforward. Samsung Electronics, the
company I originally selected to base the analysis upon, is listed on the Korean
stock market. Not only would its prices form a different time series, but I would
also have to adjust for the currency difference.
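With three companies in one collection, each tweet has to be attributed to the company it mentions. The following is a minimal sketch that assumes simple keyword and cashtag matching. The search terms in `COMPANY_TERMS` are hypothetical, since the report does not list the queries used, and `companies_mentioned` is an illustrative name.

```python
import re

# Hypothetical search terms per NASDAQ ticker; the real queries are not stated.
COMPANY_TERMS = {
    "AAPL": ["$aapl", "apple"],
    "TSLA": ["$tsla", "tesla"],
    "MSFT": ["$msft", "microsoft"],
}


def companies_mentioned(tweet_text):
    """Return the tickers whose terms appear in the tweet (case-insensitive)."""
    text = tweet_text.lower()
    found = set()
    for ticker, terms in COMPANY_TERMS.items():
        for term in terms:
            # \b does not work next to "$", so require non-word neighbours instead.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, text):
                found.add(ticker)
                break
    return found
```

Requiring non-word characters on both sides stops "apple" from matching inside words such as "pineapple". Plain keyword matching will still pick up unrelated uses of a word like "apple", which is one reason cashtags are included.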
Yahoo Finance will provide the historical stock market prices.
I hope to find a correlation between Twitter activity and the stock market
prices of the three brands, with a lag of around three to four days.
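The planned lag analysis can be sketched as a Pearson correlation between daily tweet counts and prices shifted forward by 0 to `max_lag` trading days. This sketch assumes both series are already aligned on the same trading days. Pearson correlation is a stand-in, because the report does not say which measure will be used, and `pearson` and `lagged_correlations` are hypothetical names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # raises ZeroDivisionError if a series is constant


def lagged_correlations(tweet_counts, prices, max_lag=5, min_pairs=3):
    """Correlate tweet activity on day t with the price on day t + lag."""
    if len(tweet_counts) != len(prices):
        raise ValueError("series must cover the same trading days")
    results = {}
    for lag in range(max_lag + 1):
        xs = tweet_counts[: len(tweet_counts) - lag]  # drop the last `lag` days
        ys = prices[lag:]                             # drop the first `lag` days
        if len(xs) >= min_pairs:
            results[lag] = pearson(xs, ys)
    return results
```

Each additional day of lag removes one pair of observations. With a week of data, lags of three or four days therefore leave only a handful of pairs, and a longer collection window would make those coefficients more trustworthy.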
Alternatives:
If I can gather the stock market prices in hourly format, the analysis will
be more detailed.
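Moving from daily to hourly prices mainly changes how tweet activity is bucketed before it is matched to prices. The following is a minimal sketch that counts tweets per day or per hour. It assumes the timestamps are naive datetimes already converted to the exchange's local time (US Eastern for NASDAQ), since Twitter reports tweet times in UTC. `activity_counts` is a hypothetical name, and the sample timestamps are illustrative only.

```python
from collections import Counter
from datetime import datetime


def _bucket_start(ts, granularity):
    """Truncate a timestamp to the start of its day or hour."""
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError("granularity must be 'day' or 'hour'")


def activity_counts(timestamps, granularity="day"):
    """Count tweets per day or per hour so they can be joined with price data."""
    return Counter(_bucket_start(ts, granularity) for ts in timestamps)


# Illustrative timestamps, not collected data.
sample = [datetime(2014, 4, 1, 9, 45), datetime(2014, 4, 1, 9, 50), datetime(2014, 4, 1, 15, 5)]
print(activity_counts(sample, "hour"))
```

The same counts feed the lag analysis unchanged. Only the bucket width differs, so switching to hourly prices would not require restructuring the rest of the pipeline.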
The Use of Twitter Activity as a Stock Market Predictor