Big Data: What is it and how much data is there?
> small data
1 zettabyte = 1,000,000,000,000,000,000,000 bytes

                                       2012    2017
Global users (billions)                 2.3     3.6
Global networked devices (billions)      12      19
Global broadband speed (Mbps)          11.3      39
Global traffic (zettabytes)             0.5     1.4
http://www.cisco.com/en/US/netsol/ns827/networking_solutions_sub_solution.html#~forecast
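As a rough check on what these figures imply, the growth between 2012 and 2017 can be worked out directly. This is only a sketch: it assumes the 2012/2017 pairing of values as read from the Cisco figure, and the helper names `cagr` and `zettabytes_to_bytes` are illustrative, not part of any tool mentioned in these slides.

```python
# Figures as read from the Cisco forecast figure: (value in 2012, value in 2017).
# The pairing of values to years is an assumption based on the slide.
forecast = {
    "Global users (billions)": (2.3, 3.6),
    "Global networked devices (billions)": (12, 19),
    "Global broadband speed (Mbps)": (11.3, 39),
    "Global traffic (zettabytes)": (0.5, 1.4),
}
YEARS = 2017 - 2012


def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def zettabytes_to_bytes(zettabytes):
    """1 zettabyte = 10**21 bytes."""
    return zettabytes * 10 ** 21


for name, (v2012, v2017) in forecast.items():
    growth = v2017 / v2012
    rate = cagr(v2012, v2017, YEARS)
    print(f"{name}: x{growth:.2f} overall, {rate:.1%} per year")
```

Under these assumptions, traffic grows faster than the number of users or devices, which is the point of the slide: the volume per user and per device keeps rising.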
> tools
Divide et impera*
* Divide and conquer
An example
How many pages are written in Latin among the books in the Ancient Library of Alexandria?

Books on the shelves:
REF   LANGUAGE   PAGES
REF1  LATIN         45
REF2  GREEK        128
REF3  EGYPTIAN      12
REF4  LATIN         73
REF5  LATIN         34
REF6  EGYPTIAN      10
REF7  GREEK         20
REF8  GREEK        230

Step 1. Each mapper takes a book. REF1 is Latin, so its mapper emits "LATIN, pages 45"; another mapper finds an Egyptian book and emits nothing; a third is still reading. Reducer: 45 (ref 1).
Step 2. The mappers move on to further books. They report GREEK and EGYPTIAN books, which contribute nothing, or are still reading. Reducer: 45 (ref 1).
Step 3. REF4 and REF5 are Latin: the mappers emit "LATIN, pages 73" and "LATIN, pages 34". Reducer: 45 (ref 1) + 73 (ref 4) + 34 (ref 5).
Step 4. Only Greek books remain (REF7, REF8), so nothing more is emitted. Reducer: 45 (ref 1) + 73 (ref 4) + 34 (ref 5).
Step 5. The mappers are idle. The reducer outputs the sum: 152 TOTAL.
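The library example can be reproduced in a few lines. This is a minimal in-memory sketch of the map and reduce roles, not Hadoop code; the function names and the way books are dealt out to three mappers are illustrative assumptions.

```python
from functools import reduce

# (reference, language, pages), as listed on the slide.
BOOKS = [
    ("REF1", "LATIN", 45), ("REF2", "GREEK", 128), ("REF3", "EGYPTIAN", 12),
    ("REF4", "LATIN", 73), ("REF5", "LATIN", 34), ("REF6", "EGYPTIAN", 10),
    ("REF7", "GREEK", 20), ("REF8", "GREEK", 230),
]


def mapper(book, wanted="LATIN"):
    """Emit (reference, pages) for a book in the wanted language, else nothing."""
    ref, language, pages = book
    if language == wanted:
        yield ref, pages


def reducer(total, emitted):
    """Add one emitted page count to the running total."""
    _ref, pages = emitted
    return total + pages


def count_pages(books, n_mappers=3):
    # Deal the books out to the mappers, one chunk per mapper.
    chunks = [books[i::n_mappers] for i in range(n_mappers)]
    # Each mapper processes its own chunk independently of the others.
    emitted = [pair for chunk in chunks for book in chunk for pair in mapper(book)]
    # A single reducer sums everything the mappers emitted.
    return reduce(reducer, emitted, 0)


print(count_pages(BOOKS))  # 45 + 73 + 34 = 152
```

The total does not depend on how many mappers share the work, which is what makes the problem easy to divide.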
Hadoop architecture
[Figure: Hadoop cluster diagram; surviving label: head node]
FI-WARE proposal: Cosmos Big Data
What is Cosmos?
Cosmos is Telefónica's Big Data platform
Dynamic creation of private computing clusters as a service
Infinity, a cluster for persistent storage
Cosmos architecture
What                               Locally              Remotely
Clusters operation                 Cosmos CLI           REST API
I/O operation                      hadoop fs command    REST API (WebHDFS, HttpFS, Infinity protocol)
Querying tools (basic analysis)    Hive CLI             JDBC, Thrift*
MapReduce (advanced analysis)      hadoop jar command
Clusters operation: Getting your own Roman legion
Creating a cluster
$ cosmos create --name <STRING> --size <INT>
Terminating a cluster
List a directory
GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS
Create a directory
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS[&permission=<OCTAL>]
Delete a file or directory
DELETE http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE[&recursive=<true|false>]
Rename a file or directory
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>
Concat files
POST http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CONCAT&sources=<PATHS>
Set permission
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETPERMISSION[&permission=<OCTAL>]
Set owner
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETOWNER[&owner=<USER>][&group=<GROUP>]
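All of these requests share the same URL shape, so they are easy to build programmatically. The following is a small sketch using only the Python standard library; the helper names, the host `namenode.example.org` and the port `14000` are placeholders chosen for illustration, not values taken from Cosmos.

```python
from urllib.parse import quote, urlencode

# HTTP method used by each of the operations listed above.
METHODS = {
    "LISTSTATUS": "GET",
    "MKDIRS": "PUT",
    "DELETE": "DELETE",
    "RENAME": "PUT",
    "CONCAT": "POST",
    "SETPERMISSION": "PUT",
    "SETOWNER": "PUT",
}


def webhdfs_url(host, port, path, op, **params):
    """Build http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>[&key=value...]."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


def request_line(host, port, path, op, **params):
    """Return the request as written on the slides, e.g. 'PUT http://...'."""
    return f"{METHODS[op]} {webhdfs_url(host, port, path, op, **params)}"


# Hypothetical host, port and paths, for illustration only.
print(request_line("namenode.example.org", 14000, "/user/alice/data", "LISTSTATUS"))
print(request_line("namenode.example.org", 14000, "/user/alice/new", "MKDIRS", permission="755"))
```

Optional parameters such as `permission` or `recursive` are passed as keyword arguments and simply omitted when not needed, mirroring the square brackets in the request templates.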
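Append to a file
Step 1: submit the request to the namenode without sending the file data
POST http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]
The namenode answers with a redirect to a datanode
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...
Content-Length: 0
Step 2: send the file data to the URL given in the Location header
POST -T <LOCAL_FILE> http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...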
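The append exchange is a two-step protocol: ask the namenode, then send the data to the datanode it points to. A sketch of that control flow follows. The `send` callable is a hypothetical transport, injected so the logic can be read and tried without a cluster; the URLs in the demonstration are placeholders.

```python
def append_file(send, namenode_url, data):
    """Two-step WebHDFS APPEND.

    `send(method, url, body)` is a hypothetical transport callable that
    returns (status_code, headers_dict). Any HTTP client can be wrapped this
    way, provided it does NOT follow redirects automatically.
    """
    # Step 1: POST to the namenode without data; a 307 redirect is expected.
    status, headers = send("POST", namenode_url, None)
    if status != 307:
        raise RuntimeError(f"expected a 307 redirect, got {status}")
    datanode_url = headers["Location"]
    # Step 2: POST the data to the datanode named in the Location header.
    status, _headers = send("POST", datanode_url, data)
    return status


def fake_send(method, url, body):
    """Stand-in transport for demonstration: redirects once, then accepts."""
    if body is None:
        location = "http://datanode.example.org:50075/webhdfs/v1/tmp/log?op=APPEND"
        return 307, {"Location": location}
    return 200, {}


status = append_file(
    fake_send,
    "http://namenode.example.org:50070/webhdfs/v1/tmp/log?op=APPEND",
    b"one more line\n",
)
print(status)
```

Keeping the transport separate makes the redirect handling explicit; many HTTP libraries would otherwise follow the redirect themselves and hide the two steps.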
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
A connection to the Hive server (TCP/10000)
https://github.com/telefonicaid/fiware-connectors/tree/develop/resources/hive-basic-client
https://github.com/telefonicaid/fiware-connectors/tree/develop/resources/plague-tracker
5. MapReduce applications
MapReduce applications are commonly written in Java
// Hadoop classes typically imported by a MapReduce application
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
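The Java code that follows these imports is not reproduced here. As a language-neutral illustration of the structure such an application follows (map, group by key, reduce), here is a minimal in-memory sketch in Python. It uses word counting, the customary first MapReduce program, as an assumed example; none of the function names come from the Hadoop API.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply the map function to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from map_fn(record)


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply the reduce function to each key and its list of values, in key order."""
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    # Emit (word, 1) for every word of the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(_key, values):
    # Total the occurrences collected for one word.
    return sum(values)


lines = ["divide et impera", "divide and conquer"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
print(counts)
```

In a real job the application supplies only the map and reduce functions, while the framework takes care of distributing the input and grouping by key.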
Java map-reduce
Pig and Hive
Sqoop
System specific jobs (such as Java programs and shell scripts)
Useful references
Hive resources:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
http://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html
[Figure: platform architecture diagram. Labelled components: Rules Definition, Operational Dashboard, Real Time Processing, Data Querying, Subs, CEP, GIS, BI, ETL, Short Term Historic, Data Processing, Open Data, Cosmos (Big Data), Service Orchestration, Context Broker, Context Adapters, CKAN, Sensor 2 Things, IoT Backend, Device Management (measures / commands), Portals, IoT/Sensor Open Data, City Services, KPI Governance]
You don't have to use them all!
https://github.com/telefonicaid/fiware-connectors/tree/develop/flume
https://forge.fi-ware.eu/plugins/mediawiki/wiki/fiware/index.php/How_to_persist_Orion_data_in_Cosmos
Roadmap: More functionalities and integrations
Roadmap
Integrate cluster creation with the cloud portal
  No more REST API work
Streaming analysis capabilities
  Not every analysis can wait for batch processing
Geolocation analysis capabilities
  An important source of data nowadays
Integrate with CKAN
  As a source of batch data
Integrate with the Marketplace
  Selling datasets
  Selling analysis results
fiware-lab-help@lists.fi-ware.org
francisco.romerobueno@telefonica.com
Thanks!
http://fi-ppp.eu
http://fi-ware.eu
Follow @Fiware on Twitter!