Abstract: As data volumes grow, the need for big data solutions increases day by day. The concept of data harmonization has existed for two decades. Data must be collected from various heterogeneous sources, and data harmonization techniques bring it into a single format in a single place; such a repository is called a data warehouse. Considerable progress has been made in analyzing historical data with data warehousing, yet new innovations keep uncovering the challenges and problems that data warehousing faces. When the volume and variety of data grow exponentially, existing tools built on the traditional warehouse approach may not support OLAP operations. In this paper we survey the research being done in the field of big data warehousing, organized by category. Research issues and proposed approaches for various kinds of datasets are presented. The challenges and advantages of using a data warehouse before a data mining task are also explained in detail.
__________________________________________________*****_________________________________________________
Any type of data warehouse must cope with an increasing data rate. The storage capacity of a data warehouse should adapt to the actual data size; that is, it should support dynamic scaling. In the era of cloud computing, dynamic scaling is the natural solution for a big data warehouse. Depending on the purpose, we may choose horizontal scaling (adding more machines) or vertical scaling (adding more resources to a single machine). Various platforms are available, such as Hadoop for horizontal scaling and GPUs (Graphics Processing Units) for vertical scaling.

3. Efficiency

The efficiency of a data warehouse concerns both how the warehouse is constructed and how efficiently it operates. Depending on convenience, big data mining techniques are applied either via the data warehouse or directly on it. A major efficiency concern is whether the data warehouse can respond quickly to millions of queries [6].

4. Heterogeneity

Data coming from various heterogeneous sources results in a variety of data: structured, semi-structured, and unstructured datasets. Some sources are relational (RDBMS) and some are NoSQL databases. Each type of dataset must be provided with its own data integration layer. The data warehouse should be flexible enough to handle heterogeneous datasets so that it does not incur the cost of reconstruction [6].

Text data

To make their businesses smarter, organizations draw on user reviews, advertisements, recommendation systems, and much more. Each of these relies on textual data. Document warehousing is a popular way to provide an OLAP platform for textual data. A document warehouse stores documents in a multidimensional structure and supports analysis over them for effective text mining. With the document warehousing approach, heterogeneous document data can be integrated into a well-formed infrastructure. Big data also introduces challenges such as scaling, performance, and security for document warehousing [17].

From a research point of view, document warehousing is an active area for contributions to OLAP. One such work focuses on providing an improved data warehousing solution for the big data era. Its methodology consists of three stages: documentation, aggregation, and data loading. The documentation stage extracts the data from the data sources and writes it to simple text files. The aggregation stage uses MapReduce to perform ETL over the data files produced by the first stage, and all results generated in this stage are transformed into JSON objects. This approach achieves better parallelism and addresses the big data problem [15].
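The documentation and aggregation stages described above can be illustrated with a small, single-process simulation of a MapReduce ETL job that turns text-file records into JSON objects. This is a minimal sketch under stated assumptions, not the implementation from [15]: the tab-separated record layout, the choice of (day, term) as the grouping key, and the names `RAW_LINES`, `map_phase`, `reduce_phase`, and `run_job` are all hypothetical, and a real deployment would run the map and reduce functions on a framework such as Hadoop rather than in one Python process.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical output of the "documentation" stage: plain text lines,
# one record per line, formatted as "<day>\t<source>\t<term>".
RAW_LINES = [
    "2017-06-01\ttwitter\tolap",
    "2017-06-01\ttwitter\thadoop",
    "2017-06-01\tblog\tolap",
    "2017-06-02\ttwitter\tOLAP",
]


def map_phase(line: str) -> Iterator[tuple[tuple[str, str], int]]:
    """Extract + transform: emit ((day, term), 1) for each well-formed record."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return  # skip malformed lines (a simple cleaning rule)
    day, _source, term = parts
    yield (day, term.lower()), 1  # normalise the term before grouping


def reduce_phase(key: tuple[str, str], counts: list[int]) -> dict:
    """Aggregate all values that share a key into one JSON-ready record."""
    day, term = key
    return {"day": day, "term": term, "count": sum(counts)}


def run_job(lines: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle (group by key) -> reduce, returning JSON strings."""
    grouped: dict[tuple[str, str], list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)  # the "shuffle": collect values per key
    return [
        json.dumps(reduce_phase(key, values), sort_keys=True)
        for key, values in sorted(grouped.items())
    ]


if __name__ == "__main__":
    for doc in run_job(RAW_LINES):
        print(doc)
```

Because each call to `map_phase` depends only on its own input line, the map work can be spread across many nodes; only the grouping step must bring equal keys together, which is where a framework such as Hadoop supplies the parallelism the methodology relies on.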
207
IJRITCC | June 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 6 206 208
_______________________________________________________________________________________________
Spatial data

Spatial online analytical processing (SOLAP) is used for the analysis of remote sensing data. SOLAP is well suited to decision support systems that explore spatial data from a multidimensional perspective. It can be used for spatio-temporal analytics in weather and environment monitoring systems. Because such data is generated by Earth observation, it is very challenging to manage in terms of both scale and aggregation. The SOLAP cube uses MapReduce to achieve higher parallelism. A newer approach is implemented on the Hadoop framework, applying the traditional operations such as roll-up/drill-down and slice/dice to an optimized ROLAP/MOLAP/HOLAP cube [1].

Web data

Extensive use of the Internet and the Web generates massive web datasets. Web warehousing makes it possible to build the critical parts of a decision support system, and applying it yields advantages such as improved productivity and cost savings. Web warehousing is the approach of building the OLAP cube and the warehouse over web information in the form of semi-structured data, graphics, text, sound, images, multimedia objects, videos, and more. Simply put, web warehousing is the combination of data warehousing and web technology. Research in this area aims to show how a web warehouse can be more efficient than a data warehouse when web data is applied to a traditional warehouse. To handle big data, MapReduce procedures are again used to achieve high parallelism, and using the Hadoop framework with HBase gives improved results [18].

References

[1] J. Li, L. Meng, F. Z. Wang, W. Zhang, and Y. Cai, "A MapReduce-enabled SOLAP cube for large-scale remotely sensed data aggregation," Computers & Geosciences, vol. 70, pp. 110–119, Sep. 2014.
[2] C. Blanco, I. García-Rodríguez de Guzmán, E. Fernández-Medina, and J. Trujillo, "An architecture for automatically developing secure OLAP applications from models," Information and Software Technology, vol. 59, pp. 1–16, Mar. 2015.
[3] N. U. Rehman, S. Mansmann, A. Weiler, and M. H. Scholl, "Building a Data Warehouse for Twitter Stream Exploration," 2012, pp. 1341–1348.
[4] L. Petrazickis, M. Butuc, and B. Steinfeld, "Crunching big data with Hadoop and BigInsights in the cloud," in Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, 2012, pp. 241–242.
[5] A. Nandi, C. Yu, P. Bohannon, and R. Ramakrishnan, "Data Cube Materialization and Mining over MapReduce," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1747–1759, Oct. 2012.
[6] A. Cuzzocrea, L. Bellatreche, and I.-Y. Song, "Data warehousing and OLAP over big data: current challenges and future research directions," in Proceedings of the Sixteenth International Workshop on Data Warehousing and OLAP, 2013, pp. 67–70.
[7] S. Mansmann, N. Ur Rehman, A. Weiler, and M. H. Scholl, "Discovering OLAP dimensions in semi-structured data," Information Systems, vol. 44, pp. 120–133, Aug. 2014.
[8] J. Song, C. Guo, Z. Wang, Y. Zhang, G. Yu, and J.-M. Pierson, "HaoLap: A Hadoop based OLAP system for big data," Journal of Systems and Software, vol. 102, pp. 167–181, Apr. 2015.
[9] D.-H. Shin and M. J. Choi, "Ecological views of big data: Perspectives and issues," Telematics and Informatics, vol. 32, no. 2, pp. 311–320, May 2015.
[10] J. Dittrich and J.-A. Quiané-Ruiz, "Efficient big data processing in Hadoop MapReduce," Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2014–2015, 2012.
[11] S. Lee, S. Jo, and J. Kim, "MRDataCube: Data cube computation using MapReduce," in Big Data and Smart Computing (BigComp), 2015 International Conference on, 2015, pp. 95–102.
[12] I. Triguero, D. Peralta, J. Bacardit, S. García, and F. Herrera, "MRPR: A MapReduce solution for prototype reduction in big data classification," Neurocomputing, vol. 150, pp. 331–345, Feb. 2015.
[13] T. Niemi, J. Nummenmaa, and P. Thanisch, "Normalising OLAP cubes for controlling sparsity," Data & Knowledge Engineering, vol. 46, no. 3, pp. 317–343, Sep. 2003.
[14] N. U. Rehman, A. Weiler, and M. H. Scholl, "OLAPing social media: the case of Twitter," in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2013, pp. 1139–1146.
[15] M. Ben Kraiem, J. Feki, K. Khrouf, F. Ravat, and O. Teste, "OLAP of the tweets: From modeling toward exploitation," in Research Challenges in Information Science (RCIS), 2014 IEEE Eighth International Conference on, 2014, pp. 1–10.
[16] C. X. Lin, B. Ding, J. Han, F. Zhu, and B. Zhao, "Text Cube: Computing IR Measures for Multidimensional Text Database Analysis," 2008, pp. 905–910.
[17] F. S. C. Tseng and A. Y. H. Chou, "The concept of document warehousing for multi-dimensional modeling of textual-based business intelligence," Decision Support Systems, vol. 42, no. 2, pp. 727–744, Nov. 2006.
[18] X. Tan, D. C. Yen, and X. Fang, "Web warehousing: Web technology meets data warehousing," Technology in Society, vol. 25, no. 1, pp. 131–148, Jan. 2003.