Data Proliferation

As more companies begin measuring their data stores in petabytes, the realities of data
proliferation become very apparent. In fact, the proliferation of the data proliferation topic
itself is out of control. It's ironic that Microsoft Word doesn't recognize petabyte as a real
word, yet the corporation's data stores have reached petabyte levels. Information is
exploding, and it has become virtually impossible for companies to keep up.

Consider Wal-Mart. The chain has more than 6,000 stores, and some have almost a
half-million SKUs each. You think your Excel spreadsheets from finance are bad? Wal-Mart's
database tables have literally 100 billion rows. The retailer's POS systems have to ring up
some 276 million items in one day.


Companies managing massive volumes of information are faced with significant challenges
today, including some intense acquisition issues in their business intelligence (BI) supply
chain. Acquiring new data and data sources is paramount to enriching existing data
products and creating new products. Information companies may regularly acquire data
from literally thousands of sources in hundreds of different formats at varying frequencies.
And the tolerance for latency of new data becomes less and less by the day.

According to Gavin Whatrup, group IT director at marketing services company Creston,
companies today are being buried beneath an avalanche of data. Data capture, such as
RFID, is generating huge amounts of additional data. Regulations dictate that we retain it,
and competitive pressures demand we make use of it. Data management, and that includes
the storage technology behind it, and knowledge management are going to be key
technologies in the battle to remain operationally compliant and commercially competitive. [3]

Hubert M. Yoshida, vice president and chief technology officer of Hitachi Data Systems,
predicts an even more dizzying future. Yoshida said, "Currently, companies basically use
petabytes of data, but in the next few years, their data will increase to exabytes." [4]

Planning for this information onslaught is a formidable challenge. Aside from the issues
associated with legacy systems in place, it's even more challenging to plan for 2x, 4x and 8x
scenarios of existing terabyte data stores. Companies are beginning to approach this
challenge, but data proliferation often ends up being the material that keeps CIOs and CEOs
awake at night.

This article will discuss system recommendations in future planning for a robust,
cost-effective and scalable system, including inbound, processing and outbound BI concerns.


Data Sources
A data source is any of the following types of sources for (mostly) digitized data:

- a database
- a computer file
- a data stream

Data from such sources is usually formatted and contains a certain amount of metadata.

Source data is the origin of information found in electronic media.

Often when data is captured in one electronic system and then transferred to another, the
audit trail is lost or the underlying data cannot be fully verified. Some systems provide a
complete data export, but the importing system must then accept every available data field.
Similarly, many modern database systems keep transaction logs. Carrying these transaction
records over into the new system can be very important for verifying the imported data.
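
One way to check that nothing was lost in such a transfer is to fingerprint each record before export and compare it after import. The following is a minimal sketch, not a description of any particular product: the record layout, the `id` key and the function names are assumptions made for illustration.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Return a SHA-256 digest of a record that does not depend on key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_mismatches(exported: list, imported: list, key: str = "id") -> list:
    """Return the keys of exported records that are missing or altered after import."""
    imported_by_key = {rec[key]: record_fingerprint(rec) for rec in imported}
    return [
        rec[key]
        for rec in exported
        if imported_by_key.get(rec[key]) != record_fingerprint(rec)
    ]


# Hypothetical data: the importing system silently dropped the "qty" field of record 2.
source = [{"id": 1, "name": "A", "qty": 5}, {"id": 2, "name": "B", "qty": 7}]
target = [{"id": 1, "name": "A", "qty": 5}, {"id": 2, "name": "B"}]
mismatched = find_mismatches(source, target)
```

Here `mismatched` lists record 2, the record whose field was not carried over.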

The People's Map data creation process can sometimes be confusing to our new users, so
here is a brief explanation of how the process works and which data appears in which
layer at each stage of the process.

Data Creation Process



Step 1: Identify area with no data
Search for the area that you want to create mapping for.

Step 2: Create data
Digitise data by using the editing tools. This data is immediately processed into a
receivable form.
Step 3: Unverified layer
You will notice that the mapping you created will appear in the unverified map layer with a
distinction. This indicates that the new or edited mapping has been marked as unverified.

Step 4: Verified layer
The unverified mapping layer is exported and validated by a verification process. If
the mapping passes this process, the distinction is removed and the feature appears
in the database (Verified). The verification process is undertaken on a weekly basis.
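
The four steps above amount to a small state change: a feature starts as unverified and is promoted once it passes the periodic check. The sketch below illustrates that flow only. The `Feature` class, the function names and the pass/fail rule are hypothetical stand-ins, since the actual verification criteria are not described here.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    verified: bool = False  # new or edited mapping starts in the unverified layer


def run_verification(features, passes_check):
    """Promote every unverified feature that passes the supplied check."""
    for feature in features:
        if not feature.verified and passes_check(feature):
            feature.verified = True


def layer(features, verified):
    """Return the names of the features in the verified or the unverified layer."""
    return [f.name for f in features if f.verified == verified]


features = [Feature("Main Street"), Feature("")]
# Stand-in rule: a feature passes if it has a name. The real weekly process differs.
run_verification(features, passes_check=lambda f: bool(f.name))
```

After the run, "Main Street" sits in the verified layer, while the unnamed feature stays unverified until a later run.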

Data Center

A data center is a facility used to house computer systems and associated components,
such as telecommunications and storage systems. It generally includes redundant or backup
power supplies, redundant data communications connections, environmental controls (e.g.,
air conditioning, fire suppression) and security devices.

 
  
  
 
Figure: Racks of telecommunications equipment in part of a data center.

IT operations are a crucial aspect of most organizational operations. One of the main
concerns is business continuity; companies rely on their information systems to run their
operations. If a system becomes unavailable, company operations may be impaired or
stopped completely. It is necessary to provide a reliable infrastructure for IT operations, in
order to minimize any chance of disruption. Information security is also a concern, and for
this reason a data center has to offer a secure environment which minimizes the chances of
a security breach. A data center must therefore keep high standards for assuring the
integrity and functionality of its hosted computer environment. This is accomplished through
redundancy of both fiber optic cables and power, which includes emergency backup power
generation.
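
The effect of redundancy can be illustrated with a simple calculation. The sketch below assumes that redundant components fail independently and that the service stays up while at least one copy works. Both are simplifying assumptions, and the 99 percent figure is purely illustrative.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent failures."""
    unavailability = (1.0 - component_availability) ** copies
    return 1.0 - unavailability


single_feed = parallel_availability(0.99, 1)  # one power feed
dual_feed = parallel_availability(0.99, 2)    # two independent power feeds
```

Under these assumptions a second feed raises availability from 99 percent to about 99.99 percent.
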
Telcordia GR-3160, NEBS Requirements for Telecommunications Data Center Equipment
and Spaces, provides guidelines for data center spaces within telecommunications networks,
and environmental requirements for the equipment intended for installation in those spaces.
These criteria were developed jointly by Telcordia and industry representatives. They may
be applied to data center spaces housing data processing or Information Technology (IT)
equipment. The equipment may be used to:

- Operate and manage a carrier's telecommunication network
- Provide data center based applications directly to the carrier's customers
- Provide hosted applications for a third party to provide services to their customers
- Provide a combination of these and similar data center applications.

Effective data center operation requires a balanced investment in both the facility and the
housed equipment. The first step is to establish a baseline facility environment suitable for
equipment installation. Standardization and modularity can yield savings and efficiencies in
the design and construction of telecommunications data centers.

Standardization means integrated building and equipment engineering. Modularity has the
benefits of scalability and easier growth, even when planning forecasts are less than
optimal. For these reasons, telecommunications data centers should be planned in repetitive
building blocks of equipment, and associated power and support (conditioning) equipment
when practical. The use of dedicated centralized systems requires more accurate forecasts
of future needs to prevent expensive over-construction or, perhaps worse, under-construction
that fails to meet future needs.
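
The arithmetic behind building in repetitive blocks can be sketched as follows. The module size and forecast figures are hypothetical; the point is only that an imperfect forecast costs at most one partly used module.

```python
import math


def modules_needed(forecast_racks: int, racks_per_module: int) -> int:
    """Number of identical building blocks required to cover the forecast."""
    return math.ceil(forecast_racks / racks_per_module)


def spare_racks(forecast_racks: int, racks_per_module: int) -> int:
    """Rack positions built but not yet needed once the modules are in place."""
    built = modules_needed(forecast_racks, racks_per_module) * racks_per_module
    return built - forecast_racks


blocks = modules_needed(130, 40)  # forecast of 130 racks, 40 racks per module
spare = spare_racks(130, 40)
```

With these numbers four modules are built and 30 rack positions remain spare for growth.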

The "lights-out" data center, also known as a darkened or a dark data center, is a data
center that, ideally, has all but eliminated the need for direct access by personnel, except
under extraordinary circumstances. Because of the lack of need for staff to enter the data
center, it can be operated without lighting. All of the devices are accessed and managed by
remote systems, with automation programs used to perform unattended operations. In
addition to the energy savings, reduction in staffing costs and the ability to locate the site
further from population centers, implementing a lights-out data center reduces the threat of
malicious attacks upon the infrastructure.

  



The phrase storage management is a general storage industry term used to describe the
tools, processes, and policies used to manage storage networks and storage services such
as virtualization, replication, mirroring, security, compression, traffic analysis, and other
services. The phrase also encompasses other storage technologies, such as process
automation, storage management and real-time infrastructure products, and storage
provisioning.

Computer data storage, often called storage or memory, refers to computer components
and recording media that retain digital data used for computing for some interval of time.
Computer data storage provides one of the core functions of the modern computer, that of
information retention.
Primary storage (or main memory or internal memory), often referred to simply as memory,
is the only one directly accessible to the CPU. The CPU continuously reads instructions
stored there and executes them as required. Any data actively operated on is also stored
there in a uniform manner.

Secondary storage (also known as external memory or auxiliary storage) differs from
primary storage in that it is not directly accessible by the CPU. The computer usually uses
its input/output channels to access secondary storage and transfers the desired data using
an intermediate area in primary storage. Secondary storage does not lose the data when the
device is powered down; it is non-volatile. Per unit, it is typically also two orders of
magnitude less expensive than primary storage.

Tertiary storage, or tertiary memory, provides a third level of storage. Typically it involves a
robotic mechanism which will mount (insert) and dismount removable mass storage media
into a storage device according to the system's demands; this data is often copied to
secondary storage before use. It is primarily used for archiving rarely accessed information
since it is much slower than secondary storage (e.g. 5-60 seconds vs. 1-10 milliseconds).
This is primarily useful for extraordinarily large data stores, accessed without human
operators. Typical examples include tape libraries and optical jukeboxes.

Off-line storage is computer data storage on a medium or a device that is not under the
control of a processing unit. The medium is recorded, usually in a secondary or tertiary
storage device, and then physically removed or disconnected. It must be inserted or
connected by a human operator before a computer can access it again. Unlike tertiary
storage, it cannot be accessed without human interaction.

     
 

Non-volatile memory
Will retain the stored information even if it is not constantly supplied with electric power.
It is suitable for long-term storage of information.
Volatile memory
Requires constant power to maintain the stored information. The fastest memory
technologies of today are volatile ones (not a universal rule). Since primary storage is
required to be very fast, it predominantly uses volatile memory.



Dynamic random access memory (DRAM)
A form of volatile memory which also requires the stored information to be periodically
re-read and re-written, or refreshed, otherwise it would vanish.
Static random access memory (SRAM)
A form of volatile memory similar to DRAM with the exception that it never needs to be
refreshed as long as power is applied. (It loses its content if power is removed.)

 

Read/write storage or mutable storage
Allows information to be overwritten at any time. A computer without some amount of
read/write storage for primary storage purposes would be useless for many tasks. Modern
computers typically use read/write storage also for secondary storage.
Read only storage
Retains the information stored at the time of manufacture, and write once storage (Write
Once Read Many) allows the information to be written only once at some point after
manufacture. These are called immutable storage. Immutable storage is used for tertiary
and off-line storage. Examples include CD-ROM and CD-R.
Slow write, fast read storage
Read/write storage which allows information to be overwritten multiple times, but with
the write operation being much slower than the read operation. Examples include CD-RW
and flash memory.
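
The difference between mutable and write-once storage can be shown with a small in-memory model. This is an illustrative sketch only: the class name and the choice of `PermissionError` are assumptions, and real write-once media enforce the rule physically rather than in software.

```python
class WriteOnceStore:
    """Toy write-once, read-many store: each block can be written exactly once."""

    def __init__(self):
        self._blocks = {}

    def write(self, address: int, data: bytes) -> None:
        """Write a block, refusing to overwrite one that already holds data."""
        if address in self._blocks:
            raise PermissionError(f"block {address} has already been written")
        self._blocks[address] = bytes(data)

    def read(self, address: int) -> bytes:
        """Read a previously written block as often as needed."""
        return self._blocks[address]
```

A read/write store would simply drop the check in `write`, allowing the block to be replaced at any time.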

 

Random access
Any location in storage can be accessed at any moment in approximately the same
amount of time. This characteristic is well suited to primary and secondary storage.
Sequential access
Pieces of information are accessed in serial order, one after the other;
therefore the time to access a particular piece of information depends upon which piece of
information was last accessed. This characteristic is typical of off-line storage.
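
To make the sequential case concrete, the toy model below charges one step for every block the head has to pass, so the cost of a read depends on where the previous read ended. The `Tape` class and its unit-step cost are assumptions made for illustration, not a model of any real device.

```python
class Tape:
    """Toy sequential-access medium: cost grows with distance from the head."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.head = 0  # index of the block last accessed

    def read(self, index):
        """Return (data, steps), where steps is how far the head had to move."""
        steps = abs(index - self.head)
        self.head = index
        return self.blocks[index], steps


tape = Tape(["a", "b", "c", "d", "e"])
_, first_cost = tape.read(4)   # head moves from block 0 to block 4
_, second_cost = tape.read(3)  # head moves from block 4 to block 3
```

A random-access medium would instead charge roughly the same cost for both reads, regardless of order.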
  

Location-addressable
Each individually accessible unit of information in storage is selected with its numerical
memory address. In modern computers, location-addressable storage is usually limited to
primary storage, accessed internally by computer programs, since location-addressability is
very efficient, but burdensome for humans.
File addressable
Information is divided into files of variable length, and a particular file is selected with
human-readable directory and file names. The underlying device is still location-
addressable, but the operating system of a computer provides the file system abstraction to
make the operation more understandable. In modern computers, secondary, tertiary and
off-line storage use file systems.
Content-addressable
Each individually accessible unit of information is selected on the basis of (part of)
the contents stored there. Content-addressable storage can be implemented using software
(computer program) or hardware (computer device), with hardware being the faster but more
expensive option. Hardware content-addressable memory is often used in a computer's CPU
cache.
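
A software version of content addressing can be sketched as a lookup that selects entries by part of their contents rather than by a numeric address or a file name. The class and field names below are hypothetical. The linear scan is a simplification: hardware content-addressable memory compares all entries at once.

```python
class AssociativeMemory:
    """Toy content-addressable store: entries are found by what they contain."""

    def __init__(self):
        self._rows = []

    def store(self, row: dict) -> None:
        """Add an entry; no address or name is assigned to it."""
        self._rows.append(dict(row))

    def match(self, **pattern):
        """Return every entry whose fields equal all of the given values."""
        return [
            row
            for row in self._rows
            if all(row.get(field) == value for field, value in pattern.items())
        ]


memory = AssociativeMemory()
memory.store({"tag": 0x1A, "data": "alpha"})
memory.store({"tag": 0x2B, "data": "beta"})
hits = memory.match(tag=0x2B)
```

Here `hits` holds the single entry whose `tag` field equals `0x2B`, found without knowing where it was stored.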

  

Raw capacity
The total amount of stored information that a storage device or medium can hold. It is
expressed as a quantity of bits or bytes (e.g. 10.4 megabytes).
Memory storage density
The compactness of stored information. It is the storage capacity of a medium divided
by a unit of length, area or volume (e.g. 1.2 megabytes per square inch).

 
 

Latency
The time it takes to access a particular location in storage. The relevant unit of
measurement is typically the nanosecond for primary storage, the millisecond for secondary
storage, and the second for tertiary storage. It may make sense to separate read latency and
write latency, and in the case of sequential access storage, minimum, maximum and average
latency.
Throughput
The rate at which information can be read from or written to the storage. In computer
data storage, throughput is usually expressed in terms of megabytes per second or MB/s,
though bit rate may also be used. As with latency, read rate and write rate may need to be
differentiated. Also, accessing media sequentially, as opposed to randomly, typically yields
maximum throughput.
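
Latency and throughput combine into a rough estimate of how long a transfer takes. The sketch below uses the simple model "one latency per access plus size divided by throughput". The 10 ms latency and 100 MB/s throughput are illustrative values, not measurements of any device.

```python
def transfer_time_seconds(size_bytes: float, latency_seconds: float,
                          throughput_bytes_per_second: float) -> float:
    """Estimate one access: wait for the location, then stream the data."""
    return latency_seconds + size_bytes / throughput_bytes_per_second


LATENCY = 0.010       # 10 ms per access (assumed)
THROUGHPUT = 100e6    # 100 MB/s sustained (assumed)
BLOCK = 4096          # bytes per block

# The same 1,000 blocks read in one sequential pass versus 1,000 separate accesses.
sequential = transfer_time_seconds(1000 * BLOCK, LATENCY, THROUGHPUT)
scattered = 1000 * transfer_time_seconds(BLOCK, LATENCY, THROUGHPUT)
```

Under these assumptions the sequential pass takes about 0.05 seconds and the scattered reads about 10 seconds, because the latency is paid once rather than a thousand times.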


 

Storage devices that reduce fan usage or automatically shut down during inactivity, and
low-power hard drives, can reduce energy consumption by 90 percent.
2.5 inch hard disk drives often consume less power than larger ones. Low capacity
solid-state drives have no moving parts and consume less power than hard disks. Also, memory
may use more power than hard disks.


Operating Systems
An operating system (OS) is software, consisting of programs and data, that runs on
computers, manages computer hardware resources, and provides common services for
execution of various application software.
Operating system development is one of the most complicated activities in which a
computing hobbyist may engage. A hobby operating system may be classified as one whose
code has not been directly derived from an existing operating system, and has few users
and active developers.

Applications
Application software, also known as an application or an "app", is computer software
designed to help the user perform a single task or multiple related tasks. Examples
include enterprise software, accounting software, office suites, graphics software and media
players. Many application programs deal principally with documents.
Application software applies the power of a particular computing platform or system
software to a particular purpose. Some apps such as Microsoft Office are available in
versions for several different platforms; others have narrower requirements.

Database
Database research has been carried out since the early days of the database
concept in the 1960s. It has taken place at research and development groups of companies
(notably at IBM Research), research institutes, and academia. Research has been done
both through theory and prototypes. The interaction between research and database-related
product development has been very productive for the database area, and many
related key concepts and technologies emerged from it. Notable are the relational and the
entity-relationship models, the atomic transaction concept and related concurrency control
techniques, and query optimization methods.

Networks
Networks are often classified as local area network (LAN), wide area network (WAN),
metropolitan area network (MAN), personal area network (PAN), virtual private network
(VPN), campus area network (CAN), storage area network (SAN), and others, depending on
their scale, scope and purpose (e.g., controller area network (CAN)). Usage, trust level, and
access rights often differ between these types of networks. LANs tend to be designed for
internal use by an organization's internal systems and employees in individual physical
locations, such as a building, while WANs may connect physically separate parts of an
organization and may include connections to third parties.


 
  



Information Lifecycle Management refers to a wide-ranging set of strategies for
administering storage systems on computing devices. Specifically, four categories of storage
strategies may be considered under the auspices of ILM.


ILM Policy consists of the overarching storage and information policies that drive
management processes. Policies are dictated by business goals and drivers. Therefore,
policies generally tie into a framework of overall IT governance and management; change
control processes; requirements for system availability and recovery times; and service
level agreements (SLAs).


  


Operational aspects of ILM include backup and data protection; disaster recovery, restore,
and restart; archiving and long-term retention; data replication; and day-to-day processes
and procedures necessary to manage a storage architecture.

 

Infrastructure facets of ILM include the logical and physical architectures; the applications
dependent upon the storage platforms; security of storage; and data center constraints.
Within the application realm, the relationship between applications and the production, test,
and development requirements are generally most relevant for ILM.



Information Lifecycle Management (sometimes abbreviated ILM) is the practice of applying
certain policies to the effective management of information throughout its useful life. This
practice has been used by Records and Information Management (RIM) Professionals for
over three decades and had its basis in the management of information in paper or other
physical forms (microfilm, negatives, photographs, audio or video recordings and other
assets). Video Lifecycle Management (VLM) is a video aware subset of ILM.

ILM includes every phase of a "record" from its beginning to its end. And while it is
generally applied to information that rises to the classic definition of a record (Records
management), it applies to any and all informational assets. During its existence,
information can become a record by being identified as documenting a business transaction
or as satisfying a business need. In this sense ILM has been part of the overall approach of
enterprise content management (ECM).

However, in a more general perspective the term "business" must be taken in a broad
sense, and not necessarily tied to direct commercial or enterprise contexts. While most records
are thought of as having a relationship to enterprise business, not all do. Much recorded
information serves to document an event or a critical point in history. Examples of these are
birth, death, medical/health and educational records. e-Science, for example, is an
emerging area where ILM has become relevant.

In 2004, the information technology and information storage industries (through the SNIA
association) attempted to assign a new, broader definition to Information
Lifecycle Management (ILM) according to this broad view:

Information Lifecycle Management comprises the policies, processes, practices, and tools
used to align the business value of information with the most appropriate and cost effective
IT infrastructure from the time information is conceived through its final disposition.
Information is aligned with business processes through management policies and service
levels associated with applications, metadata, information, and data.




For the purposes of business records, there are five phases identified as being part of the
lifecycle continuum along with one exception. These are:

- Creation and Receipt
- Distribution
- Use
- Maintenance
- Disposition

Creation and Receipt deals with records from their point of origination. This could include
their creation by a member of an organization at varying levels or receipt of information
from an external source. It includes correspondence, forms, reports, drawings, computer
input/output, or other sources.

Distribution is the process of managing the information once it has been created or
received. This includes both internal and external distribution, as information that leaves an
organization becomes a record of a transaction with others.

Use takes place after information is distributed internally, and can generate business
decisions, document further actions, or serve other purposes.

Maintenance is the management of information. This can include processes such as filing,
retrieval and transfers. While the connotation of 'filing' presumes the placing of information
in a prescribed container and leaving it there, there is much more involved. Filing is actually
the process of arranging information in a predetermined sequence and creating a system to
manage it for its useful existence within an organization. Failure to establish a sound
method for filing information makes its retrieval and use nearly impossible. Transferring
information refers to the process of responding to requests, retrieval from files and
providing access to users authorized by the organization to have access to the information.
While removed from the files, the information is tracked by the use of various processes to
ensure it is returned and/or available to others who may need access to it.

Disposition is the practice of handling information that is less frequently accessed or has
met its assigned retention periods. Less frequently accessed records may be considered for
relocation to an 'inactive records facility' until they have met their assigned retention period.
"Although a small percentage of organizational information never loses its value, the value
of most information tends to decline over time until it has no further value to anyone for
any purpose. The value of nearly all business information is greatest soon after it is created
and generally remains active for only a short time --one to three years or so-- after which
its importance and usage declines. The record then makes its life cycle transition to a semi-
active and finally to an inactive state." [1] Retention periods are based on the creation of an
organization-specific retention schedule, based on research of the regulatory, statutory and
legal requirements for management of information for the industry in which the organization
operates. Additional items to consider when establishing a retention period are any business
needs that may exceed those requirements and consideration of the potential historic,
intrinsic or enduring value of the information. If the information has met all of these needs
and is no longer considered to be valuable, it should be disposed of by means appropriate
for the content. This may include ensuring that others cannot obtain access to outdated or
obsolete information as well as measures for protecting privacy and confidentiality.

Media is subject to both degradation and obsolescence over its lifespan,
and therefore, policies and procedures must be established for the periodic conversion and
migration of information stored electronically to ensure it remains accessible for its required
retention periods.
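
The disposition rule described above, under which a record becomes eligible for disposal once its assigned retention period has elapsed, can be sketched as a date check against a retention schedule. The record types and periods below are hypothetical placeholders. A real schedule comes from the regulatory, statutory and legal research the text describes.

```python
from datetime import date

# Hypothetical retention schedule in years, keyed by record type.
RETENTION_YEARS = {"invoice": 7, "email": 3}


def disposition_due(record_type: str, created: date, today: date) -> bool:
    """Return True once the record has met its assigned retention period."""
    years = RETENTION_YEARS[record_type]
    try:
        expiry = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the expiry year is not a leap year.
        expiry = created.replace(year=created.year + years, day=28)
    return today >= expiry


due_now = disposition_due("email", date(2020, 1, 15), date(2023, 1, 15))
not_yet = disposition_due("email", date(2020, 1, 15), date(2023, 1, 14))
```

A check like this only flags a record as eligible. Business needs or enduring value, as noted above, may still justify keeping it longer.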