You are on page 1of 77

Fundamentals of Relational

Database Design and


Database Planning
J.Trumbo
Fermilab
CSS-DSG
1

Outline

Definitions
Selecting a dbms
Selecting an application layer
Relational Design
Planning
A very few words about Replication
Space
2

Definitions
What is a database?
A database is the implementation of freeware or
commercial software that provides a means to
organize and retrieve data. The database is the
set of physical files in which all the objects and
database metadata are stored. These files can
usually be seen at the operating system level.
This talk will focus on the organize aspect of
data storage and retrieval.
Commercial vendors include MicroSoft and Oracle.
Freeware products include mysql and postgres.
For this discussion, all points/issues apply to both
commercial and freeware products.
3

Definitions
Instance
A database instance, or an instance is
made up of the background processes
needed by the database software.
These processes usually include a process
monitor, session monitor, lock monitor,
etc. They will vary from database vendor
to database vendor.
4

Definitions
What is a schema?

A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT


A SCHEMA.
A database instance controls 0 or more databases.
A database contains 0 or more database application
schemas.
A database application schema is the set of database
objects that apply to a specific application. These objects
are relational in nature, and are related to each other,
within a database to serve a specific functionality. For
example payroll, purchasing, calibration, trigger, etc. A
database application schema not a database. Usually
several schemas coexist in a database.
A database application is the code base to manipulate
and retrieve the data stored in the database application
schema.
5

Definitions Cont.
Primary Definitions

Table, a set of columns that contain data. In


the old days, a table was called a file.
Row, a set of columns from a table reflecting a
record.
Index, an object that allows for fast retrieval of
table rows. Every primary key and foreign key
should have an index for retrieval speed.
Primary key, often designated pk, is 1 or more
columns in a table that makes a record unique.
6

Definitions Cont.
Primary Definitions

Foreign key, often designated fk, is a common


column common between 2 tables that define
the relationship between those 2 tables.
Foreign keys are either mandatory or optional.
Mandatory forces a child to have a parent by
creating a not null column at the child. Optional
allows a child to exist without a parent, allowing
a nullable column at the child table (not a
common circumstance).
7

Definitions Cont.
Primary Definitions
Entity Relationship Diagram or ER is a
pictorial representation of the application
schema.

Er Example
REVISION

MODULE

STATUS
# ST A T_ ID
o ST A TUS_ NA M E
* CREA TE _DA TE
* CREA TE _US ER
...

describes
has

# M ODULE _ID
* M ODUL E_ NA M E
* CREA TE _DA TE
* CREA TE _US ER
...

have

may have
associated with

part of

OWNER

PARAMETER
# PA R_ID
* P A R_ NA M E
* T E XT
* V A LUE
o UP P ER_L IM IT
o LOWE R_ LIM IT
* S RC
* DOCUM E NT A TION
o DRA WINGS
* CREA TE _DA TE
* CREA TE _US ER
...

creates

RE V _ID
RE V _NAM E
RE V_ DA TE
CREA TE _DA TE
CREA TE _US ER
UP DAT E_ DA T E
UP DAT E_ US E R

associated with

own
# OWNE R_ID
* FIRS T_ NA M E
* L AS T _NAM E
* P A SS WORD
* E M A IL
* US ERNA M E
* CREA TE _DA TE
* CREA TE _US ER
...

#
o
*
*
*
o
o

UNIT
have
describes

#
*
*
*
*
o

UNIT_ ID
UNIT _NAM E
CREA TE _DA TE
CREA TE _US ER
UP DA TE _DA TE
UP DAT E_ US E R

has
has

describes
HISTORY
#
*
*
*
*
o
o

HIS T_ ID
DA TE _CHA NGE D
RE AS ON
CREA TE _DA TE
CREA TE _US ER
UP DAT E_ DA T E
UP DAT E_ US E R

Definitions Cont.
Primary Definitions
Constraints are rules residing in the
databases data dictionary governing
relationships and dictating the ways
records are manipulated, what is a
legal move vs. what is an illegal
move. These are of the utmost
importance for a secure and
consistent set of data.
10

Definitions Cont.
Primary Definitions
Data Manipulation Language or DML,
sql statements that insert, update or
delete database in a database.
Data Definition Language or DDL, sql
used to create and modify database
objects used in an application schema.

11

Definitions Cont.
Primary Definitions
A transaction is a logical unit of work that
contains one or more SQL statements. A
transaction is an atomic unit. The effects
of all the SQL statements in a transaction
can be either all committed (applied to
the database) or all rolled back (undone
from the database), insuring data
consistency.
12

Definitions Cont.
Primary Definitions

A view is a selective presentation of the


structure of, and data in, one or more
tables (or other views). A view is a virtual
table, having predefined columns and
joins to one or more tables, reflecting a
specific facet of information.

13

Definitions Cont.
Primary Definitions
Database triggers are PL/SQL, Java, or C
procedures that run implicitly whenever a table
or view is modified or when some user actions
or database system actions occur. Database
triggers can be used in a variety of ways for
managing your database. For example, they can
automate data generation, audit data
modifications, enforce complex integrity
constraints, and customize complex security
authorizations. Trigger methodology differs
between databases.
14

Definitions Cont.
Primary Definitions
Replication is the process of copying and
maintaining database objects, such as tables, in
multiple databases that make up a distributed
database system.
Backups are copies of the database data in a
format specific to the database. Backups are
used to recover one or more files that have been
physically damaged as the result of a disk
failure. Media recovery requires the restoration
of the damaged files from the most recent
operating system backup of a database. It is of
the utmost importance to perform regularly
15
scheduled backups.

Definitions Cont.
Mission Critical Applications
An application is defined as mission critical,
imho, if
1. there are legal implications or financial loss to
the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or
unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).
16

Definitions Cont.
large or very large or a lot
Seems odd, but large is a hard definition to
determine. Vldb is an acronym for very large
databases. Its definition varies depending on
the database software one selects. Very large
normally indicates data that is reaching the
limits of capacity for the database software, or
data that needs extraordinary measures need to
be taken for operations such as backup,
recovery, storage, etc.
17

Definitions Cont.
Commercial databases do not a have a practical
limit to the size of the load. Issues will be
backup strategies for large databases.
Freeware does limit the size of the databases, and
the number of users. Documentation on these
issues vary widely from the freeware sites to the
user sites. Mysql supposedly can support 8T and
100 users. However, you will find arguments on
the users lists that these numbers cannot be
met.
18

Selecting a DBMS
Many options, many decisions, planning,
costs, criticality.
For lots of good information, please refer to
the urls on the last slides. Many examples
of people choosing product.

19

Selecting a DBMS
How do I Choose?
Which database product is appropriate for my
application? You must make a requirements
assessment.
Does you database need 24x7 availability?
Is your database mission critical, and no data loss can
be tolerated?
Is your database large? (backup recovery methods)
What data types do I need? (binary, large objects?)
Do I need replication? What level of replication is
required? Read only? Read/Write? Read/Write is
very expensive, so can I justify it?
20

Selecting a DBMS
How do I Choose? Cont.
If your answer to any of the above is yes, I would
strongly suggest purchasing and using a commercial
database with support. Support includes:
24x7 assistance with technical issues
Patches for bugs and security
The ability to report bugs, and get them resolved in
a timely manner.
Priority for production issues
Upgrades/new releases
Assistance with and use of proven backup/recovery
methods
21

Selecting a DBMS
The Freeware Choice
Freeware is an alternative for applications.
However, be fore warned, support for
these databases is done via email to a ad
hoc support group. The level of support
via these groups may vary over the life of
your database. Be prepared. Also expect
less functionality than any commercial
product. See http://wwwcss.fnal.gov/dsg/external/freeware/
22

Selecting a DBMS
The Freeware Choice
Freeware is free.
Freeware is open source.
Freeware functionality is improving.
Freeware is good for smaller non-mission
critical applications.

23

Selecting an Application Layer


Again, planning takes center stage. In the end you
want stability and dependability.
How many users need access?
What will the security requirements be?
Are there software licensing issues that need
consideration?
Is platform portability a requirement?
Two tier or three tier architecture?

24

Selecting an Application Layer

Direct access to the database layer? (probably


should be avoided)
Are you replicating? How? Where? With what?
There are no utilities that will port data from 1
database to another (i.e., postgres to mysql). if
database portability is a requirement, an
independent code must be written to satisfy this
requirement.

25

Selecting an Application Layer


Cont.

Application maintenance issues


People availability, working with users as a team, talent,
and turnover? (historically a huge issue)
A known or common language?
Freeware? Bug fixes, patchesare they important and
timely?
Documentation? Set standards, procedures, code
reviews making sure the documentation exists and is
clear.
Is the application flexible enough to easily accommodate
business rule changes that mandate modifications?
The availability of an ER diagram at this stage is
invaluable. We consider it a must have.
There are no utilities to port data from 1 type of db to 26

Selecting an Application Layer


Misc. application definitions
This presentation is not an application
presentation, but I will mention a few terms you
may hear.
Sql the query language for relational databases. A
must learn.
ODBC, open database connectivity. The software
that allows a database to talk to an application.
JDBC, java database connectivity.
27

Relational Design
The design of the application schema will
determine the usability and query ability
of the application. Done incorrectly, the
application and users will suffer until
someone else is forced to rewrite it.

28

Relational Design
The Setup
The database group has a standard 3 tier
infrastructure for developing and deploying
production databases and applications. This
infrastructure provides 3 database instances,
development, integration and production. This
infrastructure is applicable to any application
schema, mission critical or not. It is designed
to insure development, testing, feedback,
signoff, and an protected production
environment.
Each of these instances contain 1 or more
applications.
29

Relational Design
The Setup
The 3 instances are used as follows:
1. Development instance. Developers
playground. Small in size compared to
production. Much of the data is
invented and input by the developers.
Usually there is not enough disk space to
ever refresh with production data.
30

Relational Design Cont.


The Setup
2. The integration instance is used for
moving what is thought to be complete
functionality to a pre production
implementation. Power users and
developers work in concert in integration
to make sure the specs were followed.
The users should use integration as their
sign off area. Cuts from dev to int are
frequent and common to maintain the
newest releases in int for user testing.
31

Relational Design Cont.


The Setup
3. The production instance, real data. Needs
to be kept pure. NO testing allowed. Very
few logons. The optimal setup of a
production database server machine has
~3 operating system logons, root, the
database logon (ie oracle), and a
monitoring tool. In a critical 24x7
supported database, developers,
development tools, web servers, log files,
all should be kept off the production
32
database server.

Relational Design Cont.


The Setup
Lets talk about mission critical & 24x7 a bit.
1.
To optimize a mission critical 24/7 database, the
database server machine should be dedicated to
running the database, nothing else.
2.
All software products need maintenance and downtime.
Resist putting software products on the db server
machine so that their maintenance does not inhibit the
running of the database. Further, if the product
breaks, it could inhibit access to the database for a
long period. Example, a logging application, monitoring
users on the db goes wild, fills all available space and
halts the database. If this logging app. were not on
the dbserver machine, the db would be unaffected by
the malfunction.
33

Relational Design Cont.


The Setup
3. All database applications and database software require
modifications. Most times these modification require
down time because the schema or data modifications
need to lock entire tables exclusively. If you are sharing
your database instance with other many other
applications, and 1 of those applications needs the
database for an upgrade, all apps may have to take the
down time. Avoid this by insuring your 24/7 database
application is segregated from all other software that is
not absolutely needed. In that way you insure any down
times are specific to your cause.

34

Our 1st relational example

A cpu can
house
1 or more
databases

CPU
(d0ora2)
An database can
accommodate 1 or
more instances

Databases
on d0ora2
(d0ofprd1,
d0ofint1)

An instance may
contain 1 or more
application
schemas

schema
applications in
d0ofprd1
(sam, runs, calib)
schema
applications in
d0ofint1
(sam, runs, calib)

35

What is a schema?
It is

It is not

Tables (columns/datatypes) having


Constraints (not null, unique, foreign &
primary keys)
Triggers
Indexes
etc.
Accounts
Privileges & Roles
Server side processes

The

environment (servers, OS)


The results of queries, I.e objects
Application Code

One implements a schema by running scripts.


These scripts can be run against multiple
servers and should be archived.
36

Relational Design
Getting Started
Using your design tool, you will begin by relating objects that will
eventually become tables. All the other schema objects will fall out
of this design.
You will spend LOADS of time in your design tool, honing, redoing,
reacting to modifications, etc.
The end users and the designers need to be working almost at the
same desk for this process. If the end user is the designer, the end
user should involve additional users to insure an unbiased and
general design.
It is highly suggested that the design be kept up to date for future
documentation and maintainers.
Tables are related, most frequently in a 0 to many relationship.
Example, 1 run will result in 0 or more events. Analyzing and
defining these relationships results in an application schema.

37

What will a good schema design


buy you?
I am afraid the 80% planning 20% implementation
rule applies. Gather requirements.
Discovery of data that needs to be gathered.
Fast query results
Limited application code maintenance
Data flexibility
Less painful turnover of application to new
maintainers.
Fewer long term maintenance issues.
38

Relational Design
Lets get started
Write a requirements document.
You will not be able to anticipate all requirements,
but a document will be a start. A well designed
schema naturally allows for additional
functionality.
Who are the users? What is their mission?
Identify objects that need to be stored/tracked.
Think about how objects relate to each other.
Do not be afraid to argue/debate the relationships
with others.
39

Relational Design
So how do you get there?
Design tools are available, however, they do not think for
you. They will give you a clue that you are doing
something stupid, but it wont stop you. It is highly
recommended you use a design tool.
A picture says 1000 words. Create ER, entity relationship,
diagrams.
Get a commitment from the developer(s) to see the
application through to implementation. We have seen
several applications redone multiple times. A string of
developers tried, left the project, and left a mess. A new
developer started from scratch because there was no
documentation or design.
40

Relational Design
How do I get there?
Adhere to the recommendations of your database
vendor for setup and architecture.
Dont be afraid to ask for help or to see other
examples.
Dont be afraid to pilfer others design work, if it is
good, if it closely fits your requirements, then use it.
Ask questions, schedule reviews with experts and
users.
Work with your hardware system administrators to
insure you have the hardware you need for the
proposed job to be done.
41

Relational Design
Common Mistakes
Mistakes we see ALL the time

Do not design your schema around your favorite


query. A relational design will enable all queries
to be speedy, not only your favorite.
Dont design the schema around your narrow
view of the application. Get other users involved
from the start, ask for input and review.

42

Relational Design
Common Mistakes

Create a relational structure, not a


hierarchical structure. The ER diagram
should not necessarily resemble a tree
or a circle. It is the logical building of
relationships between data.
Relationships flow between subsets of
data. The resulting ER diagrams look
is not a standard by which one can
judge the quality of the design.
43

Relational Design
Common Mistakes

Do not create 1 huge table to hold 99% of


the data. We have seen a table with 1100+
columnsunusable, unqueryable, required
an entire application rewrite, took over a
year, made 80 tables from the 1 table.
Do not create separate schemas for the
same application or functions within an
application.
Use indices and constraints, this is a MUST!
44

Relational Design
Examples of Common Mistakes

Using timestamp as the primary key assumes


that within a second, no other record will be
inserted. Actually this was not the case, and an
insert operation failed. Use database generated
sequences as primary keys and NON-UNIQUE
index on timestamp.
A table with more than 900 columns. Such
design will cause chaining since each record is
not going to fit in one block. One record
spanning many blocks, thus chaining, hence bad
performance.
45

Relational Design
Examples of Common Mistakes

Do not let the application control a generated


sequence. Have seen locking issues, and
duplicate values issues when the application
increments the sequence. Have the database
increment/lock/constrain the sequence/primary
key. That is why the databases have sequence
mechanisms, use them.
Use indices! An Atlas table with 200,000 rows,
halted during a query. Reason? No indices.
Added a primary key index, instantaneous query
response. Indices are not wasted space!
46

Relational Design
Examples of Common Mistakes
USE DATABASE CONSTRAINTS!!!!!!
Have examples where constraints were not used,
but implemented via the api. Bugs in the api
allowed data to be deleted that should not have
been deleted, and constraints would have
prevented the error. Have also seen apis error
with cannot delete errors. They were trying to
force an invalid delete, luckily the database
constraints saved the data.
47

Entity Relationship Diagrams


1 to many
PARENT

have

# PARENT_ ID

belong to

# E_ ID

B
# B_ ID

belong to
have

# C_ ID

# CHILD_ID

have

# A_ ID

CHILD

D
belong to

have

# D_ ID

F
belong to

# F_ ID

48

Entity Relationship Diagrams


many to many
define

# G_ID

define

G2
# G2_ID

G2H2

# I2 _ID

map to

map to
define

I2J2
map to

# H2 _ID

J
relate to

define

H2

define

# I_ ID

I2

# H_ ID

owned by

# J_ID

map to

J2
define

# J2_ ID

49

Entity Relationship Diagrams


1 to 1
K

define

# K_ ID

relate to

define

# M _ID

O
# O_ID

L
# L_ ID

N
relate to

define

# N_ ID

P
relate to

# P_ ID

50

Relational Design
The Good

CALIB_TYPE

CALIBRATION

# CALIB_T YPE_ ID
* DESCRIPTION

# CALIBRATION_ID
* T START
o TEND

define
be defined by

Calibration type might have 3


rows, drift, pedestal, & gain
This is a parent table.

Each calibration record will be


Defined by drift, pedestal or gain.
In addition to start and end times.
This is a child table.

51

Relational Design
The Bad
CALIBRATION
# CALIBRAT ION_ID
* T START
o TEND

define
relate to

define

relate to

define
relate to

DRIFT_CALIB

PEDESTAL_CALIB

GAIN_CALIB

# DRIFT _CAL IB_ID


* T START
o TEND

# PEDESTAL_ CALIB_ID
* T START
o TEND

# GAIN_CAL IB_ID
* T START
o TEND

You have now created 3 different children, all reporting the same information, when 1 child would
suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2.
52

Relational Design
The Ugly
CALIBRATION

CALIBRATION(2)

# CALIBRATION_ID
* T START
o TEND

# CALIBRATION_ID
* T START
o TEND

defines

relate to
PEDESTAL_CALIB
# PEDESTAL_ CALIB_ID
* T START
o TEND

CALIBRATION(3)
# CALIBRATION_ID
* T START
o TEND

defines

relate to

defines
relate to

DRIFT_CALIB

GAIN_CALIB

# DRIFT _CAL IB_ID


* T START
o TEND

# GAIN_CALIB_ID
* T START
o TEND

Now you have created 3 different applications, using 6 tables. All of which could be managed with 2 tables.
Extra code, extra testing, extra maintenance.
53

Relational Design
The Goodlets recap

CALIB_TYPE

CALIBRATION

# CALIB_T YPE_ ID
* DESCRIPTION

# CALIBRATION_ID
* T START
o TEND

define
be defined by

AHHH, back to normal, or normalization as we refer to it.

54

Relational Design
What to expect from a design tool

An entity relationship diagram


The ability to create the ddl (data
definition language) needed
The ability to project disk space usage
Ddl in a format to allow you to enter the
code into a code library (cvs), and that will
allow you to run against your database
55

Relational Design Why bother?


Experience from RunII
TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES!
Personnel consistency does not exist. Application
developers come and go regularly. The
documentation that a design product provides will
the next developer an immediate understanding of
the application in picture format.
Application sharing is enhanced when others can look
at your design and determine whether the
application is reusable in their environment. Sam is
a good example of an application that 3 experiments
are now using.
56

Relational Design
Why bother? Cont.
When an application is under construction,
the ER diagram goes to every application
meeting, and quite possibly the wallet of
the application leader. It is the pictorial
answer to many issues.
Planning for disk space has been an issue,
the designer tool should assist with this
task.
57

Planning
Overall
What do I need to plan for?
People, hardware, software, obsolescence,
maintenance, emergencies.
How far out do I need to plan?
Initially 2-4 years.
How often do I need to review the plans?
Annually.
What if my plan fails or looks undoable?
Nip it in the bud, be proactive, come up with
options.
58

Planning
Overall

Disk space requirements. My experience is all the


wags, (wild guesses) fall short of what is needed. It
is hard to predict the number of rows in a table. It
would be easier if we knew the amount and results
of the science ahead of time! Remember, 10x what
you think the data will take.
Hardware requirements. Experience tells us that the
database machine should serve 1 master (if it is a
large database or mission critical), the database,
nothing else. Ideally there will be root, a database
monitor user and a database user, oracle for
example. No apache, no log file areas, no
applications, etc.
59

Planning
Overall

Growth and obsolesce. Plan for 3-4 years before


needing to replace hardware. Hardware and
software become obsolete. New/upgraded software
gives addition functionality that you will want/need.
Maintenance. Do you change the oil in your car?
Plan on 1 morning per month downtime for caring
for the hardware and software. Security patches
could mandate additional stoppages. I cannot stress
how important this is. Fire walling will not protect
you from bugs and obsolescence. If the downtime is
not needed, it will not be taken. Planning
maintenance time is as important as planning to buy
60
disks.

Planning
User Requirements
Will user requirements influence your
hardware & software decisions?
Do you need replication?
What architecture is your api going to be?
How many users will be loading the
database and hardware?

61

Planning
Maintenance

Database/Operating system software need


upgrades. One always hopes one can get
on a stable version of something and not
upgrade. That is a fallacy. Major version
upgrades provide needed and new
functionality. Bug patches and security
patches are a never ending fact of life.

62

Planning
Backup and Recovery
Backup and recovery procedures of vldb
(very large databases) are difficult at best.
Vldb is normally defined as mulitple Gig or
tera byte databases. This is probably the
most sensitive area when choosing a
freeware database.
Hardware plays a part here as well. Insure
when planning for hardware there is plan
for backup and recovery. Disk and tape

63

Planning
Good Practices with a Hammer
Make a standards document and enforce its
use. When dbas and developers are
always on the same page, life is easier for
both. Expectations are clear and defined.
Anger and disappointment are lessened.
System as well as database standards need
to be followed and enforced.
64

Planning
Failover
Yikes, we are down!
Everyone always wants 24x7 scheduled uptime.
Until they see the cost.
Make anyone who insists on real 100% uptime to
justify it (and pay for it?). 98-99% uptime can
be realized at a much lower cost.
Uptime requirements will influence, possibly
dictate, database choices, hardware choices, fte
requirements.
65

Planning
Failover
The cheapest method of addressing a failure is proactive
planning.
Make sure your database and database software are
backed up. Unless you are using a commercial database
with roll forward recovery, assume you will lose all dml
since your last backup if you need to recover. This
should dictate your backup schedule.
Do not forget tape backups as a catastrophic recovery
method.
Practice recovery on your integration and development
databases. Practice different scenarios, delete a datafile,
delete the entire database.
66

Replication
Replication is the process of copying and
maintaining database objects in multiple
databases that make up a distributed
database system. Replication can improve
the performance and protect the
availability of applications because
alternate data access options exist.

67

Replication Cont.

Oracle Supports 3 types of replication READ ONLY Snapshots


(Materialized views), Advanced Replication and streams based
replication.
Streams allows ddl modifications made to the master
automatically.
Streams can be configured in uni-directional ( Single Source
and one or more than targets) or master to master where
updates can happen to any participant database.
Advanced replication also supports master to master . But
streams based replication is recommended.

READ ONLY Snapshots replication from a Sun box to a Sun &


Linux box(s) is being done in CDF. When a replica is under
maintenance there is failover to another replica. The replicas
are up and running in read only mode if the master is down
for maintenance.
68

Replication cont.
Oracle master to master replication allows for
updates on both the master and replica sides.
Master to master is a complex and a high
maintenance replication. It seems to be the 1st
option the unwitting opt for. Both Cern and
Fermi dbas have requested firm justification
before considering this type of replication
request.
Every link in the multi master would be required to
be a fully staffed, as downtime will be critical.
69

Replication cont.
1.

2.

3.

4.

Disk Space for Archives. If receiving site is down for extended


period of time, then source db should be tuned enough to
hold the archives logs, otherwise, one has to reinstantiate the
replication. Reasonable downtime for target depends upon
archive area being generated on source. Space, space and
more space.
Conflict Resolution In Master to Master, conflict resolution
may be challenge. Rules should be well defined to resolve the
data conflicts.
Design of Data Model if Primary Keys are populated by
sequences , there is very much chance of overlapping the
sequences and will cause integrity constraints. Data Model
should be designed very carefully.
DB Support In Master to Master Replication, all master sites
should be in 24*7 support mode. Otherwise , sync up of data
will be challenge or one may lead to reinstantiation of
replication. Reinstantiation is not unplug and play type of
70
situation.

Freeware Replication
MySQL has replication in the last stable
version (3.23.32, v4.1 is out). It is masterslave replication using binary log of
operations on the server side. It is
possible to build star or chain type
structures.
There is a PostgreSQL replication tool. We
have not tested it yet.
71

Lost in Space
Space is the 1 area consistently under estimated in
every application I have seen. Imho,
consistently, data volume initial estimates were
undersized by a factor of 2 or 3. For example,
RunII events were estimated at 1 billion rows.
This estimate was surpassed Feb. 2004. We will
probably end up with 4-5 billion event rows.
That is a lot of disk space.
Disk hardware becomes unsupported, and
obsolete in what seems to be a blink of an eye.
72

Lost In Space cont.


8 x N Gb

N Gb

Unexpected?

Rollback
Indexuse disk
Redo
AllData
databases
to store
data.

Data

Index

mirror

mirror

Backup

Replication

Good rule of thumb:


You need 10x the disk to hold a given amount of data in an RDB.
Operate in 2 year cycles:
First 2 years storage available on day 1.
Evaluate growth at end of year 1, begin prep of next 2 yr.
73

Lost in Space, cont.


You will use as much disk space as you
purchase, and then some.
Database indices will take MINIMALLY at
least as much space as the tables,
probably considerably more.
Give WIDE lead time to purchase disk
storage. New disks are not installed and
configured over night. They require
planning, downtime and $.
74

Additional References
**WARNING some of these may be database specific.
Intro to database design
http://www.cc.gatech.edu/classes/AY2000/cs4400_sprin
g/cs4400a/
Intro to Oracle tutorial
http://w2.syronex.com/jmr/edu/db/
Evolutionary Database Design
http://www.martinfowler.com/articles/evodb.html
mentions 1 dba for atlas
Sql course http://sqlcourse.com/

75

Additional References

***Highly recommended reading, db


comparatives http://wwwcss.fnal.gov/dsg/external/freeware/
db infrastructure standard, support levels,
etc. for fermi computing http://wwwcss.fnal.gov/dsg/external/oracle_admin/
76

Additional References

Oracle Designer tutorial http://wwwcss.fnal.gov/dsg/internal/ora_adm/index.htm#d


esigner (choose Oracle Designer tutorial or
Oracle Designer Short Cuts and Lessons
Learned)
Btev specific additional information

http://wwwcss.fnal.gov/dsg/external/BTeV/index.html
77

You might also like