Outline
Definitions
Selecting a DBMS
Selecting an application layer
Relational Design
Planning
A very few words about Replication
Space
Definitions
What is a database?
A database is an implementation of freeware or commercial software that provides a means to organize and retrieve data. Physically, the database is the set of files in which all the objects and database metadata are stored; these files can usually be seen at the operating-system level.
This talk will focus on the organizing aspect of data storage and retrieval.
Commercial vendors include Microsoft and Oracle. Freeware products include MySQL and PostgreSQL.
For this discussion, all points and issues apply to both commercial and freeware products.
Definitions
Instance
A database instance, or simply an instance, is made up of the background processes needed by the database software.
These processes usually include a process monitor, a session monitor, a lock monitor, etc. They vary from one database vendor to another.
Definitions Cont.
Primary Definitions
An Entity Relationship (ER) diagram is a pictorial representation of the application schema.
ER Example
[Figure: example ER diagram with the entities MODULE, REVISION, STATUS, PARAMETER, OWNER, UNIT and HISTORY. Each entity has its own identifier (MODULE_ID, REV_ID, STAT_ID, PAR_ID, OWNER_ID, UNIT_ID, HIST_ID) and CREATE_DATE/CREATE_USER audit attributes. PARAMETER also carries PAR_NAME, TEXT, VALUE, UPPER_LIMIT, LOWER_LIMIT, SRC, DOCUMENTATION and DRAWINGS. Attribute markers: # = unique identifier, * = mandatory, o = optional.]
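The following is a rough sketch of how the diagram's attribute markers become DDL. It uses Python's standard sqlite3 module as a stand-in database, which is an assumption because the talk is not tied to SQLite. In this mapping, a # attribute becomes the primary key, * attributes become NOT NULL columns, and o attributes are nullable. The foreign key from MODULE to STATUS is a hypothetical relationship chosen for illustration; the column names follow the figure.

```python
import sqlite3

# Hypothetical DDL for two entities from the ER example.
# '#' -> PRIMARY KEY, '*' -> NOT NULL, 'o' -> nullable column.
DDL = """
CREATE TABLE status (
    stat_id      INTEGER PRIMARY KEY,                  -- # STAT_ID
    status_name  TEXT,                                 -- o STATUS_NAME
    create_date  TEXT NOT NULL,                        -- * CREATE_DATE
    create_user  TEXT NOT NULL                         -- * CREATE_USER
);
CREATE TABLE module (
    module_id    INTEGER PRIMARY KEY,                  -- # MODULE_ID
    module_name  TEXT NOT NULL,                        -- * MODULE_NAME
    stat_id      INTEGER REFERENCES status (stat_id),  -- assumed relationship
    create_date  TEXT NOT NULL,                        -- * CREATE_DATE
    create_user  TEXT NOT NULL                         -- * CREATE_USER
);
"""


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


if __name__ == "__main__":
    conn = build_schema()
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(tables)  # ['module', 'status']
```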
Definitions Cont.
Primary Definitions
Constraints are rules residing in the database's data dictionary that govern relationships and dictate the ways records may be manipulated: what is a legal move vs. what is an illegal move. They are of the utmost importance for a secure and consistent set of data.
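Here is a small illustration of a constraint deciding what counts as a legal move. This Python sketch uses sqlite3 as a stand-in DBMS. The table borrows the LOWER_LIMIT/UPPER_LIMIT attributes of PARAMETER from the ER example. The CHECK rule and the parameter names are hypothetical. Because the rule is stored in the schema, every client that writes to the table is subject to it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The rules live in the table definition, not in application code.
    conn.execute("""
        CREATE TABLE parameter (
            par_id       INTEGER PRIMARY KEY,
            par_name     TEXT NOT NULL UNIQUE,
            lower_limit  REAL,
            upper_limit  REAL,
            -- a NULL limit makes this expression NULL, which SQL treats as passing
            CHECK (lower_limit <= upper_limit)
        )""")
    return conn


def try_insert(conn: sqlite3.Connection, name, lo, hi) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO parameter (par_name, lower_limit, upper_limit) "
                "VALUES (?, ?, ?)", (name, lo, hi))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(try_insert(conn, "hv_setpoint", 0.0, 1500.0))  # True: legal move
    print(try_insert(conn, "bad_limits", 10.0, 1.0))     # False: lower > upper
    print(try_insert(conn, "hv_setpoint", 0.0, 10.0))    # False: duplicate name
```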
Definitions Cont.
Primary Definitions
Data Manipulation Language, or DML: SQL statements that insert, update, or delete data in a database.
Data Definition Language, or DDL: SQL used to create and modify the database objects used in an application schema.
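Here is a side-by-side sketch of the two kinds of statement, using Python's sqlite3 module as the database. The UNIT table name comes from the ER example. The columns, index, and row values are hypothetical. The DDL statements create or change objects; the DML statements then change the rows stored in them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create and modify schema objects.
conn.execute("CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, unit_name TEXT NOT NULL)")
conn.execute("ALTER TABLE unit ADD COLUMN create_user TEXT")
conn.execute("CREATE INDEX unit_name_idx ON unit (unit_name)")

# DML: insert, update and delete rows in those objects (one transaction).
with conn:
    conn.executemany("INSERT INTO unit (unit_name, create_user) VALUES (?, ?)",
                     [("volt", "dba"), ("amp", "dba"), ("ohm", "dba")])
    conn.execute("UPDATE unit SET unit_name = 'ampere' WHERE unit_name = 'amp'")
    conn.execute("DELETE FROM unit WHERE unit_name = 'ohm'")

names = [r[0] for r in conn.execute("SELECT unit_name FROM unit ORDER BY unit_id")]
print(names)  # ['volt', 'ampere']
```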
Definitions Cont.
Primary Definitions
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit: the effects of all the SQL statements in a transaction are either all committed (applied to the database) or all rolled back (undone from the database), ensuring data consistency.
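Here is a minimal sketch of atomicity, with Python's sqlite3 standing in for the DBMS. A hypothetical "close run" operation updates a run and records the reason in a HISTORY table as one unit of work. When the second statement fails, the first one is rolled back with it. The table layout is an assumption for illustration.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE run (run_id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE history (hist_id INTEGER PRIMARY KEY, "
                 "run_id INTEGER NOT NULL, reason TEXT NOT NULL)")
    with conn:
        conn.execute("INSERT INTO run VALUES (1, 'open')")
    return conn


def close_run(conn: sqlite3.Connection, run_id: int, reason) -> None:
    """Update the run and record why, as one atomic unit of work."""
    with conn:  # COMMIT if the block succeeds, ROLLBACK if it raises
        conn.execute("UPDATE run SET status = 'closed' WHERE run_id = ?", (run_id,))
        conn.execute("INSERT INTO history (run_id, reason) VALUES (?, ?)",
                     (run_id, reason))


if __name__ == "__main__":
    conn = setup()
    try:
        close_run(conn, 1, None)  # the INSERT violates NOT NULL on reason
    except sqlite3.IntegrityError:
        pass
    # The UPDATE was rolled back together with the failed INSERT.
    print(conn.execute("SELECT status FROM run").fetchone()[0])  # open
```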
Definitions Cont.
Primary Definitions
Database triggers are procedures (in Oracle, written in PL/SQL, Java, or C) that run implicitly whenever a table or view is modified, or when certain user actions or database system actions occur. Database triggers can be used in a variety of ways for managing your database. For example, they can automate data generation, audit data modifications, enforce complex integrity constraints, and customize complex security authorizations. Trigger methodology differs between databases.
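As an example of the "audit data modifications" use, here is a sketch in Python with sqlite3. SQLite triggers are written in SQL rather than PL/SQL, Java, or C, which illustrates that trigger methodology differs between databases. The audit table and trigger names are hypothetical. The trigger fires on its own whenever VALUE changes, and no application code calls it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parameter (
    par_id   INTEGER PRIMARY KEY,
    par_name TEXT NOT NULL,
    value    REAL NOT NULL
);
CREATE TABLE parameter_audit (
    par_id     INTEGER NOT NULL,
    old_value  REAL,
    new_value  REAL,
    changed_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Fires implicitly after every UPDATE of VALUE.
CREATE TRIGGER parameter_value_audit
AFTER UPDATE OF value ON parameter
FOR EACH ROW
BEGIN
    INSERT INTO parameter_audit (par_id, old_value, new_value)
    VALUES (OLD.par_id, OLD.value, NEW.value);
END;
""")

with conn:
    conn.execute("INSERT INTO parameter (par_name, value) VALUES ('gain', 1.0)")
    conn.execute("UPDATE parameter SET value = 1.25 WHERE par_name = 'gain'")

rows = conn.execute("SELECT par_id, old_value, new_value FROM parameter_audit").fetchall()
print(rows)  # [(1, 1.0, 1.25)]
```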
Definitions Cont.
Primary Definitions
Replication is the process of copying and maintaining database objects, such as tables, in multiple databases that make up a distributed database system.
Backups are copies of the database data in a format specific to the database. Backups are used to recover one or more files that have been physically damaged as the result of a disk failure. Media recovery requires restoring the damaged files from the most recent operating-system backup of the database. It is of the utmost importance to perform regularly scheduled backups.
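Here is one concrete, database-specific way to take a backup, sketched with SQLite's online backup API. Python (3.7+) exposes this API as sqlite3.Connection.backup. Note that this is a database-level copy rather than the operating-system file backup described above. The file name and the helper functions are hypothetical.

```python
import os
import sqlite3
import tempfile


def backup_database(src: sqlite3.Connection, dest_path: str) -> None:
    """Copy a live database to dest_path using SQLite's online backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies every page in one step by default (pages=-1)
    finally:
        dest.close()


def row_count(path: str, table: str) -> int:
    conn = sqlite3.connect(path)
    try:
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE event (event_id INTEGER PRIMARY KEY, run_id INTEGER)")
    with src:
        src.executemany("INSERT INTO event (run_id) VALUES (?)", [(1,), (1,), (2,)])
    path = os.path.join(tempfile.mkdtemp(), "nightly_backup.db")
    backup_database(src, path)
    print(row_count(path, "event"))  # 3
```

In practice the call would be scheduled (for example nightly), and the backup file would be copied off the database server.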
Definitions Cont.
Mission Critical Applications
In my opinion, an application is mission critical if:
1. there are legal implications or financial loss to
the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or
unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).
Definitions Cont.
large or very large or a lot
It seems odd, but "large" is hard to define. VLDB is an acronym for very large database, and its definition varies depending on the database software one selects. "Very large" normally indicates data that is reaching the capacity limits of the database software, or data for which extraordinary measures must be taken for operations such as backup, recovery, and storage.
Definitions Cont.
Commercial databases do not have a practical limit on the size of the data load; the issue will instead be backup strategies for large databases.
Freeware does limit the size of the database and the number of users. Documentation on these limits varies widely between the freeware sites and the user sites. MySQL supposedly can support 8 TB and 100 users; however, you will find arguments on the user lists that these numbers cannot be met.
Selecting a DBMS
Many options, many decisions: planning, costs, criticality.
For lots of good information, please refer to the URLs on the last slides, which include many examples of people choosing a product.
Selecting a DBMS
How do I Choose?
Which database product is appropriate for my application? You must make a requirements assessment.
Does your database need 24x7 availability?
Is your database mission critical, with no tolerance for data loss?
Is your database large? (backup and recovery methods)
What data types do I need? (binary, large objects?)
Do I need replication? What level of replication is required: read-only or read/write? Read/write is very expensive, so can I justify it?
Selecting a DBMS
How do I Choose? Cont.
If your answer to any of the above is yes, I would
strongly suggest purchasing and using a commercial
database with support. Support includes:
24x7 assistance with technical issues
Patches for bugs and security
The ability to report bugs and get them resolved in a timely manner
Priority for production issues
Upgrades/new releases
Assistance with and use of proven backup/recovery
methods
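The rule of thumb above can be written down as a small checklist. This Python sketch is only an illustration. The field names are hypothetical, and the logic encodes nothing beyond the talk's own rule: a single "yes" to the requirements questions favours a supported commercial product.

```python
from dataclasses import dataclass, fields


@dataclass
class Requirements:
    """Answers to the requirements-assessment questions (hypothetical field names)."""
    needs_24x7: bool = False
    mission_critical_no_data_loss: bool = False
    large_database: bool = False
    needs_binary_or_large_objects: bool = False
    needs_replication: bool = False


def recommend(req: Requirements) -> str:
    # Any single "yes" is enough to favour a commercial DBMS with vendor support.
    if any(getattr(req, f.name) for f in fields(req)):
        return "commercial DBMS with support"
    # Otherwise freeware is a reasonable option for small, non-mission-critical work.
    return "freeware may be adequate"


if __name__ == "__main__":
    print(recommend(Requirements()))                        # freeware may be adequate
    print(recommend(Requirements(needs_replication=True)))  # commercial DBMS with support
```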
Selecting a DBMS
The Freeware Choice
Freeware is an alternative for applications. However, be forewarned: support for these databases is provided via email to an ad hoc support group, and the level of support from these groups may vary over the life of your database. Be prepared. Also expect less functionality than in any commercial product. See http://wwwcss.fnal.gov/dsg/external/freeware/
Selecting a DBMS
The Freeware Choice
Freeware is free.
Freeware is open source.
Freeware functionality is improving.
Freeware is good for smaller non-mission
critical applications.
Relational Design
The design of the application schema will determine the usability and queryability of the application. Done incorrectly, the application and its users will suffer until someone else is forced to rewrite it.
Relational Design
The Setup
The database group has a standard 3-tier infrastructure for developing and deploying production databases and applications. This infrastructure provides 3 database instances: development, integration, and production. It is applicable to any application schema, mission critical or not, and is designed to ensure development, testing, feedback, signoff, and a protected production environment.
Each of these instances contains 1 or more applications.
Relational Design
The Setup
The 3 instances are used as follows:
1. Development instance. The developers' playground. Small in size compared to production. Much of the data is invented and input by the developers. Usually there is not enough disk space to ever refresh it with production data.
[Figure: a CPU can house 1 or more databases; a database can accommodate 1 or more instances; an instance may contain 1 or more application schemas. Example: the CPU d0ora2 hosts d0ofprd1 and d0ofint1, and each of these contains the sam, runs, and calib application schemas.]
Relational Design
Getting Started
Using your design tool, you will begin by relating objects that will eventually become tables. All the other schema objects will fall out of this design.
You will spend LOADS of time in your design tool: honing, redoing, reacting to modifications, etc.
The end users and the designers need to be working almost at the same desk for this process. If the end user is the designer, that user should involve additional users to ensure an unbiased and general design.
It is highly recommended that the design be kept up to date as documentation for future maintainers.
Tables are related, most frequently in a zero-to-many relationship. For example, 1 run will result in 0 or more events. Analyzing and defining these relationships results in an application schema.
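Here is the run/event example expressed as a zero-to-many relationship, sketched with Python's sqlite3. The table and column names are hypothetical. The foreign key guarantees that every event belongs to an existing run. The LEFT JOIN keeps runs that have no events, which is the "0" side of the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE run (
    run_id INTEGER PRIMARY KEY
);
CREATE TABLE event (
    event_id INTEGER PRIMARY KEY,
    run_id   INTEGER NOT NULL REFERENCES run (run_id)  -- each event belongs to 1 run
);
CREATE INDEX event_run_idx ON event (run_id);
""")
with conn:
    conn.executemany("INSERT INTO run (run_id) VALUES (?)", [(100,), (101,)])
    conn.executemany("INSERT INTO event (run_id) VALUES (?)", [(100,), (100,), (100,)])

# LEFT JOIN keeps runs with zero events.
counts = conn.execute("""
    SELECT r.run_id, COUNT(e.event_id)
    FROM run r LEFT JOIN event e ON e.run_id = r.run_id
    GROUP BY r.run_id ORDER BY r.run_id
""").fetchall()
print(counts)  # [(100, 3), (101, 0)]
```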
Relational Design
Let's get started
Write a requirements document.
You will not be able to anticipate all requirements,
but a document will be a start. A well designed
schema naturally allows for additional
functionality.
Who are the users? What is their mission?
Identify objects that need to be stored/tracked.
Think about how objects relate to each other.
Do not be afraid to argue/debate the relationships
with others.
Relational Design
So how do you get there?
Design tools are available; however, they do not think for you. They will give you a clue that you are doing something stupid, but they won't stop you. It is highly recommended that you use a design tool.
A picture says 1000 words. Create ER (entity relationship) diagrams.
Get a commitment from the developer(s) to see the
application through to implementation. We have seen
several applications redone multiple times. A string of
developers tried, left the project, and left a mess. A new
developer started from scratch because there was no
documentation or design.
Relational Design
How do I get there?
Adhere to the recommendations of your database
vendor for setup and architecture.
Don't be afraid to ask for help or to see other examples.
Don't be afraid to pilfer others' design work: if it is good and closely fits your requirements, use it.
Ask questions, and schedule reviews with experts and users.
Work with your hardware system administrators to ensure you have the hardware you need for the proposed job.
Relational Design
Common Mistakes
Mistakes we see ALL the time
Relational Design
Examples of Common Mistakes
USE DATABASE CONSTRAINTS!!!!!!
We have examples where constraints were not used but were implemented via the API instead. Bugs in the API allowed data to be deleted that should not have been deleted; database constraints would have prevented the error. We have also seen APIs fail with "cannot delete" errors: they were trying to force an invalid delete, and luckily the database constraints saved the data.
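Here is a sketch of the second scenario, with Python's sqlite3 standing in for the database. A hypothetical API function forgets to check for dependent rows before deleting an owner. The foreign key declared with ON DELETE RESTRICT rejects the delete and the data survives. The OWNER and MODULE names come from the ER example, but the relationship between them and the data values are assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript("""
    CREATE TABLE owner (
        owner_id  INTEGER PRIMARY KEY,
        username  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE module (
        module_id   INTEGER PRIMARY KEY,
        module_name TEXT NOT NULL,
        owner_id    INTEGER NOT NULL
            REFERENCES owner (owner_id) ON DELETE RESTRICT
    );
    INSERT INTO owner (owner_id, username) VALUES (1, 'jsmith');
    INSERT INTO module (module_name, owner_id) VALUES ('calorimeter', 1);
    """)
    return conn


def api_delete_owner(conn: sqlite3.Connection, owner_id: int) -> None:
    """Hypothetical buggy API call: it never checks whether the owner still has modules."""
    with conn:
        conn.execute("DELETE FROM owner WHERE owner_id = ?", (owner_id,))


if __name__ == "__main__":
    conn = make_db()
    try:
        api_delete_owner(conn, 1)
    except sqlite3.IntegrityError as exc:
        print("delete rejected by the database:", exc)
    print(conn.execute("SELECT COUNT(*) FROM owner").fetchone()[0])  # 1
```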
[Figures: three example schemas from the common-mistakes slides. The first involves PARENT, CHILD and entities A to F linked by "have"/"belong to" relationships; the second involves G, H, I, J, G2, H2, I2 and J2 plus the mapping tables G2H2 and I2J2 ("define", "map to", "relate to", "owned by"); the third involves entities K to P ("define", "relate to").]
Relational Design
The Good
[Figure: CALIB_TYPE (# CALIB_TYPE_ID, * DESCRIPTION) defines CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND); each CALIBRATION is defined by a CALIB_TYPE.]
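Here is a sketch of why the two-table design scales, using Python's sqlite3 with table names taken from the figure. Pedestal, gain, and drift calibrations are rows in CALIB_TYPE rather than separate tables. Adding a new kind of calibration is therefore an INSERT, not new DDL and new code. The helper function and the dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE calib_type (
    calib_type_id INTEGER PRIMARY KEY,
    description   TEXT NOT NULL UNIQUE
);
CREATE TABLE calibration (
    calibration_id INTEGER PRIMARY KEY,
    calib_type_id  INTEGER NOT NULL REFERENCES calib_type (calib_type_id),
    tstart         TEXT NOT NULL,
    tend           TEXT            -- optional, as in the figure
);
""")


def add_calibration(conn: sqlite3.Connection, kind: str, tstart: str, tend=None) -> None:
    """Record a calibration of any kind; a new kind is a new row, not a new table."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO calib_type (description) VALUES (?)", (kind,))
        (type_id,) = conn.execute(
            "SELECT calib_type_id FROM calib_type WHERE description = ?", (kind,)).fetchone()
        conn.execute("INSERT INTO calibration (calib_type_id, tstart, tend) VALUES (?, ?, ?)",
                     (type_id, tstart, tend))


for kind in ("pedestal", "gain", "drift"):
    add_calibration(conn, kind, "2004-02-01")

summary = conn.execute("""
    SELECT t.description, COUNT(*) FROM calibration c
    JOIN calib_type t ON t.calib_type_id = c.calib_type_id
    GROUP BY t.description ORDER BY t.description
""").fetchall()
print(summary)  # [('drift', 1), ('gain', 1), ('pedestal', 1)]
```

The same 2 tables and 1 function serve every calibration kind. By contrast, "The Bad" and "The Ugly" designs below need a table, and code, per kind.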
Relational Design
The Bad
[Figure: CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND) defines three separate child tables, DRIFT_CALIB, PEDESTAL_CALIB and GAIN_CALIB, each repeating an ID, TSTART and TEND.]
You have now created 3 different children, all reporting the same information, when 1 child would
suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2.
Relational Design
The Ugly
[Figure: three separate parent tables, CALIBRATION, CALIBRATION(2) and CALIBRATION(3), each with CALIBRATION_ID, TSTART and TEND, and each defining one child table of its own (PEDESTAL_CALIB, DRIFT_CALIB, GAIN_CALIB).]
Now you have created 3 different applications using 6 tables, all of which could be managed with 2 tables. Extra code, extra testing, extra maintenance.
Relational Design
The Good: let's recap
[Figure: CALIB_TYPE (# CALIB_TYPE_ID, * DESCRIPTION) defines CALIBRATION (# CALIBRATION_ID, * TSTART, o TEND); each CALIBRATION is defined by a CALIB_TYPE.]
Relational Design
Why bother? Cont.
When an application is under construction, the ER diagram goes to every application meeting, and quite possibly into the wallet of the application leader. It is the pictorial answer to many issues.
Planning for disk space has been an issue; the design tool should assist with this task.
Planning
Overall
What do I need to plan for?
People, hardware, software, obsolescence,
maintenance, emergencies.
How far out do I need to plan?
Initially 2-4 years.
How often do I need to review the plans?
Annually.
What if my plan fails or looks undoable?
Nip it in the bud, be proactive, come up with
options.
Planning
User Requirements
Will user requirements influence your
hardware & software decisions?
Do you need replication?
What architecture will your API use?
How many users will be loading the
database and hardware?
Planning
Backup and Recovery
Backup and recovery procedures for VLDBs (very large databases) are difficult at best. A VLDB is normally defined as a database of multiple gigabytes or terabytes. This is probably the most sensitive area when choosing a freeware database.
Hardware plays a part here as well. When planning for hardware, ensure there is a plan for backup and recovery. Disk and tape
Planning
Good Practices with a Hammer
Make a standards document and enforce its use. When DBAs and developers are always on the same page, life is easier for both: expectations are clear and defined, and anger and disappointment are lessened.
System as well as database standards need to be followed and enforced.
Planning
Failover
Yikes, we are down!
Everyone always wants 24x7 scheduled uptime, until they see the cost.
Make anyone who insists on real 100% uptime justify it (and pay for it?). 98-99% uptime can be realized at a much lower cost.
Uptime requirements will influence, and possibly dictate, database choices, hardware choices, and FTE requirements.
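To put the percentages in perspective, here is a small Python sketch that converts an uptime target into allowed downtime per year, assuming a 365-day year. By this arithmetic, 98% uptime still allows about 175 hours (just over 7 days) of downtime a year and 99% about 88 hours. Only 100% allows none, which is where the cost explodes.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; ignores leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a system may be down and still meet the uptime target."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (98.0, 99.0, 99.9, 100.0):
        print(f"{target:5.1f}% uptime -> {allowed_downtime_hours(target):6.1f} h/year down")
```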
Planning
Failover
The cheapest method of addressing a failure is proactive planning.
Make sure your database and database software are backed up. Unless you are using a commercial database with roll-forward recovery, assume you will lose all DML since your last backup if you need to recover. This should dictate your backup schedule.
Do not forget tape backups as a catastrophic recovery method.
Practice recovery on your integration and development databases. Practice different scenarios: delete a datafile, delete the entire database.
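Here is a recovery drill in miniature, sketched with Python's sqlite3 and its backup API. The script takes a backup, commits more DML, "loses" the database, and restores from the backup. Without roll-forward recovery, the row written after the backup is gone. The in-memory databases and the run numbers are hypothetical stand-ins for a real integration database.

```python
import sqlite3


def snapshot(src: sqlite3.Connection) -> sqlite3.Connection:
    """Take a backup of src into a new in-memory database."""
    copy = sqlite3.connect(":memory:")
    src.backup(copy)
    return copy


def practice_recovery() -> list:
    db = sqlite3.connect(":memory:")  # stands in for an integration database
    db.execute("CREATE TABLE run (run_id INTEGER PRIMARY KEY)")
    with db:
        db.executemany("INSERT INTO run VALUES (?)", [(1,), (2,)])

    last_backup = snapshot(db)               # scheduled backup
    with db:
        db.execute("INSERT INTO run VALUES (3)")  # DML committed after the backup
    db.close()                                # simulate losing the database

    restored = snapshot(last_backup)          # restore from the last backup
    return [r[0] for r in restored.execute("SELECT run_id FROM run ORDER BY run_id")]


if __name__ == "__main__":
    print(practice_recovery())  # [1, 2]: run 3 was lost with no roll-forward recovery
```

The gap between backups is the maximum amount of DML you should expect to lose, which is why it should drive the backup schedule.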
Replication
Replication is the process of copying and
maintaining database objects in multiple
databases that make up a distributed
database system. Replication can improve
the performance and protect the
availability of applications because
alternate data access options exist.
Replication Cont.
Oracle master-to-master replication allows updates on both the master and replica sides. Master-to-master is a complex, high-maintenance form of replication, yet it seems to be the first option the unwary opt for. Both CERN and Fermilab DBAs have requested firm justification before considering this type of replication request.
Every link in a multi-master configuration would need to be fully staffed, as downtime will be critical.
Replication cont.
Freeware Replication
MySQL has replication in the latest stable version (3.23.32; v4.1 is out). It is master-slave replication using a binary log of operations on the server side. It is possible to build star or chain structures.
There is a PostgreSQL replication tool. We have not tested it yet.
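Here is a toy model of the log-based master-slave idea, written in Python with sqlite3. This is not MySQL's implementation. It only shows the principle. The master appends each committed write to an ordered log, which plays the role of the binary log. Each slave replays the log from its own position and lags until it catches up. The class and method names are hypothetical.

```python
import sqlite3


class ToyMaster:
    """Executes writes and appends each one to an ordered log (binary-log stand-in)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.log = []  # (sql, params) in commit order

    def write(self, sql: str, params: tuple = ()) -> None:
        with self.conn:
            self.conn.execute(sql, params)
        self.log.append((sql, params))  # logged only after a successful commit


class ToySlave:
    """Read-only copy that replays the master's log from its last position."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.position = 0

    def catch_up(self, master: ToyMaster) -> None:
        for sql, params in master.log[self.position:]:
            with self.conn:
                self.conn.execute(sql, params)
            self.position += 1

    def query(self, sql: str) -> list:
        return self.conn.execute(sql).fetchall()


if __name__ == "__main__":
    master, slave = ToyMaster(), ToySlave()
    master.write("CREATE TABLE run (run_id INTEGER PRIMARY KEY)")
    master.write("INSERT INTO run VALUES (?)", (1,))
    slave.catch_up(master)
    master.write("INSERT INTO run VALUES (?)", (2,))
    print(slave.query("SELECT run_id FROM run"))  # [(1,)]: slave lags until next catch-up
    slave.catch_up(master)
    print(slave.query("SELECT run_id FROM run ORDER BY run_id"))  # [(1,), (2,)]
```

Several slaves reading from one master's log would correspond to a star structure. A chain would have a slave re-log what it replays for the next slave.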
Lost in Space
Space is the 1 area consistently underestimated in every application I have seen. In my opinion, initial data volume estimates were consistently undersized by a factor of 2 or 3. For example, Run II events were estimated at 1 billion rows; this estimate was surpassed in Feb. 2004, and we will probably end up with 4-5 billion event rows. That is a lot of disk space.
Disk hardware becomes unsupported and obsolete in what seems to be the blink of an eye.
[Figure: N GB of data needs far more than N GB of disk. All databases use disk to store data, but also for indexes, rollback and redo; data and indexes may be mirrored, and backups and replication need space as well.]
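Here is a back-of-the-envelope Python sketch of the point above: N GB of raw data turns into a much larger disk budget. The function multiplies the raw estimate by an underestimate factor (the talk's experience suggests 2 to 3). It then adds indexes, rollback/redo, mirroring, backups, and replicas. Every default ratio here is a hypothetical placeholder to be replaced with your own numbers.

```python
def disk_estimate_gb(raw_data_gb: float,
                     index_ratio: float = 0.5,       # hypothetical: index size vs data
                     undo_redo_gb: float = 10.0,     # hypothetical: rollback + redo space
                     mirrored: bool = True,          # data, index, undo/redo mirrored
                     backup_copies: int = 1,         # full copies of data + index kept
                     replicas: int = 0,              # unmirrored replica databases
                     underestimate_factor: float = 2.0) -> float:
    """Rough disk budget in GB; all defaults are placeholders, not recommendations."""
    data = raw_data_gb * underestimate_factor
    data_and_index = data * (1 + index_ratio)
    online = data_and_index + undo_redo_gb
    if mirrored:
        online *= 2
    backups = data_and_index * backup_copies
    replication = data_and_index * replicas
    return online + backups + replication


if __name__ == "__main__":
    print(disk_estimate_gb(100))  # 920.0 with the placeholder defaults
```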
Additional References
WARNING: some of these may be database specific.
Intro to database design: http://www.cc.gatech.edu/classes/AY2000/cs4400_spring/cs4400a/
Intro to Oracle tutorial: http://w2.syronex.com/jmr/edu/db/
Evolutionary Database Design: http://www.martinfowler.com/articles/evodb.html (mentions 1 DBA for ATLAS)
SQL course: http://sqlcourse.com/
Additional References
http://wwwcss.fnal.gov/dsg/external/BTeV/index.html