
Key. A key is one or more data attributes that uniquely identify an entity.

In a physical database
a key would be formed of one or more table columns whose value(s) uniquely identifies a row
within a relational table.
Composite key. A key that is composed of two or more attributes.
Natural key. A key that is formed of attributes that already exist in the real world. For example,
U.S. citizens are issued a Social Security Number (SSN) that is unique to them (this isn't
guaranteed to be true, but it's pretty darn close in practice). SSN could be used as a natural key,
assuming privacy laws allow it, for a Person entity (assuming the scope of your organization is
limited to the U.S.).
Surrogate key. A key with no business meaning.
Candidate key. An entity type in a logical data model will have zero or more candidate keys,
also referred to simply as unique identifiers (note: some people don't believe in identifying
candidate keys in LDMs, so there are no hard-and-fast rules). For example, if we only interact with
American citizens then SSN is one candidate key for the Person entity type and the combination
of name and phone number (assuming the combination is unique) is potentially a second
candidate key. Both of these keys are called candidate keys because they are candidates to be
chosen as the primary key, an alternate key or perhaps not even a key at all within a physical
data model.
Primary key. The preferred key for an entity type.
Alternate key. Also known as a secondary key, an alternate key is another unique identifier of a
row within a table.
Foreign key. One or more attributes in an entity type that represent a key, either primary or
secondary, in another entity type.
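As an illustration only (the table and column names below are hypothetical), here is roughly how these key types might appear in a physical table definition:

CREATE TABLE Person
( person_key INTEGER GENERATED ALWAYS AS IDENTITY -- surrogate primary key: no business meaning
, ssn        CHAR(9) NOT NULL UNIQUE              -- natural key, kept here as an alternate key
, name       VARCHAR(60)
, phone      VARCHAR(15)
) UNIQUE PRIMARY INDEX (person_key);

CREATE TABLE PersonAddress
( person_key INTEGER NOT NULL                     -- foreign key back to Person
, addr_line  VARCHAR(100)
, FOREIGN KEY (person_key) REFERENCES Person (person_key)
) PRIMARY INDEX (person_key);
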
A degenerate dimension is data that is dimensional in nature but stored in a fact table. For example, if you
have a dimension that contains only Order Number and Order Line Number, it would have a 1:1 relationship
with the fact table. Do you want to have two tables with a billion rows, or one table with a billion rows?
Therefore, this would be a degenerate dimension, and Order Number and Order Line Number would be
stored in the fact table. This eliminates the need to join to a dimension table. You
can use the data in the degenerate dimension to limit or 'slice and dice' your
fact table measures.
A degenerate dimension is a dimension key in the fact table that does not have its own dimension table.
Degenerate dimensions commonly occur when the fact table's grain is a single transaction (or
transaction line). Transaction control header numbers assigned by the operational business
process are typically degenerate dimensions, such as order, ticket, credit card transaction, or
check numbers. These degenerate dimensions are natural keys of the "parents" of the line items.
Even though there is no corresponding dimension table of attributes, degenerate dimensions can
be quite useful for grouping together related fact table rows. For example, retail point-of-sale
transaction numbers tie all the individual items purchased together into a single market basket. In
health care, degenerate dimensions can group the claims items related to a single hospital stay or
episode of care.
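A minimal sketch (with hypothetical table and column names) of a fact table carrying a degenerate dimension:

CREATE TABLE sales_fact
( order_number      INTEGER NOT NULL  -- degenerate dimension: there is no order dimension table
, order_line_number INTEGER NOT NULL  -- degenerate dimension
, date_key          INTEGER NOT NULL  -- foreign key to the date dimension
, product_key       INTEGER NOT NULL  -- foreign key to the product dimension
, sales_amount      DECIMAL(18,2)
, quantity          INTEGER
) PRIMARY INDEX (order_number);

-- The degenerate dimension can slice the facts directly, with no extra join:
SELECT order_number, SUM(sales_amount) AS basket_amount
FROM   sales_fact
GROUP  BY order_number;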
----------------------------
Any column or combination of columns that can uniquely identify a record is a candidate key. A primary
key uniquely defines a row in a table. Some primary keys can also be described as surrogate keys: a
primary key is called a surrogate key if its value was generated programmatically.

Differences between primary keys and surrogate keys:
1. Primary keys are used in OLTP schemas, whereas surrogate keys are used in OLAP schemas.
2. Primary keys hold some business meaning, whereas surrogate keys do not hold any business meaning.
3. A primary key may contain numeric as well as non-numeric values, whereas surrogate keys contain
only (simple) numeric values.

Now the question is: why do we use surrogate keys in OLAP rather than primary keys?
The answer is simple. There are a few main reasons for this:

1. Surrogate keys are simple numeric values, as simple as normal counting, so most of the time they
save storage space.
2. As surrogate keys are simple and short, they speed up join performance.
3. Best of all, the same pattern of surrogate keys can be used across all the tables present in a
star schema.
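For example, the database can generate the surrogate key itself. The sketch below uses a Teradata IDENTITY column and hypothetical table/column names:

CREATE TABLE customer_dim
( customer_key  INTEGER GENERATED ALWAYS AS IDENTITY -- surrogate key: short, numeric, no business meaning
, customer_id   VARCHAR(20) NOT NULL                 -- natural/business key carried from the source system
, customer_name VARCHAR(60)
) UNIQUE PRIMARY INDEX (customer_key);

-- The surrogate key is generated automatically; only the business columns are supplied.
INSERT INTO customer_dim (customer_id, customer_name)
SELECT customer_id, customer_name
FROM   stg_customer;
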
Executive Summary:
Dimensional modeling is made up of both logical and physical modeling. Measures are the core of the dimensional model and are
data elements that can be summed, averaged, or mathematically manipulated. Fact tables are at the center of the star schema
(which is a physically implemented dimensional model), and data marts are made up of multiple fact tables.

In "Discover the Star Schema," July 2007, InstantDoc ID 96112, I reviewed the basics of dimensional modeling. You
can use dimensional modeling to create an enterprise data warehouse that makes it easier for business intelligence
(BI) end users to access and understand data. Now let's examine the differences between entity relationship (ER)
and dimensional models, the components that make up the dimensional model, and why those pieces are designed
the way they are.
Dimensional vs. ER Modeling
Dimensional modeling is a logical design technique. Unlike ER modeling, which consists of conceptual, logical, and
physical data modeling, dimensional modeling is made up of only logical and physical modeling.
There are sharp contrasts between ER and dimensional modeling. ER modeling is a design discipline that seeks to
represent business rules as highly detailed relationships between business elements that are materialized as
database tables. You can extrapolate the business rules from the types and cardinalities of the relationships in an ER
model. The primary goal of ER modeling is to remove all non-key data redundancy.
Dimensional modeling, however, seeks to represent data in a logical, understandable manner. In dimensional
modeling, you can control data redundancy by conforming dimension and fact tables. A table that's been conformed
can be used in more than one dimensional data model and is the same no matter how it's used. The relationships in a
dimensional model don't represent business rules; instead, they're navigational paths used to help write reports or
create graphs.
Many data modeling software packages support dimensional modeling. Some even let you generate SQL Data
Definition Language (DDL) scripts so that all you have to do to create a data warehouse/data mart is run those
scripts.

Performance Tuning Tips: Join Considerations
If you are writing queries, working on performance, or helping to improve performance, you should
take some time to go through this topic. It is all about joins, which are one of the most important
concerns in Teradata.

If some attention is given to the following suggestions, most join-related issues can be taken care of.

Tip 1: Joining on PI/NUPI/non-PI columns

We should make sure the join happens on columns composed of a UPI/NUPI. But why?

Whenever we join two tables on common columns, the optimizer will try to bring the data from both
tables into a common spool space and join them to get the results. But getting data from both tables into
common spool has overhead.

What if I join a very large table with a small table?
Should the small table be redistributed, or the large table?
Should the small table be duplicated across all the AMPs?
Should both tables be redistributed across all the AMPs?

Here are some basic thumb rules on joining columns on an index, so the join happens faster.

Case 1 - PI = PI joins
There is no redistribution of data over the AMPs. AMP-local joins happen because the matching rows are
already present on the same AMP and need not be redistributed. These joins on a unique primary index are very fast.

Case 2 - PI = non-PI column joins
- Data from the second table is redistributed across all AMPs, since the join is happening on a PI column
vs. a non-PI column. The ideal scenario is when the smaller table is redistributed to be joined with the
large table's records on the same AMP.
- Alternatively, the data in the small table is duplicated to every AMP, where it is joined locally with the large table.

Case 3 - non-index = non-PI column joins
Data from both tables is redistributed across all AMPs. This is one of the longest-running kinds of query;
care should be taken to ensure that stats are collected on these columns.
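To illustrate (with hypothetical tables that both have customer_id as their primary index), a PI = PI join needs no data movement at all:

-- Both tables have customer_id as their primary index, so the join is AMP-local.
SELECT c.customer_name
     , o.order_amount
FROM   customer_dim c
INNER JOIN order_fact o
       ON  c.customer_id = o.customer_id;  -- PI = PI join: no redistribution or duplication

-- Had the join been written on a non-indexed column of order_fact instead, the
-- optimizer would first have to redistribute or duplicate one of the tables in spool.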


Tip 2: The columns that are part of the join must be of the same data type (CHAR, INTEGER, etc.). But why?

When joining columns from two tables, the optimizer checks whether the data types match; if they do not,
it will translate the column of one table to match that of the other.

Say, for example:
TABLE employee - deptno (CHAR)
TABLE dept - deptno (INTEGER)

If I join the employee table with dept on employee.deptno (CHAR) = dept.deptno (INTEGER), the optimizer
will convert the character column to INTEGER, resulting in a translation. What would happen if the
employee table had 100 million records and deptno had to undergo translation every time? We have to
avoid such scenarios, since translation is a cost factor and consumes time and system resources.

Make sure you are joining columns that have the same data types, to avoid translation!
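A small sketch of the idea, reusing the hypothetical employee/dept tables above: defining deptno with the same data type on both sides means the join needs no conversion at all.

CREATE TABLE employee
( empno   INTEGER NOT NULL
, empname VARCHAR(60)
, deptno  INTEGER                 -- same type as dept.deptno, so no translation at join time
) PRIMARY INDEX (empno);

CREATE TABLE dept
( deptno   INTEGER NOT NULL
, deptname VARCHAR(60)
) UNIQUE PRIMARY INDEX (deptno);

SELECT e.empname, d.deptname
FROM   employee e
INNER JOIN dept d
       ON  e.deptno = d.deptno;   -- INTEGER = INTEGER: no conversion needed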


Tip 3: Do not use functions like SUBSTR, COALESCE, CASE, etc. on the index columns used as part of a
join. Why?!?

It is not recommended to use functions such as SUBSTR, COALESCE, CASE and others on join columns, since
they add to the cost factor and result in performance issues.
The optimizer will not be able to use the statistics on columns that are wrapped in functions. This might
result in a product join or spool-out issues, because the optimizer cannot make good decisions when no
stats/demographics are available for the column. It might assume the column has 100 values instead of
1 million values and redistribute on that wrong assumption, directly impacting
performance.
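Reusing the same hypothetical employee/dept tables, the sketch below contrasts wrapping the join column in a function with joining on the bare column:

-- Problematic: the function hides the column, so collected stats cannot be used
SELECT e.empname, d.deptname
FROM   employee e
INNER JOIN dept d
       ON  COALESCE(e.deptno, -1) = d.deptno;

-- Preferred: join on the bare column and handle NULLs separately
SELECT e.empname, d.deptname
FROM   employee e
INNER JOIN dept d
       ON  e.deptno = d.deptno
WHERE  e.deptno IS NOT NULL;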


Tip 4: Use NOT NULL wherever possible!

What?! Did someone say NOT NULL? Yes, we have to make sure to filter out NULLs for join columns which
are declared as NULLABLE in the table definition.
The reason is that all the NULL values may get hashed to one poor AMP, resulting in the infamous "NO
SPOOL SPACE" error when that AMP cannot accommodate any more NULL values.
So remember to use IS NOT NULL in joins so that table skew can be avoided.

Since V2R5, Teradata automatically adds the condition IS NOT NULL to the query. Still, it is better to
ensure that NULL values are not included as part of the join.
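Continuing with hypothetical table names, the explicit filters below keep the NULL rows of a nullable join column out of spool:

SELECT a.col1, b.col2
FROM   big_table a
INNER JOIN other_table b
       ON  a.join_col = b.join_col
WHERE  a.join_col IS NOT NULL   -- NULL rows never reach spool, so no single AMP is flooded
AND    b.join_col IS NOT NULL;  -- harmless if the column is already declared NOT NULL
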
Teradata Performance Tuning - Basic Tips
Performance tuning thumb rules.

Here are the very basic steps used to performance-tune any given query in a given environment. As a
prerequisite, make sure
- the user has proper select rights and actual profile settings
- enough space is available to run and test the queries

1. Run the explain plan (by pressing F6, or by prefixing the query with EXPLAIN).
Then look for potential issues like
- no or low confidence
- product join conditions
- "by way of an all-rows scan" - FTS
- translate steps

Also check for
- DISTINCT or GROUP BY keywords in the SQL query
- IN / NOT IN keywords, and check the list of values generated for them

APPROACHES

A. In case of product join scenarios, check for
- proper usage of aliases
- joining on matching columns
- usage of join keywords - i.e. specifying the type of join (e.g. inner or outer)
- use of UNION in case of "OR" scenarios
- statistics collected on the join columns; this is especially important if the columns you are
joining on are not unique

B. Collect stats (see the sketch under DIAGNOSTIC HELPSTATS below)
- Run the command "DIAGNOSTIC HELPSTATS ON FOR SESSION"
- Gather information on the columns on which stats have to be collected
- Collect stats on the suggested columns
- Also check for stats missing on the PI, SI, or columns used in joins - "HELP STATS
<databasename>.<tablename>"
- Make sure stats are re-collected when at least 10% of the data changes
- Remove unwanted stats, or stats which hardly improve performance of the queries
- Collect stats on columns instead of indexes, since a dropped index will drop its stats as well!
- Collect stats on indexes having multiple columns; this can be helpful when those columns are used in join
conditions
- Check whether stats are re-created for tables whose structures have changed

C. Full table scan scenarios
- Try to avoid FTS scenarios, as it might take a very long time to access all the data on every AMP in the
system
- Make sure an SI is defined on the columns which are used as part of joins or as an alternate access path
- Collect stats on SI columns, else there are chances the optimizer might go for an FTS even when an SI is
defined on that particular column

2. If intermediate tables are used to store results, make sure that
- they have the same PI as the source and destination tables

3. Tune to get the optimizer to join on the primary index of the largest table, when possible, to ensure that
the large table is not redistributed across AMPs

4. For a large list of values, avoid using IN / NOT IN in SQL. Write the large list of values to a temporary
table and use this table in the query

5. Be careful about when to use EXISTS / NOT EXISTS conditions, since they ignore unknown comparisons
(e.g. a NULL value in the column results in unknown); this can lead to inconsistent results

6. Inner vs. outer joins
Check which join works efficiently in the given scenario. Some examples:
- Outer joins can be used when a large table is joined with small tables (like a fact table joining with a
dimension table based on a reference column)
- Inner joins can be used when we only need the actual matching data, so no extra data is loaded into spool
for processing
Please note, for outer join conditions (a sketch follows below):
1. The filter condition for the inner table should be present in the "ON" condition
2. The filter condition for the outer table should be present in the "WHERE" condition
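A sketch of the two rules above, using hypothetical fact_sales (outer, preserved table) and dim_promo (inner table):

SELECT f.sale_id
     , f.sale_amount
     , p.promo_name
FROM   fact_sales f
LEFT OUTER JOIN dim_promo p
       ON  f.promo_key  = p.promo_key
       AND p.promo_type = 'SEASONAL'      -- inner-table filter in ON: unmatched fact rows are kept
WHERE  f.sale_date >= DATE '2024-01-01';  -- outer-table filter in WHERE

-- Moving p.promo_type = 'SEASONAL' into the WHERE clause would silently turn the
-- outer join into an inner join, dropping fact rows that have no matching promotion.
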
DIAGNOSTIC HELPSTATS
One of my favorite and most useful commands is DIAGNOSTIC HELPSTATS.
This command is very useful in helping the user understand which columns should have statistics
collected on them, so the optimizer can select the best plan.
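A typical session might look like the sketch below (the employee/dept tables are the hypothetical ones used earlier; the recommended statistics will differ on a real system):

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT e.empname, d.deptname
FROM   employee e
INNER JOIN dept d
       ON  e.deptno = d.deptno;

-- The end of the explain text now lists recommended COLLECT STATISTICS
-- statements, which can be applied selectively, for example:
COLLECT STATISTICS ON employee COLUMN (deptno);
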
List of useful Data dictionary views
List of useful data dictionary views which might come in handy in many situations!

1. DBC.users
This view gives current user information

2. DBC.SessionInfo
This view gives information about
- the details of users currently logged in

3. DBC.Databases
This view lists all the databases present in the given Teradata system. It also contains useful
information like
- CreatorName
- OwnerName
- PermSpace
- SpoolSpace
- TempSpace

4. DBC.Indices
It gives information on the indexes created for a given table

5. DBC.Tables
It gives information about all tables (T), views (V), macros (M), triggers (G), and stored procedures (P).

6. DBC.IndexConstraints
It provides information about partitioned primary index constraints;
'Q' indicates a table with a PPI

7. DBC.DiskSpace
It provides information about disk space usage (including spool) for any database or account.
SELECT DatabaseName
     , CAST (SUM(MaxPerm)     AS FORMAT 'zzz,zzz,zz9')
     , CAST (SUM(CurrentPerm) AS FORMAT 'zzz,zzz,zz9')
     , CAST ((SUM(CurrentPerm) / NULLIFZERO(SUM(MaxPerm)) * 100)
             AS FORMAT 'zz9.99%') AS "% Used"
FROM   DBC.DiskSpace
GROUP  BY 1
ORDER  BY 4 DESC;

8. DBC.TableSize
It provides information about disk space usage (excluding spool) for any database, table or account
SELECT Vproc
,CAST (TableName
AS FORMAT 'X(20)')
,CurrentPerm
,PeakPerm
FROM DBC.TableSize
WHERE DatabaseName = USER
ORDER BY TableName, Vproc ;

9. DBC.AllSpace
It provides information about disk space usage (including spool) for any database, table, or account.
SELECT Vproc
,CAST (TableName AS
FORMAT 'X(20)')
,MaxPerm
,CurrentPerm
FROM DBC.AllSpace
WHERE DatabaseName = USER
ORDER BY TableName, Vproc ;

10. DBC.ColumnStats, DBC.IndexStats and DBC.MultiColumnStats
These views are used to find statistics information on given tables
How do you find out the number of AMPs in a given system?
Answer:

SELECT HASHAMP() + 1;


List the types of HASH functions used in Teradata.
Answer:

They are HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP.
The SQL hash functions are:
HASHROW (column(s))
HASHBUCKET (hashrow)
HASHAMP (hashbucket)
HASHBAKAMP (hashbucket)

Example:
SELECT
HASHROW ('Teradata') AS "Hash Value"
, HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"
, HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"
, HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Fallback Num" ;

Difference between CREATE TABLE (COPY) and CREATE TABLE (SELECT)

Whenever we need to create a copy of an existing table, we tend to use CREATE TABLE (COPY) from the
existing table or CREATE TABLE (SELECT) from the existing table.

Many may ignore the difference between running CREATE TABLE in these two different ways, assuming the
structures created to be the same. But in actuality, that is not so!
Let us try out the two types of CREATE TABLE using examples to understand the differences.

Create a table check123 which includes NOT NULL, DEFAULT, UPI and USI definitions in it:

SHOW TABLE check123;
/*
CREATE SET TABLE check123 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
int1 INTEGER DEFAULT 0 ,
int12 INTEGER NOT NULL DEFAULT 0 ,
int2 INTEGER NOT NULL,
int3 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX prim1 ( int3 )
UNIQUE INDEX uniq1 ( int2 );
*/

Step 1: Create table Check_COPY from Check123 using the CREATE TABLE (COPY) method

CREATE TABLE check_COPY AS check123 WITH no data ;
Run show table command to check for table structure
SHOW TABLE check_COPY;
/*
CREATE SET TABLE check_COPY ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
int1 INTEGER DEFAULT 0 ,
int12 INTEGER NOT NULL DEFAULT 0 ,
int2 INTEGER NOT NULL,
int3 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX prim1 ( int3 )
UNIQUE INDEX uniq1 ( int2 );
*/

From the above output we can see that the table created using the COPY method retains all the
data types and index definitions, including the UPI and the USI

Step 2: Create table Check_SELECT from Check123 using the CREATE TABLE (SELECT) method

CREATE TABLE Check_SELECT AS
( sel * FROM check123 ) WITH no data ;

Run show table command to check for table structure
SHOW TABLE Check_SELECT;
/*
CREATE SET TABLE Check_SELECT ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
int1 INTEGER,   -- DEFAULT missing
int12 INTEGER,  -- DEFAULT and NOT NULL missing
int2 INTEGER,   -- NOT NULL missing
int3 INTEGER)   -- NOT NULL missing
PRIMARY INDEX ( int1 );
*/

Hence, when a table is created using the CREATE TABLE ... AS (SELECT ...) method, the table created will
not retain the following from the original table:
DEFAULT
NOT NULL
UNIQUE PRIMARY INDEX
UNIQUE INDEX


Types of Teradata Joins
Teradata joins

When we join two or more tables on a column or set of columns, the join returns the data from the
matching records in the tables. This universal concept remains the same for all
databases.
In Teradata, we have the Optimizer (a very smart interpreter), which determines the type of join strategy
to be used based on the user's query, keeping the performance factor in mind.

In Teradata, some of the commonly used join types are:
- Inner join (can also be "self join" in some cases)
- Outer Join (Left, Right, Full)
- Cross join (Cartesian product join)

When the user submits a join query, the optimizer will come up with join plans to perform the joins. These
join strategies include
- Merge Join
- Nested Join
- Hash Join
- Product join
- Exclusion Join


Merge Join
--------------------

Merge join is a concept in which the rows to be joined must be present on the same AMP. If the rows to be
joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to
make that happen, based on the row hash of the columns involved in the join's WHERE clause.
If the two tables to be joined have the same primary index, then the matching records are already present
on the same AMP and redistribution of records is not required.

There are four common scenarios for data movement in a merge join:
Case 1: If the joining columns are UPI = UPI, the records to be joined are present on the same AMP and
redistribution is not required. This is the most efficient and fastest join strategy.
Case 2: If the joining columns are UPI = non-index column, the records of the second table have to be
redistributed across the AMPs so that they land with the corresponding rows of the first table.
Case 3: If the joining columns are non-index column = non-index column, both tables have to be
redistributed so that matching data lies on the same AMP, and the join happens on the redistributed data.
This strategy is time consuming, since a complete redistribution of both tables takes place across all the AMPs.
Case 4: For a join happening on the primary index, if the referenced table (the second table in the join) is
very small, then this table is duplicated/copied to every AMP.

Nested Join
-------------------
Nested join is one of the most precise join plans suggested by the Optimizer. A nested join works when a
UPI/USI is used in the join statement to retrieve a single row from the first table. It then looks up the
matching rows in the second table using an index (primary or secondary) on the join column and
returns the matching results.

Example:
SELECT EMP.Ename, DEP.Deptno, EMP.salary
FROM   EMPLOYEE EMP,
       DEPARTMENT DEP
WHERE  EMP.Enum = DEP.Enum
AND    EMP.Enum = 2345;  -- this results in a nested join

Hash join
----------------
Hash join is one of the plans suggested by the Optimizer based on the joining conditions. We can say the
hash join is a close relative of the merge join in terms of its functionality. In the case of a merge join, the
join happens AMP-locally on sorted spools. In a hash join, the small table is held completely inside the
AMP's memory, and the join happens on row hash against the rows of the larger table.

Advantages of hash joins:
1. They are faster than merge joins, since the large table does not need to be sorted.
2. Since the join happens between a table held in AMP memory and the unsorted spool of the large table,
it happens quickly.

Exclusion Join
-------------------------

These types of joins are suggested by the optimizer when the following are used in queries:
- NOT IN
- EXCEPT
- MINUS
- SET subtraction operations

SELECT EMP.Ename, EMP.salary
FROM   EMPLOYEE EMP
WHERE  EMP.Enum NOT IN
       ( SELECT Enum
         FROM   DEPARTMENT DEP
         WHERE  Enum IS NOT NULL );

Please make sure to add an additional WHERE filter of <column> IS NOT NULL in the subquery, since a
NULL in a NOT IN <column> list will return no results.

The exclusion join for the NOT IN query above has three scenarios:
Case 1: Matched data in the "NOT IN" subquery will disqualify that row
Case 2: Non-matched data in the "NOT IN" subquery will qualify that row
Case 3: Any unknown result in the "NOT IN" will disqualify that row (a NULL is a typical example of this
scenario).
DISTINCT vs. GROUP BY
Since DISTINCT redistributes the rows immediately, more data may move between the AMPs, whereas
GROUP BY aggregates locally first and only sends the unique values between the AMPs.
So we can say that GROUP BY sounds more efficient.
But when the data in a table is nearly unique, GROUP BY spends time attempting to eliminate duplicates
that do not exist at all; it wastes its time checking for duplicates first, and then it must still redistribute
the same amount of data.
Hence it is better to go for:
GROUP BY - when there are many duplicates
DISTINCT - when there are few or no duplicates
GROUP BY - when SPOOL space is exceeded with DISTINCT
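Both forms below return the same result set (using a hypothetical sales_fact table); which one is cheaper depends on how many duplicates exist:

-- Few or no duplicates expected: DISTINCT
SELECT DISTINCT customer_id
FROM   sales_fact;

-- Many duplicates expected (or DISTINCT runs out of spool): GROUP BY
SELECT customer_id
FROM   sales_fact
GROUP  BY customer_id;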

Working around with Transposition of Table data
I was working on some transpositions and want to share one of the samples.

Consider the customer table ,having customer and month details
customer...... month
Ron................ 1
Kev................. 2
joh................. 1
Nel................. 2
Ave................. 11
Cin................. 10
tra................. 3

The CASE statement plays a very important role in the transposition of rows to columns and vice versa. In
the following scenarios, we can see extensive usage of the CASE statement.

Scenario 1:
Display total number of customers for each month

jan....feb....mar....apr....may....jun....jul....aug....sep....oct....nov....dec
2......2......1......0......0......0......0......0......0......1......1......0....

The sql query is as follows:

sel
count(case when month = '1' then customer else null end) "jan",
count(case when month = '2' then customer else null end) "feb",
count(case when month = '3' then customer else null end) "mar",
count(case when month = '4' then customer else null end) "apr",
count(case when month = '5' then customer else null end) "may",
count(case when month = '6' then customer else null end) "jun",
count(case when month = '7' then customer else null end) "jul",
count(case when month = '8' then customer else null end) "aug",
count(case when month = '9' then customer else null end) "sep",
count(case when month = '10' then customer else null end) "oct",
count(case when month = '11' then customer else null end) "nov",
count(case when month = '12' then customer else null end) "dec"
from CUST_TABLE ;


Scenario 2:
Display customer and month details with every customer mapped to corresponding month

customer....jan....feb....mar....apr....may....jun....jul....aug....sep....oct....nov....dec
Ron...........1......0......0......0......0......0......0......0......0......0......0......0....
Kev...........0......1......0......0......0......0......0......0......0......0......0......0....
joh...........1......0......0......0......0......0......0......0......0......0......0......0....
Nel...........0......1......0......0......0......0......0......0......0......0......0......0....
Ave...........0......0......0......0......0......0......0......0......0......0......1......0....
Cin...........0......0......0......0......0......0......0......0......0......1......0......0....
Tra...........0......0......1......0......0......0......0......0......0......0......0......0....

The sql query is as follows:

sel
customer,
count(case when month = '1' then customer else null end) "jan",
count(case when month = '2' then customer else null end) "feb",
count(case when month = '3' then customer else null end) "mar",
count(case when month = '4' then customer else null end) "apr",
count(case when month = '5' then customer else null end) "may",
count(case when month = '6' then customer else null end) "jun",
count(case when month = '7' then customer else null end) "jul",
count(case when month = '8' then customer else null end) "aug",
count(case when month = '9' then customer else null end) "sep",
count(case when month = '10' then customer else null end) "oct",
count(case when month = '11' then customer else null end) "nov",
count(case when month = '12' then customer else null end) "dec"
from CUST_TABLE
group by customer;
How to split a source column into multiple target columns
(full name to first and last)

Problem: To split fullname into firstname and lastname, to be inserted into a target table.


Approach:

CREATE SET TABLE test
(
fullname varchar(30)
);


INSERT INTO test ('nitin raj');
INSERT INTO test ('nitin agarwal');
INSERT INTO test ('abhishek gupta');


sel * FROM test;
fullname
nitin agarwal
nitin raj
abhishek gupta


Use INDEX to find the position of the space in fullname, and then use that position
to get
--> firstname = fullname from position 1 till (SPACE - 1)
--> lastname = fullname from position (SPACE + 1) onward

SELECT INDEX(fullname ,' ') AS "a", SUBSTR(fullname,1, a-1 ) ,
SUBSTR(fullname,a+1 ) FROM test;
a  Substr(fullname,1,a-1)  Substr(fullname,a+1)
6  nitin                   agarwal
6  nitin                   raj
9  abhishek                gupta

Unix Commands
awk is meant for processing column-oriented text data, such as tables, presented to it on standard input.
Why is AWK so important? It is an excellent filter and report writer. Many UNIX utilities
generate rows and columns of information. AWK is an excellent tool for processing these rows
and columns, and it is easier to use AWK than most conventional programming languages. It can
be considered a pseudo-C interpreter, as it understands the same arithmetic operators as C.
AWK also has string manipulation functions, so it can search for particular strings and modify
the output. AWK also has associative arrays, which are incredibly useful and are a feature most
computing languages lack. Associative arrays can make a complex problem a trivial exercise.
1. :w Write the current file.
2. :w new.file Write the file to the name 'new.file'.
3. :w! existing.file Overwrite an existing file with the file currently being edited.
4. :wq Write the file and quit.
5. :q Quit.
6. :q! Quit with no changes.

sed is one of the very early Unix commands built for command-line processing of data files. sed (stream
editor) is a Unix utility that parses text and implements a programming language which can apply
transformations to that text. It reads input line by line (sequentially), applies the operation which has
been specified via the command line (or a sed script), and then outputs the line.
Cursor motion
On well-configured systems, you will find that the keyboard arrow keys will function correctly in
emacs, moving you forward or backward one character at a time, and up or down one line at a time.
If the arrow keys do not work, here's how to accomplish the same functions:
- Control-F moves the cursor forward to the next character.
- Control-B moves the cursor back to the previous character.
- Control-N moves the cursor to the next line.
- Control-P moves the cursor to the previous line.
In addition to basic cursor motion, emacs provides some other handy cursor motion functions:
- Control-A moves the cursor to the start of the current line.
- Control-E moves the cursor to the end of the current line.
- ESCAPE-F moves the cursor forward to the next word.
- ESCAPE-B moves the cursor back to the previous word.
- ESCAPE-< moves the cursor to the start of the buffer.
- ESCAPE-> moves the cursor to the end of the buffer.
Inserting and deleting text
To insert text into a buffer, place the cursor where you want to start inserting text, and start typing away.
If you want to insert the contents of another file into the current buffer, place the cursor at the desired
insertion point, and type Control-X-I. Emacs will ask you for the name of the file you wish to insert.
You may also insert text by cutting it from one place, and pasting it at the insertion point. See the
next section for information on cutting and pasting.
Deleting text is easy. As you'd expect, the delete key deletes backward one character. Here are some
other ways to delete text:
- Control-D deletes forward one letter.
- Control-K deletes from the point to the end of the line.
- ESCAPE-D deletes forward one word.
- ESCAPE-delete deletes backward one word.
Cutting and pasting text regions
Emacs allows you to select a region of text, and perform cut and paste operations on the region. It
uses a temporary storage area called the "kill buffer" to allow you to store and retrieve blocks of text.
There is only one kill buffer in emacs, which means that you can cut text from one document, and
paste it into another.
To define a region of text, place the cursor at one end of the region and press Control-spacebar. That
sets the mark. Then, move the cursor to the other end of the region. The text between the mark and
the cursor defines the region.
To cut a region of text, and place it in the kill buffer, use the command Control-W (think of Wipe).
The paste command is Control-Y. It Yanks the block of text from the kill buffer, and places it where
the cursor rests. The Control-Y command only retrieves the most recently-cut block of text.
You can paste in earlier cuts by pressing ESCAPE-Y. The ESCAPE-Y command, used repeatedly,
will take you back through several previous text blocks that were cut. The ESCAPE-Y command
does not work unless you type Control-Y first.
You may copy a region of text into the kill buffer without cutting it. Define the text block by setting
the mark at one end, and moving the cursor to the other end. Then type ESCAPE-W.
Undoing changes
It is possible to undo the changes you have made to a file by entering the command Control-_.
(That's Control-underscore. On some keyboards, you'll have to hold down both the control and shift
keys to enter the underscore character.)
Many word processing programs can only undo the most recent command, but emacs remembers a
long history of commands, allowing you to undo many changes by repeatedly entering the Control-_
code.
Customizing Emacs
The emacs editor is customizable in several ways. You can set up your own key bindings, create
your own macros, and even create your own custom functions. Also, some aspects of the behavior of
emacs are controlled by variables that you can set.
You can learn more about emacs functions by invoking the online help facility (by typing ESC-X
help) and then typing the "f" key to list functions. Pressing the space bar for completion will cause
emacs to list all the built-in functions. A list of variables can be similarly obtained by invoking the
online help, then typing "v" then the spacebar.
If you place variable settings, key bindings, and function declarations in a text file called ".emacs"
in your home directory, the emacs editor will load those definitions at startup time.
