RDBMS Concepts
This chapter briefly explains some concepts that you need to understand in order to use SQLBase software:
• A DBA's responsibilities.
This section discusses the responsibilities of a Database Administrator.
• Databases and networks
This section describes client/server terminology as it applies to SQLBase.
As a DBA, you ensure the
smooth operation of one or more databases and you are responsible for
the design, planning, installation, configuration, security, management,
maintenance, and operation of a database management system (DBMS) and
its supporting network.
A good way to ensure that the databases meet the needs of your organization is to become familiar with the organization’s applications: who its users are, what data is stored and accessed, and what types of transactions occur.
A DBA's typical responsibilities include:
• Keeping the database servers running on a daily basis.
• Diagnosing and resolving system problems.
• Installing both database server and client software.
• Creating databases.
• Backing up and recovering databases.
• Controlling security and access to the database and its objects.
• Monitoring and tuning the performance of a database as well as the applications that access it.
• Managing communications.
• Managing disk space and estimating storage needs.
• Advising developers about
table, index, and view design, multi-user considerations, networking,
and loading, converting, and unloading data.
• Ensuring data availability, accuracy, completeness, integrity, and consistency.
• Auditing the use of a database.
• Administering the system catalog.
• Helping database application users.
There are two SQLBase authority levels that a DBA can have:
• SYSADM. The most powerful authority level. There should be only one user with SYSADM authority. This user can designate one or more other users as DBAs.
• DBA. A user with this authority level has privileges on all database objects and can grant, change, or revoke the object privileges of any user except SYSADM.
Access to the SYSADM and DBA accounts should be tightly controlled.
On an organizational level,
your company needs to determine the number of people required to
maintain its databases and applications. Depending on the size of the
databases and the applications, you may have one or more DBAs. They, in
turn, may specialize in maintaining databases or applications, or they may support both a database and the applications that access it.
If you work in a large organization, you are likely to have one user with SYSADM authority who designates several users with DBA authority level and privileges. The DBAs would then share the responsibility for the database maintenance tasks.
If you work in a small organization, the user with SYSADM authority may be the DBA as well.
Read Security and Authorization for more information on authority levels.
Users need data that is easy to access and that they can use in both ad-hoc queries and daily transaction processing. The data structures should be flexible.
A relational
database reduces data to its most basic level in simple,
two-dimensional tables that contain columns and rows of data values. Tables are easy to visualize. Users can explain their needs in easy-to-understand terms. Data modeling is flexible.
A collection of tables can represent
complex data relationships. The relationship between two tables is
defined by a column that both tables contain. This makes the links between the tables easy to see and understand.
For example, in an order entry
system, the customer information is stored in a separate table than the
order information. The two tables are related by the existence of a
common column: customer number (Cust_No).
Customers and Orders tables in an order entry system
This link enables SQLBase to
match a row in the Orders table to its associated row in the Customers
table, allowing you to find the customer name and address for each
order.
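This link can be sketched with a short example. The following uses Python's built-in sqlite3 module as a stand-in for SQLBase (the table layouts and sample data are invented for illustration; SQLBase's own SQL syntax differs in details):

```python
import sqlite3

# In-memory database standing in for the order entry system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Cust_No INTEGER, Name TEXT, City TEXT);
    CREATE TABLE Orders    (Order_No INTEGER, Cust_No INTEGER, Total REAL);
    INSERT INTO Customers VALUES (345, 'Acme Corp', 'Dayton');
    INSERT INTO Orders    VALUES (123, 345, 58.00);
""")

# The common column Cust_No matches each order to its customer,
# yielding the customer name and city for the order.
row = conn.execute("""
    SELECT o.Order_No, c.Name, c.City
    FROM Orders o JOIN Customers c ON o.Cust_No = c.Cust_No
""").fetchone()
print(row)  # (123, 'Acme Corp', 'Dayton')
```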
The terms used to refer to
database objects vary from system to system. The following table shows
how the terms relate to each other. Gupta uses the "Relational database" terms.
Relational database | Set theory | Conventional data processing
Table | Relation | File
Row | Tuple | Record
Column | Attribute | Field
The word relation comes from the study of data structures done by E.F. Codd and others at IBM. Codd used set theory. A set
is a group of objects with a defining rule that enables you to tell
whether a given object is in the set or not. For example, the set of
white rabbits is a subset of all rabbits with the defining property that
the animal has white fur.
A relational system produces a result set by retrieving a set of rows from one or more tables. You can, in turn, operate on this result
set instead of operating against the tables from which you originally
retrieved the rows. This set-at-a-time approach means that a single
relational command can retrieve, update, or delete multiple rows stored
in the database.
The sequence (left-to-right or top-to-bottom) of data in a table holds no information
about the data itself. Part of the flexibility of a relational database
comes from this property; no information is "hidden" in the physical
layout of the database.
In a relational system, the
meaning of a column or row does not depend on its order relative to the
other columns and rows. This means that you can retrieve data from a
select number of columns without having to retrieve data from all
columns. The same goes for rows: you can retrieve data from a subset of
rows without having to retrieve all the rows.
A relational database gives you several advantages:
• You can rearrange data and combine it easily into new relationships; it is not locked into certain relationships because of the way it is stored.
• You can more easily update data
because there are fewer instances of the data. This reduces the
likelihood of errors arising from a failure to update all instances.
• You use less disk space because redundant copies of the data are minimized.
Codd's relational algebra defined three key data retrieval operations.
Selection is the process of retrieving all rows of a table which meet some specified
criterion. Note that this is a very narrow definition of selection, and
should not be confused with the SQL SELECT command, which does more
than just selection.
Projection is the process of retrieving one or more columns from a table in a specified order and removing any duplicate rows.
Joining is the process of retrieving columns from different tables where the values of a common column in each table are equal.
The ability to join two or
more tables is powerful. The join operation distinguishes relational
systems from non-relational systems.
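Selection and projection can be illustrated the same way, again using sqlite3 as a stand-in (the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (Part_No INTEGER, Color TEXT, City TEXT);
    INSERT INTO Parts VALUES (1, 'red', 'Paris'),
                             (2, 'blue', 'Paris'),
                             (3, 'red', 'London');
""")

# Selection: all rows that meet a specified criterion.
selection = conn.execute(
    "SELECT * FROM Parts WHERE Color = 'red'").fetchall()

# Projection: chosen columns only, with duplicate rows removed.
projection = conn.execute(
    "SELECT DISTINCT City FROM Parts").fetchall()

print(sorted(selection))
print(sorted(projection))
```

Note that a single SQL SELECT command can combine all three operations: a query with a WHERE clause, a column list, and a join performs selection, projection, and joining at once.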
Hierarchical and network
database models are procedural and record-at-a-time. To find a record,
you have to navigate or find a path to the record you want and give
multiple procedural commands that tell the system to walk down that
path, step-by-step. You need a detailed understanding of how the data is
stored. Once you have created a database and loaded data into it, it
can be difficult to change.
In contrast, a relational system provides automatic navigation. You do not have to know how the database stores data in order to retrieve, change, or destroy that data. This makes it easy to access data.
Hierarchical and network
database systems store some data as values and other data as pointers.
For example, in a network system, the data that order #123 comes to a
total of $58.00 would be stored as a value in a field. The data that
order #123 belongs to customer #345 would be stored as a pointer from
the order record to the corresponding customer record.
You must use pointers to
associate records to one another in a network or hierarchical database.
You must decide whether to store data as a pointer or as a value when
you define the database. Network and hierarchical systems use
relationships that are pointer-based and predefined or static.
In contrast, relational
database systems can use any value to link one table to another, and you
define relationships between values when you query the data, not when
you create the tables. This gives you maximum flexibility to create
spontaneous queries.
A database
is a set of SQL objects. When you create a database, you are actually
naming an eventual collection of tables, associated indexes, and views.
A single database can contain
all the data associated with one application or with a group of related
applications. Collecting data into a single database lets you enable or
prevent access to all the data in just one operation.
A database contains one or more tables. Each table has a name and contains a specific number of columns and rows. Each column in a row is related in some way to the other columns in the same row.
Each column has a name and a data type. Data values exist at the intersection of a row and a column.
In theory, no row should be a duplicate
of any other row in the same table. Consider a sales order table with
columns for the date, product code, quantity, and customer ID. If a
customer orders ten widgets in the morning and then another ten in the
afternoon, this would create two duplicate rows in the table, one for
each order.
To ensure the creation of
unique rows, you might create an additional column to store the time at
which the order was placed, or to store a uniquely-generated sequence
number (such as an invoice number).
A table can have a primary key which is a column or a group of columns whose value uniquely identifies each row. Another table can have a foreign
key which is a column or a group of columns whose values are equal to
those of the first table’s primary key. One of the rules of referential integrity is that the value of a table’s foreign key must match the value of another table’s primary key.
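A primary key and a matching foreign key can be sketched as follows, with sqlite3 again standing in (SQLite enforces foreign keys only when a pragma is enabled; the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE Customers (Cust_No INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (
        Order_No INTEGER PRIMARY KEY,
        Cust_No  INTEGER REFERENCES Customers (Cust_No)  -- foreign key
    );
    INSERT INTO Customers VALUES (345, 'Acme Corp');
""")

conn.execute("INSERT INTO Orders VALUES (123, 345)")  # matches a primary key
err = None
try:
    conn.execute("INSERT INTO Orders VALUES (124, 999)")  # no such customer
except sqlite3.IntegrityError as e:
    err = e
print("rejected:", err)
```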
SQLBase has SQL commands that
enable you to add new columns to existing tables and make existing
columns wider. These changes take effect immediately, and no database reorganization is necessary.
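Adding a column can be sketched as follows (sqlite3 as a stand-in; SQLite supports adding a column but not widening one, so only the first case is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_No INTEGER, Name TEXT)")
conn.execute("INSERT INTO Customers VALUES (345, 'Acme Corp')")

# Add a new column; existing rows get NULL for it, and no
# reorganization of the stored data is needed.
conn.execute("ALTER TABLE Customers ADD COLUMN Phone TEXT")
row = conn.execute("SELECT * FROM Customers").fetchone()
print(row)  # (345, 'Acme Corp', None)
```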
An index
is an ordered set of pointers to the data in a table. It is based on
the data values in one or more columns of the table. SQLBase stores
indexes separately from tables.
SQLBase decides whether or not
to use an index when accessing a table, and users need not be aware
that SQLBase is using an index.
An index serves two main purposes:
• It improves performance because it makes data access faster.
• It ensures uniqueness. A table
with a unique index cannot have two rows with the same values in the
column or columns that form the index key.
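The uniqueness guarantee can be sketched as follows (sqlite3 stand-in; the schema and index name are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_No INTEGER, Name TEXT)")
conn.execute("CREATE UNIQUE INDEX Cust_No_Idx ON Customers (Cust_No)")

conn.execute("INSERT INTO Customers VALUES (345, 'Acme Corp')")
err = None
try:
    # A second row with the same value in the index key is rejected.
    conn.execute("INSERT INTO Customers VALUES (345, 'Duplicate')")
except sqlite3.IntegrityError as e:
    err = e
print("rejected:", err)
```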
A view
is an alternate way of representing data that exists in one or more
tables. A view can include all or some of the columns from one or more base tables. You can also base a view on other views or on a combination of views and tables.
A view
looks like a table and you can use it as though it were a table. You
can use a view name in a SQL command as though it were a table name. You
cannot do some operations through a view, but you do not need to know
that an apparent table is actually a view.
A table has a storage
representation, but a view does not. When you store a view, SQLBase
stores the definition of the view in the system catalog, but SQLBase
does not store any data for the view itself because the data already
exists in the base table or tables.
A view lets different users
see the same data in different ways. This allows programmers, database
administrators, and end users to see the data as it suits their needs.
For example, employees might have access to columns containing general
information about office locations and phone extensions, while
administrators might have access to additional columns containing more
confidential information such as employees’ home addresses and phone
numbers.
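The employee example above might be sketched like this (sqlite3 stand-in; column names and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        Name TEXT, Office TEXT, Extension TEXT,
        Home_Addr TEXT, Home_Phone TEXT
    );
    INSERT INTO Employees VALUES
        ('Lee', 'B2', 'x101', '12 Oak St', '555-0101');

    -- Only the view's definition is stored; its data stays in the
    -- base table. General users query the view, not the table.
    CREATE VIEW Directory AS
        SELECT Name, Office, Extension FROM Employees;
""")
row = conn.execute("SELECT * FROM Directory").fetchone()
print(row)  # ('Lee', 'B2', 'x101')
```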
A synonym
is another name for a table, view, or external function. When you
access a table, view, or external function created by another user (once
you have been granted the privilege), you must fully qualify the table name by prefixing it with the owner's name, unless a synonym for the
table or view is available. If one is available, you can refer to the
user’s table or view without having to fully qualify the name.
A stored command is
a compiled query, data manipulation command, or procedure that is
stored for later execution. SQLBase stores the command’s or procedure’s
execution plan as well, so subsequent execution is very fast.
A SQLBase procedure
is a set of Scalable Application Language (SAL) and SQL statements that
is assigned a name, compiled, and optionally stored in a SQLBase
database. Procedures reduce network traffic and simplify your
applications since they are stored and processed on the server. They
also provide more flexible security, allowing end users access to data that they would otherwise have no privilege to access.
SQLBase procedures can be static or dynamic. Static procedures must be stored (at which time they are parsed and precompiled) before they are executed. Dynamic procedures contain dynamic embedded SQL statements, which are parsed and compiled at execution time.
SQLBase also provides preconstructed procedures as useful tools to help you maintain your database. See Appendix B for a description of SQLBase-supplied procedures.
An external function is
a user-defined function that resides in an "external" DLL (Dynamic Link
Library) that is invoked within a SQLBase stored procedure. SQLBase
accepts external functions in the language of your choice, such as C and
C++. The SQLBase server converts data types of parameters that are
declared in stored procedures into their external representation.
Using external functions
enhances the power of the SQLBase server, allowing you to achieve
maximum flexibility and performance with minimal programming effort. It
extends the functionality of stored procedures with no impact on the
application or the server. When external functions are called, they are
dynamically plugged in and behave like built-in functions. For details,
read Chapter 8, External Functions in the SQLBase SQL Language Reference.
A trigger
activates a stored or inline procedure that SQLBase automatically
executes when a user attempts to change the data in a table. You create
one or more triggers on a table, with each trigger defined to activate
on a specific command (an INSERT, UPDATE, or DELETE). You can also
define triggers on stored procedures.
Triggers allow actions to
occur based on the value of a row before or after modification. Triggers
can prevent users from making incorrect or inconsistent data changes
that can jeopardize database integrity. They can also be used to
implement referential integrity constraints. For details on referential
integrity, read Chapter 6, Referential Integrity in the SQLBase SQL Language Reference.
For details on the trigger execution order before a single data manipulation statement is executed, read the Section DML Execution Model in Chapter 1 of the SQLBase SQL Language Reference.
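A trigger that blocks an inconsistent change can be sketched as follows (sqlite3 stand-in; SQLBase trigger syntax differs, and the business rule is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Order_No INTEGER, Total REAL);

    -- Fires before an UPDATE; rejects a change that would make
    -- an order total negative.
    CREATE TRIGGER No_Negative_Totals
    BEFORE UPDATE ON Orders
    WHEN NEW.Total < 0
    BEGIN
        SELECT RAISE(ABORT, 'total cannot be negative');
    END;
""")
conn.execute("INSERT INTO Orders VALUES (123, 58.00)")
err = None
try:
    conn.execute("UPDATE Orders SET Total = -5 WHERE Order_No = 123")
except sqlite3.IntegrityError as e:
    err = e
total = conn.execute("SELECT Total FROM Orders").fetchone()[0]
print("rejected:", err, "; total still", total)
```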
SQLBase maintains a system catalog, or data dictionary,
in every database. The system catalog contains tables that store
information about the database’s tables, views, columns, indexes,
execution plans, and security privileges.
When you create, change, or
drop a database object, SQLBase modifies rows in the system catalog
tables that describe the object and how it is related to other objects.
The system catalog also contains the name, size, type, and valid values of each table’s columns.
You can query the system catalog tables just like any other table.
Read System Catalog Tables for detailed information about the data dictionary tables.
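Querying a catalog uses ordinary SELECT statements. SQLite's catalog table is sqlite_master (SQLBase's catalog tables have different names, described in System Catalog Tables), but the idea is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_No INTEGER, Name TEXT)")
conn.execute("CREATE INDEX Name_Idx ON Customers (Name)")

# The catalog describes every object in the database and is
# queryable like any other table.
objects = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
for type_, name in objects:
    print(type_, name)
```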
To guarantee the integrity of data, SQLBase uses page
locks to prevent users from changing data while another user is reading
or changing data on the same page. Locks prevent lost updates from
occurring and ensure that data does not change while you are reading it
or changing it yourself.
SQLBase stores data in 1K pages, and can lock both base table pages and index pages. For details, read the section Database locking in Chapter 4, About Databases.
This section describes client/server terminology as it applies to SQLBase.
The term distributed processing
describes the placement of computers where they are needed, whether on
the floor of a manufacturing facility, in an accounting department, in a
laboratory, at a field location, or in an executive office. A network links computers so that they can exchange information, share resources, and perform distributed processing.
A distributed
processing network allows information to originate anywhere in the
network. You can place systems where they are needed while still having
access to the facilities of other widely dispersed systems. Information
can be exchanged among all parts of an organization.
The term distributed
processing had been used for several years to label network
configurations where remote computers exchanged data. Two developments
in the 1980s brought a new meaning to the term: PCs (Personal Computers) and LANs (Local Area Networks).
PCs supplied solutions to many
simple information processing requirements. The user interface on PCs
became sophisticated with bit-mapped displays that used windowing and
graphics, and a mouse. This made applications easy to use.
With a LAN, PCs could
communicate with each other and share resources such as databases. Users
began to wonder if they could move some transaction processing systems,
which had traditionally run on minicomputers and mainframes, onto PCs.
They also began to think about new types of applications that had not
been possible before.
Client-server technology is a
form of distributed processing where computing activities are shared
among cooperating, networked computers. An application is functionally
split into two or more programs which execute on different computers and
communicate with each other by passing messages across the network.
The client programs run on users' PCs. Server
programs run on more powerful computers. The client sends a request to a
server and when the server receives the request, it processes it and
sends the results back to the client.
Typically, a local-area
network contains more clients than servers. Client computers cannot
share their resources or use the resources of other client computers.
Through client-server
architecture, users gain access to capabilities available on a server
computer which are not available locally on their desktop computer. A
client-server application has the capabilities of both the client and
the server.
Servers
are usually the most powerful computers on the local area network.
Certain criteria make some systems better suited to be servers: a high input/output transfer rate, high network throughput (the speed at which a server processes network tasks), high disk capacity, and multiple disk controllers. A server is usually multi-tasking so that it can serve multiple clients simultaneously.
When a client and a server share processing duties in a sophisticated manner, this is called cooperative processing.
An example of cooperative processing is a distributed database system. A
client takes user input and submits processing requests to the server.
Database processing, such as retrieving and sorting data, is centralized
at a database server. Application processing is distributed between the
clients and the server.
The client and server
cooperate to do a job. The client manages the screen display and
graphics, carries on an interactive dialogue with the user, validates
data, does some local computation, and provides a network interface that
talks to the server. The database server is responsible for data
storage, security and integrity, and input/output.
In a cooperative processing
environment, components of an application run where they are best
suited. Data-intensive applications suffer performance problems when all
computing happens on the client. The idea is to off-load time- and
data-intensive tasks to the database server while doing as much
processing on the client as possible. It is often the case that requests
are processed faster and more economically on a shared high-performance
database server than on a stand-alone client computer.
By their nature, some applications are suited to cooperative processing. For example, a network-intrinsic
application is one that does not make sense on a stand-alone system.
Communications is an integral part of the overall process of completing
the application. Distributed database is a good example of a
network-intrinsic application because it uses the network to provide shared access to data for many users.
Database software has evolved.
At first, database software was monolithic and ran on one large
computer accessed by dumb terminals. Later, database software became more sophisticated and split the database function into two components:
the client (frontend) and the server (backend). Note that both components still ran on the same mainframe or minicomputer at this time.
The start of database software evolution
Eventually, database software
vendors moved the client component down to microcomputers and LANs, and
the database server soon followed.
A database server and clients on a LAN
The difference from file servers
Database servers are different from file
servers. File servers are responsible for storing, not processing, data
files. Application processing is the client’s responsibility and a
client’s request for data results in the file server sending an entire file across the network to the client. As the number of clients
making requests for data increases, so does the likelihood of network
performance degradation and bottlenecks.
In contrast, a relational
database system has the flexibility and functionality necessary to work
across networked computer systems. A client application uses SQL to talk
to a database server. The client is responsible for executing the
application and formatting a compact SQL command which it then sends
across the network to the database server. The database server is
responsible for parsing and executing the SQL command and sending data
and a status code back to the client.
Unlike a file server, a
database server sends only a part (subset) of the database back to the
requesting PC. Most database servers use SQL because it is a convenient language for specifying logical subsets of data.
Cooperative processing: a client and server communicating via SQL
The benefits of distributed databases are:
• Location independence. The location of a client application is independent of the location of the data.
• Location transparency. Users can access a database without having to know its location, and you can move a database with no effect on users or applications.
• Incremental application growth.
You can upgrade an application by adding more clients or by purchasing a
more powerful database server machine.
• Site autonomy. You can maintain each database separately from other databases, administer the shared data, and allow programmers to support the applications.
• You can distribute and secure data to match individual departmental needs.
• Hardware and software
independence. You can use hardware and software from different
manufacturers, and you can match them to the requirements of your
applications.
• Distributed processing. You can distribute the processing and storage of data among many computers.
• Increased availability and
reliability. If a database server goes down, only its clients are
affected, not your entire organization.