RDBMS Concepts
This chapter briefly explains some concepts that you need to understand in order to use SQLBase software:
• A DBA's responsibilities.
This section discusses the responsibilities of a Database Administrator.
• Databases and networks
This section describes client/server terminology as it applies to SQLBase.
As a DBA, you ensure the
smooth operation of one or more databases and you are responsible for
the design, planning, installation, configuration, security, management,
maintenance, and operation of a database management system (DBMS) and
its supporting network.
A good way to ensure that the databases meet the needs of your organization is to become familiar with the organization’s applications: who its users are, what data is stored and accessed, and what types of transactions occur.
A DBA's typical responsibilities include:
• Keeping the database servers running on a daily basis.
• Diagnosing and resolving system problems.
• Installing both database server and client software.
• Creating databases.
• Backing up and recovering databases.
• Controlling security and access to the database and its objects.
• Monitoring and tuning the performance of a database as well as the applications that access it.
• Managing communications.
• Managing disk space and estimating storage needs.
• Advising developers about
table, index, and view design, multi-user considerations, networking,
and loading, converting, and unloading data.
• Ensuring data availability, accuracy, completeness, integrity, and consistency.
• Auditing the use of a database.
• Administering the system catalog.
• Helping database application users.
There are two SQLBase authority levels that a DBA can have:
• SYSADM. The most powerful authority level. There should be only one user with SYSADM authority. This user can designate one or more other users as DBAs.
• DBA. A user with this authority level has privileges on all database objects and can grant, change, or revoke the object privileges of any user except SYSADM.
Access to the SYSADM and DBA accounts should be tightly controlled.
On an organizational level,
your company needs to determine the number of people required to
maintain its databases and applications. Depending on the size of the
databases and the applications, you may have one or more DBAs. They, in
turn, may specialize in maintaining databases or applications, or they may support both a database and the applications that access it.
If you work in a large organization, you are likely to have one user with SYSADM authority who designates several users with DBA authority level and privileges. The DBAs would then share the responsibility for the database maintenance tasks.
If you work in a small organization, the user with SYSADM authority may be the DBA as well.
Read Security and Authorization for more information on authority levels.
Users need data that is easy to access and that they can use in both ad-hoc queries and daily transaction processing. The data structures should be flexible.
A relational
database reduces data to its most basic level in simple,
two-dimensional tables that contain columns and rows of data values. Tables are easy to visualize. Users can explain their needs in easy-to-understand terms. Data modeling is flexible.
A collection of tables can represent
complex data relationships. The relationship between two tables is
defined by a column that both tables contain. This makes the links between the tables easy to see and understand.
For example, in an order entry
system, the customer information is stored in a separate table than the
order information. The two tables are related by the existence of a
common column: customer number (Cust_No).
Customers and Orders tables in an order entry system
This link enables SQLBase to
match a row in the Orders table to its associated row in the Customers
table, allowing you to find the customer name and address for each
order.
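This link can be sketched with a short example. The following uses Python's built-in sqlite3 module as a stand-in for SQLBase (the table layouts and sample data are invented for illustration; SQLBase's own SQL syntax differs in details):

```python
import sqlite3

# In-memory database standing in for the order entry system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Cust_No INTEGER, Name TEXT, City TEXT);
    CREATE TABLE Orders    (Order_No INTEGER, Cust_No INTEGER, Total REAL);
    INSERT INTO Customers VALUES (345, 'Acme Corp', 'Dayton');
    INSERT INTO Orders    VALUES (123, 345, 58.00);
""")

# The common column Cust_No matches each order to its customer,
# yielding the customer name and city for the order.
row = conn.execute("""
    SELECT o.Order_No, c.Name, c.City
    FROM Orders o JOIN Customers c ON o.Cust_No = c.Cust_No
""").fetchone()
print(row)  # (123, 'Acme Corp', 'Dayton')
```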
The terms used to refer to
database objects vary from system to system. The following table shows
how the terms relate to each other. Gupta uses the "Relational database" terms.
Relational database | Set theory | Conventional data processing
Table | Relation | File
Row | Tuple | Record
Column | Attribute | Field
The word relation comes from the study of data structures done by E.F. Codd and others at IBM. Codd used set theory. A set
is a group of objects with a defining rule that enables you to tell
whether a given object is in the set or not. For example, the set of
white rabbits is a subset of all rabbits with the defining property that
the animal has white fur.
A relational system produces a result set by retrieving a set of rows from one or more tables. You can, in turn, operate on this result
set instead of operating against the tables from which you originally
retrieved the rows. This set-at-a-time approach means that a single
relational command can retrieve, update, or delete multiple rows stored
in the database.
The sequence (left-to-right or top-to-bottom) of data in a table holds no information
about the data itself. Part of the flexibility of a relational database
comes from this property; no information is "hidden" in the physical
layout of the database.
In a relational system, the
meaning of a column or row does not depend on its order relative to the
other columns and rows. This means that you can retrieve data from a
select number of columns without having to retrieve data from all
columns. The same goes for rows: you can retrieve data from a subset of
rows without having to retrieve all the rows.
A relational database gives you several advantages:
• You can rearrange data and combine it easily into new relationships; it is not locked into certain relationships because of the way it is stored.
• You can more easily update data
because there are fewer instances of the data. This reduces the
likelihood of errors arising from a failure to update all instances.
• You use less disk space because redundant copies of the data are minimized.
Codd's relational algebra defined three key data retrieval operations.
Selection is the process of retrieving all rows of a table which meet some specified
criterion. Note that this is a very narrow definition of selection, and
should not be confused with the SQL SELECT command, which does more
than just selection.
Projection is the process of retrieving one or more columns from a table in a specified order and removing any duplicate rows.
Joining is the process of retrieving columns from different tables where the values of a common column in each table are equal.
The ability to join two or
more tables is powerful. The join operation distinguishes relational
systems from non-relational systems.
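Selection and projection can be illustrated the same way, again using sqlite3 as a stand-in (the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (Part_No INTEGER, Color TEXT, City TEXT);
    INSERT INTO Parts VALUES (1, 'red', 'Paris'),
                             (2, 'blue', 'Paris'),
                             (3, 'red', 'London');
""")

# Selection: all rows that meet a specified criterion.
selection = conn.execute(
    "SELECT * FROM Parts WHERE Color = 'red'").fetchall()

# Projection: chosen columns only, with duplicate rows removed.
projection = conn.execute(
    "SELECT DISTINCT City FROM Parts").fetchall()

print(sorted(selection))
print(sorted(projection))
```

Note that a single SQL SELECT command can combine all three operations: a query with a WHERE clause, a column list, and a join performs selection, projection, and joining at once.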
Hierarchical and network
database models are procedural and record-at-a-time. To find a record,
you have to navigate or find a path to the record you want and give
multiple procedural commands that tell the system to walk down that
path, step-by-step. You need a detailed understanding of how the data is
stored. Once you have created a database and loaded data into it, it
can be difficult to change.
In contrast, a relational system provides automatic navigation. You do not have to know how the database stores data in order to retrieve, change, or destroy that data. This makes it easy to access data.
Hierarchical and network
database systems store some data as values and other data as pointers.
For example, in a network system, the data that order #123 comes to a
total of $58.00 would be stored as a value in a field. The data that
order #123 belongs to customer #345 would be stored as a pointer from
the order record to the corresponding customer record.
You must use pointers to
associate records to one another in a network or hierarchical database.
You must decide whether to store data as a pointer or as a value when
you define the database. Network and hierarchical systems use
relationships that are pointer-based and predefined or static.
In contrast, relational
database systems can use any value to link one table to another, and you
define relationships between values when you query the data, not when
you create the tables. This gives you maximum flexibility to create
spontaneous queries.
A database
is a set of SQL objects. When you create a database, you are actually
naming an eventual collection of tables, associated indexes, and views.
A single database can contain
all the data associated with one application or with a group of related
applications. Collecting data into a single database lets you enable or
prevent access to all the data in just one operation.
A database contains one or more tables. Each table has a name and contains a specific number of columns and rows. Each column in a row is related in some way to the other columns in the same row.
Each column has a name and a data type. Data values exist at the intersection of a row and a column.
In theory, no row should be a duplicate
of any other row in the same table. Consider a sales order table with
columns for the date, product code, quantity, and customer ID. If a
customer orders ten widgets in the morning and then another ten in the
afternoon, this would create two duplicate rows in the table, one for
each order.
To ensure the creation of
unique rows, you might create an additional column to store the time at
which the order was placed, or to store a uniquely-generated sequence
number (such as an invoice number).
A table can have a primary key which is a column or a group of columns whose value uniquely identifies each row. Another table can have a foreign
key which is a column or a group of columns whose values are equal to
those of the first table’s primary key. One of the rules of referential integrity is that the value of a table’s foreign key must match the value of another table’s primary key.
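A primary key and a matching foreign key can be sketched as follows, with sqlite3 again standing in (SQLite enforces foreign keys only when a pragma is enabled; the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE Customers (Cust_No INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (
        Order_No INTEGER PRIMARY KEY,
        Cust_No  INTEGER REFERENCES Customers (Cust_No)  -- foreign key
    );
    INSERT INTO Customers VALUES (345, 'Acme Corp');
""")

conn.execute("INSERT INTO Orders VALUES (123, 345)")  # matches a primary key
err = None
try:
    conn.execute("INSERT INTO Orders VALUES (124, 999)")  # no such customer
except sqlite3.IntegrityError as e:
    err = e
print("rejected:", err)
```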
SQLBase has SQL commands that
enable you to add new columns to existing tables and make existing
columns wider. These changes take effect immediately, and no database reorganization is necessary.
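Adding a column can be sketched as follows (sqlite3 as a stand-in; SQLite supports adding a column but not widening one, so only the first case is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_No INTEGER, Name TEXT)")
conn.execute("INSERT INTO Customers VALUES (345, 'Acme Corp')")

# Add a new column; existing rows get NULL for it, and no
# reorganization of the stored data is needed.
conn.execute("ALTER TABLE Customers ADD COLUMN Phone TEXT")
row = conn.execute("SELECT * FROM Customers").fetchone()
print(row)  # (345, 'Acme Corp', None)
```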
An index
is an ordered set of pointers to the data in a table. It is based on
the data values in one or more columns of the table. SQLBase stores
indexes separately from tables.
SQLBase decides whether or not
to use an index when accessing a table, and users need not be aware
that SQLBase is using an index.
An index serves two main purposes:
• It improves performance because it makes data access faster.
• It ensures uniqueness. A table
with a unique index cannot have two rows with the same values in the
column or columns that form the index key.
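The uniqueness guarantee can be sketched as follows (sqlite3 stand-in; the schema and index name are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_No INTEGER, Name TEXT)")
conn.execute("CREATE UNIQUE INDEX Cust_No_Idx ON Customers (Cust_No)")

conn.execute("INSERT INTO Customers VALUES (345, 'Acme Corp')")
err = None
try:
    # A second row with the same value in the index key is rejected.
    conn.execute("INSERT INTO Customers VALUES (345, 'Duplicate')")
except sqlite3.IntegrityError as e:
    err = e
print("rejected:", err)
```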
A view
is an alternate way of representing data that exists in one or more
tables. A view can include all or some of the columns from one or more base tables. You can also base a view on other views or on a combination of views and tables.
A view
looks like a table and you can use it as though it were a table. You
can use a view name in a SQL command as though it were a table name. You
cannot do some operations through a view, but you do not need to know
that an apparent table is actually a view.
A table has a storage
representation, but a view does not. When you store a view, SQLBase
stores the definition of the view in the system catalog, but SQLBase
does not store any data for the view itself because the data already
exists in the base table or tables.
A view lets different users
see the same data in different ways. This allows programmers, database
administrators, and end users to see the data as it suits their needs.
For example, employees might have access to columns containing general
information about office locations and phone extensions, while
administrators might have access to additional columns containing more
confidential information such as employees’ home addresses and phone
numbers.
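The employee example above might be sketched like this (sqlite3 stand-in; column names and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        Name TEXT, Office TEXT, Extension TEXT,
        Home_Addr TEXT, Home_Phone TEXT
    );
    INSERT INTO Employees VALUES
        ('Lee', 'B2', 'x101', '12 Oak St', '555-0101');

    -- Only the view's definition is stored; its data stays in the
    -- base table. General users query the view, not the table.
    CREATE VIEW Directory AS
        SELECT Name, Office, Extension FROM Employees;
""")
row = conn.execute("SELECT * FROM Directory").fetchone()
print(row)  # ('Lee', 'B2', 'x101')
```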
A synonym
is another name for a table, view, or external function. When you
access a table, view, or external function created by another user (once
you have been granted the privilege), you must fully qualify the table name by prefixing it with the owner's name, unless a synonym for the
table or view is available. If one is available, you can refer to the
user’s table or view without having to fully qualify the name.
A stored command is
a compiled query, data manipulation command, or procedure that is
stored for later execution. SQLBase stores the command’s or procedure’s
execution plan as well, so subsequent execution is very fast.
A SQLBase procedure
is a set of Scalable Application Language (SAL) and SQL statements that
is assigned a name, compiled, and optionally stored in a SQLBase
database. Procedures reduce network traffic and simplify your
applications since they are stored and processed on the server. They
also provide more flexible security, allowing end users access to data that they would otherwise have no privilege to access.
SQLBase procedures can be static or dynamic. Static procedures must be stored (at which time they are parsed and precompiled) before they are executed. Dynamic procedures contain dynamic embedded SQL statements, which are parsed and compiled at execution time.
SQLBase also provides preconstructed procedures as useful tools to help you maintain your database. See Appendix B for a description of SQLBase-supplied procedures.
An external function is
a user-defined function that resides in an "external" DLL (Dynamic Link
Library) that is invoked within a SQLBase stored procedure. SQLBase
accepts external functions in the language of your choice, such as C and
C++. The SQLBase server converts data types of parameters that are
declared in stored procedures into their external representation.
Using external functions
enhances the power of the SQLBase server, allowing you to achieve
maximum flexibility and performance with minimal programming effort. It
extends the functionality of stored procedures with no impact on the
application or the server. When external functions are called, they are
dynamically plugged in and behave like built-in functions. For details,
read Chapter 8, External Functions in the SQLBase SQL Language Reference.
A trigger
activates a stored or inline procedure that SQLBase automatically
executes when a user attempts to change the data in a table. You create
one or more triggers on a table, with each trigger defined to activate
on a specific command (an INSERT, UPDATE, or DELETE). You can also
define triggers on stored procedures.
Triggers allow actions to
occur based on the value of a row before or after modification. Triggers
can prevent users from making incorrect or inconsistent data changes
that can jeopardize database integrity. They can also be used to
implement referential integrity constraints. For details on referential
integrity, read Chapter 6, Referential Integrity in the SQLBase SQL Language Reference.
For details on the trigger execution order before a single data manipulation statement is executed, read the Section DML Execution Model in Chapter 1 of the SQLBase SQL Language Reference.
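A trigger that blocks an inconsistent change can be sketched as follows (sqlite3 stand-in; SQLBase trigger syntax differs, and the business rule is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Order_No INTEGER, Total REAL);

    -- Fires before an UPDATE; rejects a change that would make
    -- an order total negative.
    CREATE TRIGGER No_Negative_Totals
    BEFORE UPDATE ON Orders
    WHEN NEW.Total < 0
    BEGIN
        SELECT RAISE(ABORT, 'total cannot be negative');
    END;
""")
conn.execute("INSERT INTO Orders VALUES (123, 58.00)")
err = None
try:
    conn.execute("UPDATE Orders SET Total = -5 WHERE Order_No = 123")
except sqlite3.IntegrityError as e:
    err = e
total = conn.execute("SELECT Total FROM Orders").fetchone()[0]
print("rejected:", err, "; total still", total)
```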
SQLBase maintains a system catalog, or data dictionary,
in every database. The system catalog contains tables that store
information about the database’s tables, views, columns, indexes,
execution plans, and security privileges.
When you create, change, or
drop a database object, SQLBase modifies rows in the system catalog
tables that describe the object and how it is related to other objects.
The system catalog also contains the name, size, type, and valid values of each table’s columns.
You can query the system catalog tables just like any other table.
Read System Catalog Tables for detailed information about the data dictionary tables.
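Querying a catalog uses ordinary SELECT statements. SQLite's catalog table is sqlite_master (SQLBase's catalog tables have different names, described in System Catalog Tables), but the idea is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_No INTEGER, Name TEXT)")
conn.execute("CREATE INDEX Name_Idx ON Customers (Name)")

# The catalog describes every object in the database and is
# queryable like any other table.
objects = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
for type_, name in objects:
    print(type_, name)
```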
To guarantee the integrity of data, SQLBase uses page
locks to prevent users from changing data while another user is reading
or changing data on the same page. Locks prevent lost updates from
occurring and ensure that data does not change while you are reading it
or changing it yourself.
SQLBase stores data in 1K pages, and can lock both base table pages and index pages. For details, read the section Database locking in Chapter 4, About Databases.
This section describes client/server terminology as it applies to SQLBase.
The term distributed processing
describes the placement of computers where they are needed, whether on
the floor of a manufacturing facility, in an accounting department, in a
laboratory, at a field location, or in an executive office. A network links computers so that they can exchange information, share resources, and perform distributed processing.
A distributed
processing network allows information to originate anywhere in the
network. You can place systems where they are needed while still having
access to the facilities of other widely dispersed systems. Information
can be exchanged among all parts of an organization.
The term distributed
processing had been used for several years to label network
configurations where remote computers exchanged data. Two developments
in the 1980s brought a new meaning to the term: PCs (Personal Computers) and LANs (Local Area Networks).
PCs supplied solutions to many
simple information processing requirements. The user interface on PCs
became sophisticated with bit-mapped displays that used windowing and
graphics, and a mouse. This made applications easy to use.
With a LAN, PCs could
communicate with each other and share resources such as databases. Users
began to wonder if they could move some transaction processing systems,
which had traditionally run on minicomputers and mainframes, onto PCs.
They also began to think about new types of applications that had not
been possible before.
Client-server technology is a
form of distributed processing where computing activities are shared
among cooperating, networked computers. An application is functionally
split into two or more programs which execute on different computers and
communicate with each other by passing messages across the network.
The client programs run on users' PCs. Server
programs run on more powerful computers. The client sends a request to a
server and when the server receives the request, it processes it and
sends the results back to the client.
Typically, a local-area
network contains more clients than servers. Client computers cannot
share their resources or use the resources of other client computers.
Through client-server
architecture, users gain access to capabilities available on a server
computer which are not available locally on their desktop computer. A
client-server application has the capabilities of both the client and
the server.
Servers
are usually the most powerful computers on the local area network.
Certain criteria make some systems better suited to be servers: a high input/output transfer rate, high network throughput (the speed at which a server processes network tasks), high disk capacity, and multiple disk controllers. A server is usually multi-tasking so that it can serve multiple clients simultaneously.
When a client and a server share processing duties in a sophisticated manner, this is called cooperative processing.
An example of cooperative processing is a distributed database system. A
client takes user input and submits processing requests to the server.
Database processing, such as retrieving and sorting data, is centralized
at a database server. Application processing is distributed between the
clients and the server.
The client and server
cooperate to do a job. The client manages the screen display and
graphics, carries on an interactive dialogue with the user, validates
data, does some local computation, and provides a network interface that
talks to the server. The database server is responsible for data
storage, security and integrity, and input/output.
In a cooperative processing
environment, components of an application run where they are best
suited. Data-intensive applications suffer performance problems when all
computing happens on the client. The idea is to off-load time- and
data-intensive tasks to the database server while doing as much
processing on the client as possible. It is often the case that requests
are processed faster and more economically on a shared high-performance
database server than on a stand-alone client computer.
By their nature, some applications are suited to cooperative processing. For example, a network-intrinsic
application is one that does not make sense on a stand-alone system.
Communications is an integral part of the overall process of completing
the application. Distributed database is a good example of a
network-intrinsic application because it uses the network to provide shared access to data for many users.
Database software has evolved.
At first, database software was monolithic and ran on one large
computer accessed by dumb terminals. Later, database software became more sophisticated and split the database function into two components:
the client (frontend) and the server (backend). Note that both components still ran on the same mainframe or minicomputer at this time.
The start of database software evolution
Eventually, database software
vendors moved the client component down to microcomputers and LANs, and
the database server soon followed.
A database server and clients on a LAN
The difference from file servers
Database servers are different from file
servers. File servers are responsible for storing, not processing, data
files. Application processing is the client’s responsibility and a
client’s request for data results in the file server sending an entire file across the network to the client. As the number of clients
making requests for data increases, so does the likelihood of network
performance degradation and bottlenecks.
In contrast, a relational
database system has the flexibility and functionality necessary to work
across networked computer systems. A client application uses SQL to talk
to a database server. The client is responsible for executing the
application and formatting a compact SQL command which it then sends
across the network to the database server. The database server is
responsible for parsing and executing the SQL command and sending data
and a status code back to the client.
Unlike a file server, a
database server sends only a part (subset) of the database back to the
requesting PC. Most database servers use SQL because it is a convenient language for specifying logical subsets of data.
Cooperative processing: a client and server communicating via SQL
The benefits of distributed databases are:
• Location independence. The location of a client application is independent of the location of the data.
• Location transparency. Users can access a database without having to know its location, and you can move a database with no effect on users or applications.
• Incremental application growth.
You can upgrade an application by adding more clients or by purchasing a
more powerful database server machine.
• Site autonomy. You can maintain each database separately from other databases, administer the shared data, and allow programmers to support the applications.
• You can distribute and secure data to match individual departmental needs.
• Hardware and software
independence. You can use hardware and software from different
manufacturers, and you can match them to the requirements of your
applications.
• Distributed processing. You can distribute the processing and storage of data among many computers.
• Increased availability and
reliability. If a database server goes down, only its clients are
affected, not your entire organization.