Databases




The subject of databases is expansive, to say the least. Fortunately, this section only contains the concepts, terms, and security issues that you'll need to understand for the Security+ exam.

In short, any collection of information that is grouped together in a structured manner can be called a database. The phonebook on your shelf is a database. However, when information is added to an electronic database and cross-referenced, it turns into a powerful and dynamic resource. Electronic databases enable users to store, access, modify, and retrieve data while enjoying total control over which elements of the database they wish to work with. When you look up a number in the phonebook, you are limited to searching by last name, but what if you're not sure how to spell it? The only option is to try every different spelling and waste valuable time. In a computer database of cross-referenced names, addresses, and phone numbers, the process can be reduced to seconds. The more tidbits you know about someone's entry in the database, such as their first name or zip code, the quicker it can be found.

When discussing the popular relational database model, some traditional database terms are often used synonymously with their modern equivalents. A relation is simply a table of rows and columns in a relational database. Refer to Table 7.1 to avoid confusion in this matter. You might see these terms substituted for each other, so it's a good idea to know them up, down, and backwards.

Table 7.1: Interchangeable Database Terms

    Traditional Terms    Standard Terms    Relational Database Terms
    -----------------    --------------    -------------------------
    File                 Table             Relation
    Record               Row               Tuple
    Field                Column            Attribute

What follows is a brief list of the terms commonly used when discussing databases. This is not meant to be a Databases 101 lesson-just a small word list to help you understand the concepts in this section.

  • Attribute: The component that describes an object or entity in a database. In a relation, attributes are represented as vertical, named columns.

  • Cardinality: The number of rows (or tuples) in a relation. Keep in mind that cardinals are red so you will remember the three R's of cardinality: Red, Row, Relation.

  • Cell: The point at which a row and column intersect.

  • Degree: The number of attributes in a relation.

  • Domain: The set of all possible values that an attribute can have.

  • Filter: Criteria used to include or exclude specific information in the results of a database query. Also used to sort results and filter database input.

  • Index: A supplementary structure, typically used in large relations, that organizes references to tuples so the DBMS can locate them quickly without scanning the entire relation.

  • Metadata: Data that describes a database, or simply data about data. Contains relation and attribute names that form the structure of a database, information about relationships, and data on who's allowed to access what in the database.

  • Query: This is a command used to fetch and display requested information from a database.

  • Relational keys: These are used to establish relationships between relations. Primary keys uniquely identify each tuple in a relation. Each relation contains only one primary key. Foreign keys are used to identify tuples from a separate relation, thus enabling relationships to be made across different relations in the database.

  • Relationship: This specifies a link between tables (or relations) in a database. Different database models establish relationships in different ways. Relational databases make relationships by connecting tuples between multiple relations. There are three types of relationships to be concerned with. A one-to-one relationship is where one tuple in relation A is related to one tuple in relation B. A one-to-many relationship is where one tuple in relation A is related to one or more tuples in relation B, but the individual tuples in relation B are restricted to one-to-one relationships with tuples in relation A. Finally, many-to-many relationships describe the situation where both relations contain relationships to multiple tuples in the other's relation. Dr. Seuss, anyone? Many-to-many relationships cause problems with data redundancy, making modifications, updates, and deletions more complicated.

  • Tuple: In a relation, tuples are represented as named rows. They are essentially horizontal rows of attributes. They are collections of facts pertaining to single entities or objects.

  • View: This is a method of manipulating and controlling the display of (and access to) data. Views are used to control who can change or view individual elements of a database. For instance, an administrator might use a view to prohibit social security numbers from being displayed when queries are made to a personnel database. A view can control access to data based on specific user accounts, or on groups of users.
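Several of the glossary entries above (primary and foreign keys, one-to-many relationships, and views) can be illustrated with a short sketch using Python's built-in sqlite3 module. The table names, columns, and sample data here are hypothetical, not from the text.

```python
import sqlite3

# In-memory database for illustration; all names and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The primary key uniquely identifies each tuple in 'people'; the foreign
# key in 'phones' references tuples in 'people', forming a one-to-many
# relationship between the two relations.
conn.execute("""CREATE TABLE people (
    person_id INTEGER PRIMARY KEY,
    name      TEXT,
    ssn       TEXT)""")
conn.execute("""CREATE TABLE phones (
    phone     TEXT,
    person_id INTEGER REFERENCES people(person_id))""")

conn.execute("INSERT INTO people VALUES (1, 'Ada', '000-00-0001')")
conn.execute("INSERT INTO phones VALUES ('555-0100', 1)")

# A view that hides the sensitive ssn attribute, as described above.
conn.execute("CREATE VIEW people_public AS SELECT person_id, name FROM people")

print(conn.execute("SELECT * FROM people_public").fetchall())
# [(1, 'Ada')] -- the ssn column is not exposed through the view
```

Granting query rights on the view rather than the underlying table is the typical way an administrator would keep sensitive attributes out of ordinary users' results.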

Types of Databases

The most significant types of databases are listed below. Most databases encountered in the real world are relational, but each model has its own characteristics that need to be explained.

Flat Databases

This variety is the simplest to program and contains the least amount of structure. A single flat file comprised of one table can be the source of data in a flat database. A comma-separated file of names, e-mail addresses, and numbers is an example of a simple flat database. This works fine with a small collection of information. Problems with this kind of database come to light when a flat file starts to grow larger. Flat files generally waste space as a result of their structure, so queries can take much longer as the file gains weight. In addition, they are not suited for complex queries since a program must make separate searches through the data, then sort through everything to deliver results. You might recall from history classes that people once believed the world was flat. Flat databases share a similar flat view of the database world, so they are limited to simple tasks. The flexibility of other types of databases solves these problems.

Hierarchical Databases

Similar to the file management technique used on PCs, hierarchical databases contain an upside-down tree where one table resides at the top and others branch out from beneath it. Connected by links known as pointers, so-called parent and child tables form the relationships in this kind of database. This is a good memory aid as parent tables can have many children but a child table can only have one parent. This model is an improvement over flat databases but there are still problems with data redundancy. One-to-many relationships work well with the hierarchical model.

Relational Databases

As we discussed, the concept of relations is the driving force of this popular database model. The actual placement of relations in this type of database is insignificant because they are referenced by name and connected by dynamic relationships. As opposed to the hierarchical model, a user need not know the structure of a relational database to access or modify it. This model makes efficient use of the space it's allocated because redundancy is kept to a minimum. It also enables complex queries that take full advantage of the inter-connected tuples and relations. Remember that tuples and relations are synonymous with rows and tables. With the assistance of detailed metadata, relational databases offer outstanding database administration utilities.

Object-oriented Databases

Developed to blend nicely with object-oriented programming languages such as Java and C++, object-oriented database (ODB) technology offers advantages for programmers. Because the database programming is done in an object-oriented language native to the database itself, it does not require translation and you end up writing less code. Another benefit of this model is a reduction in maintenance. The systems used to manage ODBs are called object database management systems (ODBMSes). ODBs can operate at higher speeds than relational databases, but they are not as widely used. Some technologies can incorporate object and relational databases into a unified system, called an object-relational database. ODBs use data models that classify objects and give them unique properties. They also support encapsulation and the use of object identifiers (OIDs). One popular method for establishing relationships in an ODB is called inheritance. This is when objects of one subclass inherit properties from objects of a higher class (or superclass).

Network Databases

Lending itself well to many-to-many relationships, the network database model relates data groups to each other via named sets. Similar to hierarchical databases, network databases support the parent-child model, but children in network databases can have more than one parent.

Note 

Different database models establish relationships in different manners. Relational databases use relations. Hierarchical and network databases use hierarchies of parents and children. In object-oriented databases, objects inherit classes from higher-level objects.

Data Warehousing

Before data can be mined, which we'll learn about shortly, it must be collected in one place. A data warehouse is a central storage area for relevant business, production, and customer data. Supplied by a mainframe server, warehouses generally contain historical information pertaining to the business activities of an organization. Because a data warehouse is constructed for the purpose of finding trends, by design, they facilitate data analysis. In order to keep a warehouse up-to-date, its contents must be maintained (or scrubbed) to remove irrelevant and outdated information.

The term data mart is often heard in the discussion of data warehousing. To avoid mixing the two up, keep in mind that a data mart is a smaller, subject-specific store of data that can be derived from a data warehouse, typically serving a single department or business function. The data warehouse is the central repository where the actual databases are stored.

Another similar term to look out for is data dictionary. Comprised of database details and other management or accounting information, data dictionaries are set aside for use by system developers and are typically concealed to prohibit user intervention. The data dictionary is also the predecessor of the data encyclopedia, which contains even more information and is used in data modeling procedures.

Note 

Data warehouses supply data mining applications with a stockpile of historical information.

Data Mining

While warehousing stores data, data mining aims to make sense of this data by finding the hidden correlations buried within. Data mining techniques facilitate the analysis of large amounts of information-such as historical business data-in order to extract patterns, relationships, and trends. Although data mining is being put to use in many fields, it's enjoying widespread use in the e-commerce arena. The idea is that through the use of specialized algorithms, mining applications can sort through heaps of data and uncover connections that are invisible to a human analyst. This can be extremely useful to online retailers, for instance, by determining their customers' purchasing patterns.

Through their complex procedures, data mining programs can quickly answer specific questions such as: Who is likely to buy more expensive items, and at what time of year? What purchases can be expected next year from buyers of coffee machines in Rhode Island? Through analysis of sales history and comparison of all the related data, these and other more intricate questions can be answered. The Holy Grail of data mining is not as easy to obtain as it is to explain. Specialized mining techniques require quite a bit of planning and work. One must first have a clear, concise idea of which particular pattern or trend needs to be uncovered. Numerous companies are devising ways to offer custom-tailored data mining services to other businesses as this process gains popularity. Data mining is expected to grow by leaps and bounds in the coming years, which will inevitably result in more effective applications that are easier to implement. The components of the data mining process that you should know well are listed shortly. Mining attempts to expose hidden information in data such as the following:

  • Associations: This involves the effort to uncover events that are interconnected, such as how often computer buyers include a new monitor with their purchase.

  • Classifications: This involves the search for new patterns in the chaos that result in a restructuring of the existing data.

  • Clusters: This involves the search for previously unknown sets of facts. Cluster data is usually accompanied by visual representations that display these uncovered groups of facts.

  • Forecasts: This involves the search for recurrences in data that can aid in making predictions about future events.

  • Sequences: This involves the hunt for instances of events that are followed up by another event after a period of time. For example, instances of customers buying a certain model of motorcycle, and then returning a month later to purchase a more comfortable seat. The discovery of this sequence could tell a company that the seat in question needs to be redesigned.
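As a minimal illustration of the first item, association mining, the Python sketch below uses made-up purchase baskets to compute how often computer buyers also include a monitor. Real mining applications use far more sophisticated algorithms; this only shows the kind of question being asked.

```python
# Hypothetical purchase baskets; item names and data are invented.
baskets = [
    {"computer", "monitor"},
    {"computer"},
    {"computer", "monitor", "mouse"},
    {"coffee machine"},
]

# Association: of the baskets containing a computer, what fraction
# also contain a monitor?
with_computer = [b for b in baskets if "computer" in b]
support = sum("monitor" in b for b in with_computer) / len(with_computer)
print(support)  # 2 of the 3 computer purchases include a monitor
```

Scaled up to millions of transactions, this is the sort of interconnection an online retailer would mine for.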

Another key phrase to look out for is Customer Relationship Management (CRM). Data mining is one of the tools used within CRM. The goal of a CRM project is to learn about the behavior of customers in order to provide higher quality customer service, more efficient sales techniques, and ultimately, to increase revenues.

Web mining, another tool used in CRM, is an e-variation of the standard data mining process, with provisions for data collected via a Web site. These provisions include content, usage, and structure mining. Content mining works with data from Web spiders and search engines. Usage mining analyzes data related to a user's environment-such as the browser, and so on-as well as information collected by way of user submitted forms. Finally, structure mining analyzes data collected from user interaction with the URLs in a Web site. As with traditional data mining, Web mining utilizes custom applications that don't adhere to any strict standards.

Database Security Issues

Just like access to any other server resource, the right to view or modify data in a database is mostly controlled through the use of user accounts, roles, and security classifications. Therefore, the same policies regarding passwords and other user authentication techniques are observed when dealing with database access.

The database administrator (DBA) is the individual who has total control over all aspects of a database. Other, more restrictive administrator accounts can be created that allow trusted users to perform lower-level tasks such as creating other user accounts. A DBA can also create roles that have specific security permissions. For example, a role called Developer could be defined for users that engage in development tasks. The Developer role would allow users to do things like create tables in a database. This is a risky permission to give to the wrong person, so the right people must be selected carefully. Database security issues present unique challenges and the following sections will explain the major risks as well as the preventative measures used to keep things secure.

Aggregation

In the context of databases, aggregation is a condition where one has an assortment of low-level access rights, which, when combined, grant access to information of a higher sensitivity. In other words, the security level required to access the highly sensitive stuff is greater than the security level required for the low-level data but a collection of low-level rights gives you access to the sensitive data nevertheless.

The problems related to aggregation have to do with classifications. Careful thought must go into classifying aggregates in a database. For instance, if an aggregate is classified as secret, but the separate elements needed to access the aggregate are unclassified, you might run into trouble. In this example, an unclassified user may be able to inadvertently access the aggregate.

In another example, imagine that individual records containing the phone number of a person in a group are unclassified but the record that combines all of the phone numbers of everyone in the group is classified as secret. The secret record of the combined numbers is the aggregate in this example. So, if you were an unclassified user, you could obtain the numbers one-by-one and effectively put the rest together yourself. Therefore, through the process of aggregation, you've obtained some data classified as high-level even though your user account has not been granted high-level access. It's not as tricky as it sounds, and a test question about aggregation will probably give itself away in the wording. Just be careful that the terms aren't reversed on you and you'll be okay. Study the differences between aggregation and inference, as the two are similar. These similarities will be used to throw you off course.
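The phone-number scenario can be sketched in a few lines of Python. The names, numbers, and lookup function here are all hypothetical; the point is that each low-level query is individually authorized, yet their combination reproduces the secret aggregate.

```python
# Each individual phone number is unclassified, but the complete
# roster of the whole group is classified as secret.
unclassified_records = {
    "alice": "555-0101",
    "bob":   "555-0102",
    "carol": "555-0103",
}

def lookup(name):
    # Each single lookup is a permitted, low-level access.
    return unclassified_records[name]

# The unclassified user issues one allowed query per person and
# assembles the secret aggregate themselves.
roster = {name: lookup(name) for name in unclassified_records}
print(len(roster))  # the user now holds the combined (secret) record
```

No individual access control was violated, which is exactly why aggregation is hard to defend against with per-record permissions alone.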

Inference

Literally, to infer information means to gather it by some means, such as deductive reasoning or through some amount of previously known information. The word inference is synonymous with the words conclusion and deduction. Inference, as it pertains to databases, this book, and the Security+ exam, however, has a more specific meaning. So, for our purposes, inference is a problem where unauthorized users deduce (or infer) data at a sensitivity level that they haven't been granted rights to access. It is commonly said that the security problem here lies in prohibiting the inference of unauthorized data. This sounds similar to aggregation, but these explanations should be all that is needed to make a distinction. Questions about inference might mention low-level data revealing high-level data. Inference attacks try to take advantage of the inference problem. One preventative measure used to counter inference attacks is called cell suppression, which conceals the cells in a database that might enable an attack. Another method used to combat breaches in security caused by the inference problem is called polyinstantiation.

Polyinstantiation

Here's one that's not in your dictionary. The use of polyinstantiation enables multiple copies of the same data to exist in multiple places in a database, each having a different security level. This security model enables multiple levels of authorization and protects against inference attacks. In database lingo, polyinstantiation is when a relation has multiple tuples with the same primary keys, all stored in the same database. The defining factor is that the primary keys in this setting each have their own security level. Polyinstantiation works by tricking low-level users into thinking they're viewing high-level data. Because polyinstantiation promotes the storage of information in more than one area of a database, every instance of duplicated data must be updated when there is a change. Without simultaneous updates, there is a risk of data integrity loss.
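Here is a rough Python/sqlite3 sketch of the idea, with an invented missions table. Two tuples share the same logical key, one per security level, and a query function returns the highest-level tuple the user's clearance allows, so a low-level user sees the cover story rather than a telltale access denial.

```python
import sqlite3

# Hypothetical labels and data, for illustration only.
LEVELS = {"unclassified": 0, "secret": 1}

conn = sqlite3.connect(":memory:")
# Polyinstantiation: two tuples share the same logical key (mission 7),
# one per security level, so the key column alone cannot be declared
# a PRIMARY KEY here.
conn.execute("CREATE TABLE missions (id INTEGER, destination TEXT, level TEXT)")
conn.execute("INSERT INTO missions VALUES (7, 'training run', 'unclassified')")
conn.execute("INSERT INTO missions VALUES (7, 'covert strike', 'secret')")

def destination(mission_id, clearance):
    # Return the highest-level tuple the user is cleared to see; low-level
    # users receive the low-level instance instead of an access denial.
    rows = conn.execute(
        "SELECT destination, level FROM missions WHERE id = ?",
        (mission_id,)).fetchall()
    visible = [(LEVELS[lvl], dest) for dest, lvl in rows
               if LEVELS[lvl] <= LEVELS[clearance]]
    return max(visible)[1]

print(destination(7, "unclassified"))  # training run (the cover story)
print(destination(7, "secret"))        # covert strike
```

The cost shown in the text follows directly: any update to mission 7 now has to touch every instance, or the copies drift apart and integrity is lost.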

Another process that poses a risk to data integrity is known as denormalization. A database is denormalized to increase its processing speed, but the duplication it introduces to the database causes the same problem as polyinstantiation.

Perturbation

To perturb means to agitate or displace. Perturbation is another tool used to fight inference attacks; it seeks to confuse intruders. This is accomplished by infusing phony information into a database in the hope that an attacker will become frustrated enough to give up and go elsewhere. Perturbation is also called noise.

Partitioning

Partitioning is used to split a database into multiple parts. In the context of security, the idea is to complicate the inference of information by unauthorized users. If the interconnected parts of a database are difficult to locate, the unauthorized discovery of sensitive information turns into hard work.

Multilevel Security

Multilevel security (MLS) is a system that employs a centralized structure of security classifications called labels, which are applied to users and objects. The MLS model makes use of mandatory access controls (discussed in Chapter 2) to manage database access. The component that defines these security labels in an MLS is called a security policy.

An MLS system is characterized by its ability to allow multiple classification levels of data to be processed simultaneously. The biggest threat to security in the MLS model is the inadvertent opening of covert channels. Many MLS-based systems are susceptible to this method of sidestepping the built-in security, and solutions to the covert channel problem are notoriously difficult to implement. In an MLS-based database, a covert channel is an exposed link between high-classification data and low-classification data. In effect, if someone can circumvent the access controls in the MLS system and find a way to leak information from high to low levels, they have tuned in to a covert channel.

In contrast to the multilevel secure approach to protecting data, the system high model provides a more restrictive and less efficient method of access control. To access a database under the system high model, a user (or any other connected computer) must have clearance to the highest level of data contained in the system. For example, if there's some top-secret info in the database, a user must have top-secret clearance to access the database.
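A minimal sketch of the system high check, assuming a simple numeric ordering of hypothetical labels: the user's clearance is compared against the highest label anywhere in the system, not against the label of the data actually requested.

```python
# Hypothetical label ordering, lowest to highest.
LABELS = ["unclassified", "confidential", "secret", "top secret"]
RANK = {label: i for i, label in enumerate(LABELS)}

def can_access_system_high(user_clearance, data_labels):
    # Under the system high model, the user must be cleared to the
    # HIGHEST label present anywhere in the system, regardless of
    # which records they actually want to query.
    highest = max(data_labels, key=RANK.__getitem__)
    return RANK[user_clearance] >= RANK[highest]

data = ["unclassified", "secret", "top secret"]
print(can_access_system_high("secret", data))      # False
print(can_access_system_high("top secret", data))  # True
```

This is what makes the model restrictive and inefficient: one top-secret record locks every lesser-cleared user out of the entire database.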

Database Management System (DBMS)

This is where working with databases starts to resemble actual fun. A database management system (DBMS) is an application (or suite of applications) that creates, controls, manages, and provides access (through the use of queries) to data in a database. There are many players in this field and the competition is quite fierce.

A DBMS can be based on several different query languages but SQL (Structured Query Language) is one of the most popular languages used. SQL is the quasi-standard query language for use with relational databases although different companies add their own extensions to SQL, thus creating proprietary versions of this supposed standard. SQL is pronounced either like the word, sequel, or spoken as S-Q-L.

Although the folks at Microsoft might have put SQL on the map with their SQL Server suite, this query language was originally developed in the mid-1970s by IBM and is currently incorporated into the products of other companies such as Oracle and Sybase. ANSI standardized an early version of SQL in 1986, and the standard was updated in the early nineties as SQL-92. A separate industry consortium, the SQL Access Group (SAG), also published its own SQL specification, sometimes called SAG SQL.

Whichever query language is being used, they all share a common goal. They offer a means of interacting with a database through the use of various commands and provide rules that govern how database queries are constructed. In turn, the DBMS executes those commands and delivers the goods. DBMSes also have the ability to print reports from the results of a database query and some can enhance these outputs with charts or graphs.

What follows is a simple example of a DBMS at work processing a query. Let's say a user asks a database to return any records it has containing the location of post offices within 20 miles of their home that are accessible 24 hours a day. The DBMS references the database using specific commands native to the query language it's running. The query can either be handled by a graphical interface that converts user input into the right commands or by the user typing the commands directly into the system. Either way, the DBMS will search the database, find any results, and return them to the user. An application can also be programmed to return a custom response if no matching data is found. Additional features can enable a DBMS to deliver information related to the query, although it wasn't specifically asked for.
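The post-office request might translate into a query like the following sketch, again using Python's sqlite3 with invented data. A graphical front end would generate the SQL from the user's input before the DBMS executes it.

```python
import sqlite3

# Hypothetical post-office data; names and distances are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post_offices (name TEXT, miles REAL, open_24h INTEGER)")
conn.executemany("INSERT INTO post_offices VALUES (?, ?, ?)", [
    ("Main St",  3.2,  1),
    ("Downtown", 12.5, 0),
    ("Airport",  18.0, 1),
    ("Westside", 35.0, 1),
])

# "Post offices within 20 miles that are accessible 24 hours a day"
rows = conn.execute(
    "SELECT name FROM post_offices WHERE miles <= 20 AND open_24h = 1"
).fetchall()

# An application can return a custom response when no data matches.
print(rows if rows else "No matching post offices found")
```

Either way the request is entered, through a GUI or typed directly, the DBMS runs the same command and delivers the matching rows.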

The extensive use of relational databases has led to the development of DBMSes that are specifically designed to manage them. These DBMSes are called Relational Database Management Systems (RDBMSes). Much progress is being made in this field, and more complex query languages are currently being developed. DBMS architectures that support these newer, fourth-generation languages are sometimes called 4GL systems.

Note 

SQL is a query language standard used by the database management systems of several manufacturers.

Garbage Collection

Now this is a straightforward term. Also known as storage reclamation, garbage collecting is a method used to free up space in a database. In a typical database where objects are added and not overwritten, database size can become a problem. Usually implemented as an automatic process-for example with a Java program-garbage collecting applications check for objects that aren't referenced by the database anymore and destroy them. These programs perform housekeeping duties for the database administrator because the admin has better things to do. This one is painless to learn-garbage collection recovers space.
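A toy sketch of the idea, with a made-up object store: anything the database no longer references is destroyed, and the freed count represents the space recovered.

```python
# Invented object store: objects keyed by id, plus the set of ids
# the database still references.
objects = {1: "order", 2: "old draft", 3: "invoice", 4: "temp row"}
referenced = {1, 3}

def collect_garbage(objects, referenced):
    # Storage reclamation: destroy every object that is no longer
    # referenced anywhere in the database.
    garbage = [oid for oid in objects if oid not in referenced]
    for oid in garbage:
        del objects[oid]
    return len(garbage)  # amount of space recovered

freed = collect_garbage(objects, referenced)
print(freed, sorted(objects))  # 2 [1, 3]
```

Production collectors are more careful (they must find references transitively and run concurrently with users), but the housekeeping goal is the same.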

Online Transaction Processing (OLTP)

Databases that deal with customer interaction must operate in real time. Typically distributed over multiple computers in a network, Online Transaction Processing (OLTP) mechanisms provide immediate response to requests. In an OLTP system, a process called brokering facilitates network distribution. In this type of processing, every user request is called a transaction and real-time results are what an OLTP system delivers. Automated teller machine transactions are a perfect illustration of OLTP. Among others, the airline industry uses OLTP systems.

Transaction controls exist that protect the integrity of transactions. Transaction control objectives provide quality assurance with regard to incoming and outgoing data in a system. Input controls check user entries such as date/time and ensure that other types of input are complete and correct before they are processed. Output controls handle the accuracy of data that a system delivers, providing a checkpoint for information before it's printed or displayed. In a similar way, processing controls search for inconsistencies in data as it moves between input and output.
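A sketch of a transaction with an input control and rollback, using Python's sqlite3 and invented account data: the input check rejects bad entries before processing, and the commit/rollback pair ensures the transaction either completes in full or leaves the data untouched.

```python
import sqlite3

# Hypothetical accounts for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("INSERT INTO accounts VALUES (2, 0)")
conn.commit()

def transfer(src, dst, amount):
    # Input control: reject incomplete or incorrect entries up front.
    if amount <= 0:
        raise ValueError("amount must be positive")
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()    # the whole transaction succeeds...
    except Exception:
        conn.rollback()  # ...or none of it does, preserving integrity
        raise

transfer(1, 2, 40)
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(60,), (40,)]
```

An ATM withdrawal follows the same pattern: validate the request, apply every step of the transaction atomically, and undo everything if any step fails.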






The Security+ Exam Guide (TestTaker's Guide Series)
Security + Exam Guide (Charles River Media Networking/Security)
ISBN: 1584502517
Year: 2003
Pages: 136
