Introduction to Universes | Crystal Reports XI Official Guide

The idea of the universe is to capture and encapsulate elements of the database as objects, and that these objects can be selected in any combination by a business user to answer a question (hence, the company name Business Objects). This concept of providing an abstraction of the physical data layer was developed and patented by Business Objects in 1992. Since that time, the semantic layer has been continually enhanced and developed in conjunction with the reporting engines to provide a very simple yet powerful user experience for creating ad-hoc reports, interacting with reports, and performing analysis of the data. The overall design paradigm is to provide ease of use for the ultimate consumer who is assumed to be a nontechnical business user.

The universe itself contains no data; it is a file containing the pointers to the data. The universe is utilized by the reporting engine to generate Structured Query Language (SQL) queries according to the object definitions and other rules defined within the universe. In Business Objects XI, it is used by the Web Intelligence engine to generate all the SQL for any report requests. The Web Intelligence engine executes the query and creates the report, and will continue to use rules and definitions of object relationships defined in the universe to allow for interactive exploration and analysis of the data (more on this in Chapter 20, "Introduction to Web Intelligence").

Because the semantic layer paradigm is so applicable to any reporting tool, the universe was easily integrated with Crystal Reports 10. As of June 2004, with the release of the Business Objects Crystal Integration Pack, the universe can generate SQL for Crystal Reports. This integration was continued and enhanced in Crystal Reports XI.

Universe Objects

The universe contains several object types that are represented with different icons. The object types represent different relational concepts and are used by the reporting engines in different ways.

Icon	Object Type	Definition
	Dimension	Parameters for analysis. Dimensions typically relate to a dimension hierarchy such as geography, product, or time.
	Detail	A description of a dimension; usually not used for analysis.
	Measure	Convey numeric information which is used to quantify a dimension object.
	Condition	A Universe designer.

The universe is accessed by the end user through a query panel. The query panel provides a drag-and-drop interface where business users can select, in any combination, the objects they want to see, provide conditions for the query, and then execute the query. The report engines review the objects chosen and, based on the semantic rules defined in the universe, generate appropriate and correct SQL on the fly.

Figure 18.3 shows the Web Intelligence query panel. Note the four object types in the preceding table appear in the left frame. Users see these objects (business objects) and can randomly select any combination by dragging and dropping them in the Result Objects frame on the right. The universe provides the logic for correct, sophisticated SQL generation based on any combination of objects.

Figure 18.3. Web Intelligence query panel.

The query panel will be discussed in detail in Chapter 20.

Creating the Universe

Universes are created in a separate module known as the Designer. The Designer is a client/server tool installed on the desktop of only those who create the semantic layer. This is generally an IT function. The universe is created with this tool and then exported to the Central Management Server (CMS) database or InfoStore, where it is accessed by the Business Objects Enterprise infrastructure and the Web Intelligence or Crystal Reports engines. Other Designers can also access it and enhance it, and re-export it back to the InfoStore. As with business views, there is currently no version control mechanism for universes. However, each time a universe is re-exported to the CMS the internal universe revision number is incremented.

The Designer tool requires authentication via the CMS and the proper rights to access the Designer module. After authenticated, the Designer provides the capabilities necessary to create the universe semantic layer and associated data rules used by the query engines. Figure 18.4 shows a diagram of a universe open in the Designer module.

Figure 18.4. The universe is open in the Designer.

Note that the same objects are showing as in Figure 18.3. The right pane shows the database schema and possible joins defined by the universe creator; the left side shows the objects visible to the end users via the query panel. Users will never see the underlying database structure, only the objects and classes in the left pane.

The window is divided into two main sections. The right side of the window is called the Structure Pane and shows the tables and views of the underlying database, along with any derived tables and joins defined with the Designer. The left side of the window is called the Universe Pane, and shows all the objects exposed by the universe. The Universe Pane also shows the object classes (often incorrectly called folders) that are arbitrary groupings of objects to best fit the user requirements. There is no requirement to group objects from the same table in the same object class. Classes can contain subclasses as well as objects, and objects can live at any level of the class/subclass hierarchy, but must be contained within a class/subclass.

Tip

As a best practice, objects are generally separated into classes according to dimensional groupings known as subject areas. For example, all of the time objects (Year, Month, Week, Day, Hour, and so forth) are put into a Time class; likewise, all customer elements are grouped in a Customer class and so forth. Measure objects are also placed in a single Measures class that is often the last top-level class in the tree. This is not a required grouping but is the convention used and taught by Business Objects.

In order to create a universe, a preexisting universe connection must be selected from the InfoStore or created via the Designer. The universe connection is an object in the InfoStore that defines the connection information to the underlying database. It specifies the RDBMS and version, username, password, and middleware type and version. It is encrypted and stored in the InfoStore and used by the Web Intelligence engine or the Designer module when generating SQL.

The connection can be static or "dynamic" (based on some parameter such as the logged-in user); there is no differentiation of static or dynamic data connections, as there is in business views. Universe Designer does not support true dynamic connections, but allows for restriction set creation that can switch between identically formatted databases based on current user and/or group identities.

Unlike business views, the universe has one and only one connection. The Web Intelligence report engine can accommodate multiple resultsets, so reporting can alleviate this somewhat (see the "Microcube" section).

Object Definition

The object itself contains a SQL snippet representing the piece of the database the object reflects. For example, if the database contains a customer table, there might be a dimension object called Customer Name that contains the definition customer_table.last_name. If, for example, the business user would prefer the first name and last name, the object definition can be redefined as customer_table.first_name + " " + customer_table.last_name.

Object definitions can be defined using any of the native database-specific extensions (such as CASE, CAST, DECODE, DATEPART, and so on). This allows for the development of quite complex objects. The universe supports all native RDBMS-specific SQL extensions, as well as any custom database functions developed in-house.

Dimension objects can be defined to exist in a dimensional hierarchy. A dimensional hierarchy defines natural parent-child relationships within the data. The most common example of a hierarchy is Time, which is often defined in the following hierarchy: Year, Quarter, Month, Week, Day, and so forth. The hierarchy mechanism is used by the Web Intelligence reporting engine to provide drill-down capability. With a universe, drill down refers to exploring data at some level below the current level of aggregation. In other words, when looking at Sales for the Year, a user can drill down to look at Sales by Quarter, and from there drill down to look at Sales by Month. Users can also drill up (move up a level in the dimensional hierarchy) and drill by (swap the presentation to other dimensions in a completely different dimensional hierarchy). More discussion on drilling can be found later in the "Microcube" section, and in more complete detail in Chapter 20 in the "Drilling" section.

Measure objects are almost always defined with aggregate SQL functions such as sum, count, max, min, and so on. Although not required, the universe design is such that aggregate functions are preferred in the object definition because this often engages more appropriate SQL optimization on the underlying RDBMS. It also returns smaller resultsets that should aid in a better total request and response time during an ad-hoc request.

Tip

In general, database optimization is more important in providing user-expected performance for an ad-hoc application than it is in straight enterprise reporting. Most enterprise reports are generated prior to actual report consumption, such as during off hours in a batch process, so runtime and processing time are less important. Put another way, the computing necessary for report processing does not happen right before report consumption by the user. In an ad-hoc application, you can assume that the query requests happen during the actual report consumption (users are actually generating and executing the SQL requests at the time of report consumptionnot in batch). The users expect maximum analytical flexibility and minimal response timeeasier said than done, especially because by definition an ad-hoc reporting application does not know what SQL requests will be made (hence ad-hoc). The best practice for developing and deploying successful ad-hoc applications is to gather extensive user requirements to understand why the business users are asking for the data and what they do with it next, and then build an appropriate model to support those requirements.

Tip

Universes are extremely flexible, and many of the universe mechanisms (contexts, aggregate awareness, short-cut joins, aliases, and so forth) can be used to ensure that proper SQL is generated on any model type. Universes do not require underlying star schemas and can be deployed on any underlying supported database regardless of the model type. However, if the goal is to provide maximum analytical flexibility, proper data modeling cannot be avoided. Often times a universe will be used as a cheap and easy way to provide business users access to transactional systems. If ad-hoc data analysis was the goal of the business sponsor, placing a universe on the transactional system means the reporting application is not designed properly, most likely because proper requirements were not gathered and a model analysis was not completed. This often results in slow performance, or IT re-entering the project to develop reports for the business because the universe is overly complex and not business centric. In other words, if you expect to slap a universe on top of a transactional system without doing a proper requirements-gathering and then let your business users try to do data analysis, no one will be happy, especially your DBA. The most important thing IT can do to ensure success of an ad-hoc or analytic application is to gather proper requirements.

SQL Generation, Contexts, and Aggregate Awareness

Based on the objects selected by the business user via the query panel, the universe determines which tables are necessary and the optimal join paths between these tables, as defined in the Structure Pane. The universe might contain multiple routes, or join paths, between any two tables. This is most evident in a star schema where multiple fact tables exist (but would apply in any model type). For example, say you have a universe to perform expense analysis, and there is one fact table containing the budgeted amounts along the organizational hierarchy and the time hierarchy (resulting in budget amount by department and by month) and there is a second fact table with the actual daily figures spent by all employees. You might then want to join the time and organization dimensions directly to the budget fact and actual fact. When the business user engages the universe via the query panel, she might want to see the budget amount and actual amount per month for her department. The universe employs a rule called a context that is a grouping of tables and joins that dictates proper join paths based on the object selection. In this case, there would be two contexts, one for the budget figures and another for the actual expenses. This tells the SQL engine that two SQL statements need to be generated in order to get proper resultsone with time, organization, and budget, and the second with time, organization, and the actual expenses. The results are then rejoined together within the reporting engine (Web Intelligence) to present a single result set.

Another feature of the universe is the use of aggregate awareness. Aggregate awareness allows for the inclusion and use of aggregate tables where possible, and use of lower-level tables where appropriate. To further the expense analysis example, suppose that the DBA has aggregated the actual expense data at the month level and by department. This means there is an aggregate of actual expenses, but the atomic-level fact table also exists (which lists each and every expense by employee, expense type, and day). If the business user chooses department, month, and the expenses amount, the aggregate awareness tells the universe that the query can be handled by the aggregate table and to use it instead of the atomic-level table. This can have a great benefit on query performance. Furthermore, if the business user drills down a hierarchy that leaves the aggregate (say from month down to day), the universe is smart enough to know that a new SQL statement needs to be generated and that now it should execute against the atomic-level table.

The Microcube Engine

The Web Intelligence engine contains tight integration with the universe. One of the main points of integration is the storage of the result set in a multidimensional array commonly called the microcube or microcube engine. The cube part of microcube implies multidimensionality (a cube of data rather than a flat table). The micro part of microcube distinguishes it from a macrocube that is a full-fledged multidimensional database (MDDB), such as Essbase, DB/2 OLAP, Oracle OLAP, or MS SQL Server Analysis Services. The microcube is mentioned here in the "Universes and Deployment" section, but is demonstrated in Chapter 20 on the Web Intelligence Reporting engine in the "Drilling" section.

Without understanding the microcube, you cannot fully appreciate how the universe features actually benefit the business user, other than providing a mask for SQL generation. Because the universe knows that the objects are dimensions, details, or measures, these object types can be used to provide for additional methods of post-query processing by the microcube engine. This means the result set that comes back can be processed further simply based on the definition of the result set's originating object type. For example, if a query asks for Year, Quarter, Month, and Actual Expenses, the result set is returned to the microcube engine, which then further notes that Year, Quarter, and Month are dimensions, and Actual Expenses is a measure. It also notes that Year, Quarter, and Month are part of a dimensional hierarchy that can be drilled, and that the Actual Expenses measure object is defined with a projection that tells it to sum when projected. A projection is a behavioral indicator for the measure when placed in the microcube.

When a user displays the data in the report as Year, Quarter, Month, and Actual Expenses columns in a simple table, the user sees a row containing the results just as they were returned from the database. However, if the user removes the Month and Quarter columns from the table, the Actual Expenses value changes dynamically to reflect the value for each year. The measure is "projected at the Year level" because the object definition tells the microcube that it should sum the measure when viewing the measure with any combination of dimensions other than all dimensions returned in the result set.

Additionally, if the user then drills down on Year to the Quarter level, a drill filter is created on Year and the report block displays the quarters and projects the Actual Expenses at the Quarter level. So the drilling capability provides for both proper navigation and projection of dimensions and measures by the microcube.

There are two important things to note here. First, this paradigm is different from Crystal Reports, in which drilling or moving between levels of data requires the creation of subreports. With the microcube, the table, grid, or chart display of a Web Intelligence report is referred to as a report block and there is no concept of bands or banded reporting. What is seen in the report block is the projection of the microcube, which is completely dependent on the objects displayed in the report block. The report often contains more data (additional objects) than is currently projected; generally, the user will use the current projection in the report block as a starting point to navigate through the data. The data dynamically changes as the user interacts with it, drilling up and down or swapping objects into or out of the report block. This might sound terribly confusing but is made clearer in Chapter 20, where more time can be given to this discussion.

Second, the semantic layer does more than query generationit provides for sophisticated post-query behavioral rules for the reporting engine. It not only tells how to get the data and how to get it with the most optimal SQL, but also how the data should behave after it is presented for analysis to the business user. This is crucial to ensuring that the business user does not manipulate the data in ways that would lead to incorrect results (an overarching theme with the semantic layer).

The microcube also provides a few additional benefits:

Recombining results of multipass SQL In certain conditions, such as the one described previously for requesting budget and actual expenses, a question from a business user needs to be broken into multiple queries or multiple passes to the database. The semantic layer automatically generates these multiple SQL statements. When the result sets are returned, the microcube allows for these multiple passes to be rejoined conceptually into a single result set that can then be projected for analysis via the microcube. The microcube knows that the dimensions between these two queries are really representations of the same objects and therefore the microcube will synchronize rows with common dimensions, which can then be projected in synchronization. This is done automatically by the Web Intelligence report engine. This also allows for formulas to be built on these measures as if they existed in a single flat table result set (budget minus actual expenses, for example).
Multiple queries A Web Intelligence report is the projection of part of the multidimensional array or microcube. The Web Intelligence engine allows for multiple microcubes to exist within a single report file. This means that a report can contain data from multiple universes that could be based on completely different RDBMS platforms.
Note

In the multipass SQL example just discussed, the result set is considered a single data provider by the report engine, even though it is multipass; this is because it originated from a single query panel request. Multiple data providers come from multiple query panel requests (one data provider per request).