The front room is the public face of the warehouse. It's what the business users see and work with day-to-day. In fact, for most folks, the user interface is the data warehouse. They don't know (or care) about all the time, energy, and resources behind it; they just want answers. Unfortunately, the data they want to access is complex. The dimensional model helps reduce the complexity. The primary goal of the warehouse should be to make information as accessible as possible, to help people get the information they need. To accomplish this, we need to build a layer between the users and the information that hides some of the complexities and helps them find what they are looking for. That is the primary purpose of the data access services layer.

Front Room Data Stores

-Access Tool Data Stores
As data moves into the front room and closer to the user, it becomes more diffuse. Users can generate hundreds of ad hoc queries and reports in a day. These are typically centered on a specific question, investigation of an anomaly, or tracking the impact of a program or event. Most individual queries yield result sets with fewer than 10,000 rows; a large percentage have fewer than 1,000 rows. These result sets are stored in the data access tool, at least temporarily. Much of the time, the results are actually transferred into a spreadsheet and analyzed further. Some data access tools work with their own intermediate application server, which provides an additional data store to cache the results of user queries and standard reports.

-Standard Reporting Data Stores
Client/server-based standard reporting environments are beginning to pop up in the marketplace. These applications usually take advantage of the data warehouse as a primary data source. They may use multiple data stores, including a separate reporting database that draws from the warehouse and the operational systems. They may also have a report library or cache of some sort that holds a pre-executed set of reports to provide lightning-fast response time.

-Personal Data Marts
Listening to vendors who have recently released tools positioned specifically for this purpose, you would think the personal data mart is a whole new market. The merchant database vendors all have desktop versions that are essentially full-strength, no-compromise relational databases. There are also new products on the market that take advantage of data compression and indexing techniques to give amazing capacity and performance on a desktop computer. Personal data marts may require a replication framework to ensure they are always in sync with the data warehouse. They will continue to play an important role in this personal segment of the marketplace.

-Disposable Data Marts
The disposable data mart is a set of data created to support a specific short-lived business situation. It is similar to the personal data mart, but it is intended to have a limited life span. The disposable data mart also allows the data to be designed specifically for the event, applying business rules and filters to create a simple sandbox for the analysts to play in.

-Application Models
Data mining is the primary example of an application model. It's a collection of powerful analysis techniques for making sense out of very large data sets. From a data store point of view, each of these analytical processes usually sits on a separate machine (or at least in a separate process) and works with its own data drawn from the data warehouse.
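Feeding such an application model is largely a matter of staging a flat observation set out of the warehouse. The following is a minimal sketch; the table and column names (sales_fact, customer_dim, and so on) are hypothetical, and the exact syntax varies by DBMS:

```sql
-- Stage a one-row-per-customer observation set for a mining tool.
-- Table and column names are illustrative, not from any specific warehouse.
CREATE TABLE mining_observation_set AS
SELECT
    c.customer_key,
    c.income_band,                          -- demographic attributes
    c.tenure_months,
    COUNT(*)             AS order_count,    -- behavioral attributes
    SUM(f.sales_dollars) AS total_sales,
    MAX(p.calendar_date) AS last_order_date
FROM sales_fact f
JOIN customer_dim c ON f.customer_key = c.customer_key
JOIN period_dim   p ON f.period_key   = p.period_key
GROUP BY c.customer_key, c.income_band, c.tenure_months;
```

The mining tool then works against this extract on its own machine, so long-running model computations never touch the warehouse itself.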
Credit rating and churn scores are good examples of data mining output that would be valuable in the context of the rest of the data in the warehouse.

Front Room Services for Data Access

Data access services cover five major types of activities in the data warehouse: warehouse or metadata browsing, access and security, activity monitoring, query management, and standard reporting.

-Warehouse Browsing
Warehouse browsing takes advantage of the metadata catalog to support the users in their efforts to find and access the information they need. Ideally, a user who needs business information should be able to start with some type of browsing tool and peruse the data warehouse to look for the appropriate subject area. The warehouse browser should be dynamically linked to the metadata catalog to display currently available subject areas and the data elements within those subjects. It should be able to pull in the definitions and derivations of the various data elements and show a set of standard reports that include those elements. Once the user finds the item of interest, the browser should provide a link to the appropriate resource: a canned report, a tool, or a report scheduler. Front ends have grown more sophisticated and now use metadata to define subsets of the database to simplify the user's view. They also provide ways to hook into the descriptive metadata to provide column names and comments.

-Access and Security Services
Access and security services facilitate a user's connection to the database. This can be a major design and management challenge. Access and security rely on authorization and authentication services, where the user is identified and access rights are determined or access is refused. For our purposes, authentication means some method of verifying that you are who you say you are. There are several levels of authentication: a constant password is the first level, followed by a system-enforced password pattern and periodically required changes. Beyond the password, it is also possible to require some physical evidence of identity, like a magnetic card. On the database side, we strongly encourage assignment of a unique ID to each user. Although it means more work maintaining IDs, it helps in tracking warehouse usage and in identifying individuals who need help. Once we know who users are, we need to determine what they are authorized to see. The value of a data warehouse is correlated with the richness and breadth of the data sources provided. Therefore, we encourage our clients to make the warehouse as broadly available as possible.
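At the database level, the combination of unique individual IDs and shared roles keeps this manageable. A minimal sketch, using Oracle-style syntax (role, user, and table names are hypothetical, and syntax varies by DBMS):

```sql
-- Access rights are granted to a shared role, not to individuals.
CREATE ROLE sales_analyst;
GRANT SELECT ON sales_fact   TO sales_analyst;
GRANT SELECT ON customer_dim TO sales_analyst;
GRANT SELECT ON period_dim   TO sales_analyst;

-- Each user still gets a unique ID, so usage can be tracked individually.
CREATE USER jdoe IDENTIFIED BY initial_password;
GRANT sales_analyst TO jdoe;
```

Adding a new analyst is then one CREATE USER and one GRANT, while the unique IDs preserve the usage tracking and user support benefits described above.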
Authorization is a much more complex problem in the warehouse than authentication, because limiting access can have significant maintenance and computational overhead, especially in a relational environment.

-Activity Monitoring Services
Activity monitoring involves capturing information about the use of the data warehouse. There are several excellent reasons to include resources in your project plan to create an activity monitoring capability, centered on four areas: performance, user support, marketing, and planning.

Performance. Gather information about usage, and apply that information to tune the warehouse more effectively.

User support. The data warehouse team should monitor newly trained users to ensure they have successful experiences with the data warehouse in the weeks following training. Also, the team should be in the habit of monitoring query text occasionally throughout the day. This will help the team understand what users are doing, and it can also help them intervene to assist users in constructing more efficient queries.

Marketing. Publish simple usage statistics to inform management of how their investment is being used. A nice growth curve is a wonderful marketing tool, and a flat or decreasing curve should be motivating for the warehouse team.

Planning. Monitor usage growth, average query time, concurrent user counts, database sizes, and load times to quantify the need and timing for capacity increases. This information could also support a mainframe-style charge-back system, if necessary.

-Query Management Services
Query management services are the set of capabilities that manage the exchange among query formulation, execution of the query on the database, and the return of the result set to the desktop. These services arguably have the broadest impact on user interactions with the database.

Content simplification. These techniques attempt to shield the user from the complexities of the data and the query language before any specific queries are formulated. This includes limiting the user's view to subsets of the tables and columns, predefined join rules (including columns, types, and path preferences), and standard filters. Content simplification metadata is usually specific to the front-end tool, and its simplification rules are usually hidden from the rest of the environment.

Query reformulation. Query formulation can be extremely complex if you want to solve real-world business problems. Tool developers have been struggling with this problem for decades and have come up with a range of solutions, with varying degrees of success. The basic problem is that most interesting business questions require a lot of data manipulation. Even simple-sounding questions like "How much did we grow last year?" or "Which accounts grew by more than 100 percent?" can be a challenge for the tool. The query reformulation service needs to parse an incoming query and figure out how it can best be resolved. A query reformulation service should be able to generate complex SQL, including subqueries and unions. Many of these queries require multipass SQL, where the results of the first query are part of the formulation of the second query (a sketch follows shortly). Since data access tools provide most of the original query formulation capabilities today, much of this service lives in the front-end tools themselves.

Query retargeting and multipass SQL. The query retargeting service parses the incoming query, looks up the elements in the metadata to see where they actually exist, and then redirects the query or its components as appropriate.
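To make multipass SQL concrete, consider the "Which accounts grew by more than 100 percent?" question above. Here is one hedged sketch of what a reformulation service might generate; the table and column names and the years are hypothetical, and temporary-table syntax varies by DBMS:

```sql
-- Pass 1: summarize each account's prior-year sales into a temporary table.
CREATE TEMPORARY TABLE prior_year_sales AS
SELECT f.account_key, SUM(f.sales_dollars) AS prior_sales
FROM sales_fact f
JOIN period_dim p ON f.period_key = p.period_key
WHERE p.fiscal_year = 1997
GROUP BY f.account_key;

-- Pass 2: use the pass-1 results to find accounts that more than doubled.
SELECT f.account_key,
       SUM(f.sales_dollars) AS current_sales,
       py.prior_sales
FROM sales_fact f
JOIN period_dim p ON f.period_key = p.period_key
JOIN prior_year_sales py ON f.account_key = py.account_key
WHERE p.fiscal_year = 1998
GROUP BY f.account_key, py.prior_sales
HAVING SUM(f.sales_dollars) > 2 * py.prior_sales;
```

Growth of more than 100 percent is simply current sales greater than twice prior sales; the reformulation service's job is to produce this two-pass plan from a single, simple-looking request.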
Retargeting allows us to query data from two fact tables, like manufacturing costs and customer sales, on two different servers, and seamlessly integrate the results into a customer contribution report.

Aggregate awareness. Aggregate awareness is a special case of query retargeting, where the service recognizes that a query can be satisfied by an available aggregate table rather than summing up detail records on the fly. For example, if someone asks for sales by month from the daily table, the service would reformulate the query to run against the monthly fact table. The user gets better performance and doesn't need to know there are additional fact tables out there. The aggregate navigator is the component that provides this aggregate awareness. In the same way that indexes are automatically chosen by the database software, the aggregate navigator facility automatically chooses aggregates. The aggregate navigator sits above the DBMS and intercepts the SQL sent by the requesting client.
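The rewrite itself is straightforward. A minimal before-and-after sketch, with hypothetical table and column names:

```sql
-- What the client sends: a monthly question aimed at the daily fact table.
SELECT p.month_name, SUM(f.sales_dollars)
FROM daily_sales_fact f
JOIN period_dim p ON f.period_key = p.period_key
GROUP BY p.month_name;

-- What the aggregate navigator actually runs: the same question
-- against the much smaller monthly aggregate table.
SELECT m.month_name, SUM(a.sales_dollars)
FROM monthly_sales_agg a
JOIN month_dim m ON a.month_key = m.month_key
GROUP BY m.month_name;
```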
A good aggregate navigator maintains statistics on all incoming SQL, and not only reports on the usage levels of existing aggregates but suggests additional aggregates that should be built by the DBA.

Date awareness. The date awareness service allows the user to ask for items like current year-to-date and prior year-to-date sales without having to figure out the specific date ranges. This usually involves maintaining attributes in the Periods dimension table to identify the appropriate dates (see the sketch at the end of this section).

Query governing. Unfortunately, it's relatively easy to create a query that can bring the data warehouse to its knees, especially a large database. Almost every warehouse has a list of "queries from hell." These are usually poorly formed and often incorrect queries that lead to a nested loop of full table scans on the largest table in the database. Obviously, you'd like to stop these before they happen. After good design and good training, the next line of defense against these runaway queries is a query governing service.

-Standard Reporting Services
A standard reporting service should provide most of the following capabilities:

Report development environment. This should include most of the ad hoc tool functionality and usability.

Report execution server. The report execution server offloads running the reports and stages them for delivery, either as finished reports in a file system or in a custom report cache.

Parameter- or variable-driven capabilities. For example, you can change the Region name in one parameter and have an entire set of reports run based on that new parameter value.

Time- and event-based scheduling of report execution. A report can be scheduled to run at a particular time of day or after a value in some database table has been updated.

Iterative execution. For example, provide a list of regions and create the same report for each region. Each report could then be a separate file e-mailed to each regional manager. This is similar to the concept of a report section or page break, where every time a new value of a given column is encountered, the report starts over on a new page with new subtotals, except that it generates separate files.

Flexible report definitions. These should include compound document layout (graphs and tables on the same page) and full pivot capabilities for tables.

Flexible report delivery. Via multiple delivery methods (e-mail, Web, network directory, desktop directory, and automatic fax), and in the form of multiple result types (data access tool file, database table, spreadsheet).

User-accessible publish and subscribe. Users should be able to make reports they've created available to their departments or to the whole company. Likewise, they should be able to subscribe to reports others have made and receive copies or notification whenever the report is refreshed or improved.

Report linking. This is a simple method for providing drill-down. If you have pre-run reports for all the departments in a division, you should be able to click on a department name in the division summary report and have the department detail report show up.

Report library with browsing capability. This is a kind of metadata reference that describes each report in the library, when it was run, and what its content is.

Mass distribution. Simple, cheap access tools for mass distribution (Web-based).

Report environment administration tools. The administrator should be able to schedule, monitor, and troubleshoot report problems from the administrator's module. This also includes the ability to monitor usage and weed out unused reports.
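Before leaving data access services, here is the date awareness sketch promised earlier. It assumes a Periods dimension that carries current- and prior-year-to-date flags (hypothetical names), maintained as part of the nightly load:

```sql
-- "Current year-to-date sales" without the user computing any date ranges:
SELECT SUM(f.sales_dollars) AS ytd_sales
FROM sales_fact f
JOIN period_dim p ON f.period_key = p.period_key
WHERE p.current_ytd_flag = 'Y';

-- "Prior year-to-date sales" is the same query against a different flag:
SELECT SUM(f.sales_dollars) AS prior_ytd_sales
FROM sales_fact f
JOIN period_dim p ON f.period_key = p.period_key
WHERE p.prior_ytd_flag = 'Y';
```

Because the load process updates the flags each night, "year-to-date" stays current without any query ever being rewritten.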
Future Access Services

It's worth taking a few moments to speculate on the direction of access services so we can anticipate where future services might fit into our architecture.

Authentication and authorization. Logging on to the network once will be enough to identify you to any system you want to work with. If you need to go into the financial system to check on an order status or go to the data warehouse to see a customer's entire history, one logon should give you access to both.

Push toward centralized services. Data access services soon will migrate either to the application server or back to the database. Two forces are driving this change. The first is the leverage the warehouse team gets by implementing one set of access services (and associated metadata) and making it available to a range of front-end tools. The second is the push that tools are getting from the Web.

Vendor consolidation. There are too many front-end tool vendors for the market to support in the long run. The Web push will cause some of them to slip. Once a few clear leaders emerge, the rest will begin falling quickly.

Web-based customer access. Another implication of Web access to the warehouse is that businesses might view the Web as a means of providing customers with direct access to their own information, similar to the lookup services provided by express package delivery companies today. For example, a credit card company might provide significant value to its corporate customers by allowing them to analyze their employees' spending patterns directly, without having to stage the data in-house.

Desktop Services

Only a few services actually live on the desktop, but they are arguably the most important services in the warehouse. These services are found in the front-end tools that provide users with access to the data in the warehouse. Much of the quality of the users' overall experience with the warehouse will be determined by how well these tools meet their needs. To them, the rest of the warehouse is plumbing; they just want it to work.

The four main data access categories are push-button applications, standard reports, ad hoc tools, and data mining. Push-button applications generally provide a push-button interface to a limited set of key reports, targeted at a specific user community. Standard reports are the approved, official view of information. They are typically fixed-format, regularly scheduled reports that are delivered to a broad set of users. Ad hoc tools provide users with the ability to create their own reports from scratch. Data mining tools provide complex statistical analysis routines that can be applied to data from the warehouse. Each of these categories provides certain capabilities that meet specific business requirements.