1. Describe the following:
Dimensional Model
Ans: The dimensional model is a specialized adaptation of the relational model, used to represent data in data warehouses in such a way that the data can be easily summarized using OLAP queries. In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It is important that measures can be meaningfully aggregated; for example, the revenue from different locations can be added together. In an OLAP query, dimensions are chosen and the facts are grouped and added together to create a summary.
The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions. Particularly complicated dimensions might be represented using multiple tables, resulting in a snowflake schema. A data warehouse can contain multiple star schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling.
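As a rough illustration of the star schema described above, the following sketch builds one fact table and two dimension tables in an in-memory SQLite database and runs an OLAP-style summary. The table names (fact_sales, dim_location, dim_date), the column names, and the sample rows are all hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory database; all table names, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    building    TEXT,
    state       TEXT,
    country     TEXT          -- building -> state -> country is the hierarchy
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    month   INTEGER
);
CREATE TABLE fact_sales (
    location_id INTEGER REFERENCES dim_location(location_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    revenue     REAL          -- the measure; it can be summed meaningfully
);
""")

conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?, ?)", [
    (1, "Store A", "NY", "USA"),
    (2, "Store B", "NY", "USA"),
    (3, "Store C", "ON", "Canada"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (1, 2023, 1),
    (2, 2023, 2),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, 1, 100.0), (2, 1, 50.0), (3, 1, 70.0),
    (1, 2, 30.0), (3, 2, 20.0),
])


def revenue_by_country(connection):
    """Group the facts by one level of the location dimension and sum the measure."""
    return connection.execute("""
        SELECT l.country, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_location AS l ON f.location_id = l.location_id
        GROUP BY l.country
        ORDER BY l.country
    """).fetchall()


print(revenue_by_country(conn))
```

Grouping by l.state or by a column of dim_date instead of l.country would give a summary at a different level or along a different dimension, using the same fact table.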
P1^P2(C)
Projection Operation
Ans: Projection Operation
1. Like selection, projection reduces the size of relations. It is advantageous to apply projections early. Consider this form of our example query:
2. When we compute the subexpression, we obtain a relation whose scheme is (cname, ccity, bname, account#, balance).
3. We can eliminate several attributes from this scheme. The only ones we need to retain are those that appear in the result of the query or are needed to process subsequent operations.
4. By eliminating unneeded attributes, we reduce the number of columns of the intermediate result, and thus its size.
5. In our example, the only attribute we need is bname (to join with branch). So we can rewrite our expression as:
Note that there is no advantage in doing an early projection on a relation before it is needed for some other operation:
- We would access every block of the relation to remove attributes.
- Then we would access every block of the reduced-size relation when it is actually needed.
- We would do more work in total, rather than less!
Natural Join Operation
However, deposit ⋈ branch is likely to be a large relation, as it contains one tuple for every account. The other part, the selection of Port Chester customers, is probably a small relation (comparatively). So, if we join that selection with deposit first, we get a reasonably small relation. It has one tuple for each account held by a resident of Port Chester. This temporary relation is much smaller than deposit ⋈ branch.
Natural join is commutative. Thus we could rewrite our relational algebra expression so that customer is joined with branch first. But there are no common attributes between customer and branch, so this is a Cartesian product. Lots of tuples! If a user entered this expression, we would want to use the associativity and commutativity of natural join to transform it into the more efficient expression we derived earlier (join with deposit first, then with branch).
4. There are a number of historical, organizational, and technological reasons that explain the lack of an all-encompassing data management system. Discuss a few of them with appropriate examples.
Ans: Most current data management systems (DMS) have been built on the assumption that the data collection, or database, to be administered consists of a single media type: structured tables of "fact" data, or unstructured strings of bits representing such media objects as text documents, images, or video. The result is that most DMSs store and index a specific type of media data and provide a query (data access) language that is specialized for efficient access to and retrieval of this data type. A further assumption that has frequently been made is that the information requirements of the system users are known and can be used for structuring the data collection and tuning the data management system. It has also been assumed that the users would only infrequently require information/data from some other type of data management system.
These assumptions have been criticized since the early 1980s by researchers who have pointed out that, almost from the point of creation, a database would not (nor could) contain all of the data required by the user community (Gligor & Luckenbaugh, 1984; Landers & Rosenberg, 1982; Litwin et al., 1982; among many others). A number of historical, organizational, and technological reasons explain the lack of an all-encompassing data management system. Among these are:
- The sensible advice to build small systems, with the plan to extend their scope in later implementation phases, allows a core system to be implemented relatively quickly, but has led to a proliferation of relatively small systems.
- Department autonomy has led to the construction of department-specific rather than organization-wide systems, again leading to many small, overlapping, and often incompatible systems within an organization.
- The continual evolution of the organization and of its interactions, both internally and with its external environment, prevents a complete understanding of future information requirements.
A major challenge, and a critical practical and research problem for the information, computer, and communication technology communities, is to develop data management systems that can provide efficient access to the data stored in multiple private and public databases (Brodie, 1993; Hurson & Bright, 1996; Nordbotten, 1988a, 1988b and Nordbotten, 1994a). Problems to be resolved include:
1. Interoperability among systems (Fox & Sornil, 1999; Litwin & Abdellatif, 1986),
2. Incorporation of legacy systems (Brodie, 1993), and
3. Integration of management techniques for structured and unstructured data (Stonebraker & Brown, 1999).
Each of the above problems entails an integration of concepts, methods, techniques, and tools from separate research and development communities that have existed in parallel but independently, with rather minimal interaction. One consequence is that overlapping and conflicting terminology exists between these communities.
In the previous chapter, a database was defined as a COLLECTION OF RELATED DATA REPRESENTING SOME LOGICALLY COHERENT ASPECT OF THE REAL WORLD. With this definition, NO limitations are given as to the type of:
- Data in the collection,
- Model used to structure the collection, or
- Architecture and geographic location of the database.
The focus of this text is on online (electronic and web-accessible) databases containing multiple media data, thus restricting our focus to multimedia databases stored on one or more computers (DB servers) and accessible from the Internet. Examples of such databases include the image collections of the Hermitage Museum, the catalog and full-text materials of the ACM digital library, and the customer records for the 7 sites of Amazon.com. Electronic databases are important since they contain data recording the products and services, as well as the economic history and current status, of the owner organization.
They are also a source of information for the organization's employees and customers/users. However, databases cannot be used effectively unless there exist efficient and secure data management systems (DMS) for the data in the databases.
5. Describe the Structural Semantic Data Model (SSM) with relevant examples.
Ans: Modeling Complex and Multimedia Data
Data modeling addresses a need in information system analysis and design to develop a model of the information requirements as well as a set of viable database structure proposals. The data modeling process consists of:
1. Identifying and describing the information requirements for an information system,
2. Specifying the data to be maintained by the data management system, and
3. Specifying the data structures to be used for data storage that best support the information requirements.
A fundamental tool used in this process is the data model, which is used both for specification of the
Proposed Model
Ans: The easiest way of introducing fuzziness into the database model is to use a classical relational database and build a front end to it that allows fuzzy querying of the database. A limitation of this approach is that, because we are neither extending the database model nor defining a new one, the underlying database model is crisp and hence the fuzziness can only be incorporated in the query. To incorporate fuzziness we introduce fuzzy sets (linguistic terms) on the attribute domains (linguistic variables); for example, on the attribute domain AGE we may define the fuzzy sets YOUNG, MIDDLE, and OLD. These are defined as follows:
Fig. 1: Age
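Since the figure itself is not reproduced here, the following is only a minimal sketch of how the fuzzy sets YOUNG, MIDDLE, and OLD on the AGE domain might be expressed as membership functions. The piecewise-linear shapes and the breakpoints (25, 35, 45, 55) are assumptions made for illustration and are not taken from the figure.

```python
def rising(x, lo, hi):
    """Membership that is 0 up to lo, rises linearly, and is 1 from hi onward."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def falling(x, lo, hi):
    """Membership that is 1 up to lo, falls linearly, and is 0 from hi onward."""
    return 1.0 - rising(x, lo, hi)


# Hypothetical breakpoints; the real ones would be read off the AGE figure.
AGE_SETS = {
    "YOUNG": lambda age: falling(age, 25, 35),
    "MIDDLE": lambda age: min(rising(age, 25, 35), falling(age, 45, 55)),
    "OLD": lambda age: rising(age, 45, 55),
}


def memberships(age):
    """Return the degree to which an age belongs to each linguistic term."""
    return {label: fn(age) for label, fn in AGE_SETS.items()}


for age in (20, 30, 50, 60):
    print(age, memberships(age))
```

With these assumed breakpoints, an age of 30 belongs partly to YOUNG and partly to MIDDLE, which is the kind of gradual boundary a crisp range condition cannot express.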
For this we take the example of a student database which has a table STUDENTS with the following attributes:
Fig. 2: A snapshot of the data existing in the database
Meta knowledge
Ans: At the level of meta knowledge we need to add only a single table, LABELS, with the following structure:
Fig. 3: Meta Knowledge
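The sketch below shows one way a front end could use a LABELS table to answer a fuzzy query over a crisp database. Because the actual structure of LABELS and the attributes of STUDENTS are given only in the figures, everything here is an assumption: LABELS is taken to hold four trapezoid breakpoints (a, b, c, d) per attribute and label, STUDENTS is reduced to hypothetical NAME and AGE columns, and the sample rows are invented for illustration.

```python
import sqlite3


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENTS (NAME TEXT, AGE INTEGER);            -- hypothetical columns
CREATE TABLE LABELS (attribute TEXT, label TEXT,
                     a REAL, b REAL, c REAL, d REAL);      -- assumed structure
""")
conn.executemany("INSERT INTO STUDENTS VALUES (?, ?)", [
    ("Asha", 19), ("Ben", 28), ("Chen", 33), ("Dana", 41),
])
conn.executemany("INSERT INTO LABELS VALUES (?, ?, ?, ?, ?, ?)", [
    ("AGE", "YOUNG", 0, 0, 25, 35),
    ("AGE", "MIDDLE", 25, 35, 45, 55),
    ("AGE", "OLD", 45, 55, 120, 120),
])


def fuzzy_select(connection, attribute, label, threshold):
    """Answer 'attribute IS label' by issuing a crisp query and ranking the rows."""
    row = connection.execute(
        "SELECT a, b, c, d FROM LABELS WHERE attribute = ? AND label = ?",
        (attribute, label),
    ).fetchone()
    if row is None:
        raise ValueError(f"no label {label!r} defined on {attribute!r}")
    a, b, c, d = row
    # The attribute name is interpolated only after it was found in LABELS.
    # The crisp query fetches just the support of the fuzzy set.
    candidates = connection.execute(
        f"SELECT NAME, {attribute} FROM STUDENTS WHERE {attribute} BETWEEN ? AND ?",
        (a, d),
    ).fetchall()
    scored = [(name, trapezoid(value, a, b, c, d)) for name, value in candidates]
    matches = [(name, degree) for name, degree in scored if degree >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


print(fuzzy_select(conn, "AGE", "YOUNG", 0.5))
```

The database itself stays crisp throughout: the fuzziness lives only in the LABELS rows and in the front-end function that turns a linguistic term into a range condition plus a membership score.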