An Introduction to Data Base Systems: v. 1 (1975)
A comprehensive, up-to-date treatment of database technology. This edition features: updated coverage of object-oriented database systems, including a proposal for rapprochement between OO and relational technologies; expanded treatment of distributed databases, including client/server architectures, with an emphasis on database design issues; a comprehensive introduction to all aspects of the relational model - the basis of modern database technology; and new chapters on functional dependencies, views, domains and missing information.
Quotations
■ Types are (sets of) things we can talk about.
■ Relations are (sets of) things we say about the things we can talk about.
(There is a nice analogy here that might help you appreciate and remember these important points: Types are to relations as nouns are to sentences.) Thus, in the example, the things we can talk about are employee numbers, names, department numbers, and money values, and the things we say are true utterances of the form “The employee with the specified employee number has the specified name, works in the specified department, and earns the specified salary.”
It follows from all of the foregoing that:
1. Types and relations are both necessary (without types, we have nothing to talk about; without relations, we cannot say anything).
2. Types and relations are sufficient, as well as necessary—i.e., we do not need anything else, logically speaking.
3. Types and relations are not the same thing. It is an unfortunate fact that certain commercial products—not relational ones, by definition!—are confused over this very point.
...SQL is very far from being the “perfect” relational language—it suffers from numerous sins of both omission and commission. ...the overriding issue is simply that SQL fails in all too many ways to support the relational model properly. As a consequence, it is not at all clear that today's SQL products really deserve to be called “relational” at all! Indeed, as far as this writer is aware, there is no product on the market today that supports the relational model in its entirety. This is not to say that some parts of the model are unimportant; on the contrary, every detail of the model is important, and important, moreover, for genuinely practical reasons. Indeed, the point cannot be stressed too strongly that the purpose of relational theory is not just “theory for its own sake”; rather, the purpose is to provide a base on which to build systems that are 100 percent practical. But the sad fact is that the vendors have not yet really stepped up to the challenge of implementing the theory in its entirety. As a consequence, the “relational” products of today regrettably all fail, in one way or another, to deliver on the full promise of relational technology.
...since there is so much confusion surrounding it in the industry. You will often hear claims to the effect that relational attributes can only be of very simple types (numbers, strings, and so forth). The truth is, however, that there is absolutely nothing in the relational model to support such claims.
...in fact, types can be as simple or as complex as we like, and so we can have attributes whose values are numbers, or strings, or dates, or times, or audio recordings, or maps, or video recordings, or geometric points (etc.).
The foregoing message is so important‒and so widely misunderstood‒that we state it again in different terms:
The question of what data types are supported is orthogonal to the question of support for the relational model.
...note that relational systems require only that the database be perceived by the user as tables. Tables are the logical structure in a relational system, not the physical structure. At the physical level, in fact, the system is free to store the data any way it likes—using sequential files, indexing, hashing, pointer chains, compression, and so on—provided only that it can map that stored representation to tables at the logical level. Another way of saying the same thing is that tables represent an abstraction of the way the data is physically stored—an abstraction in which numerous storage level details (such as stored record placement, stored record sequence, stored data value representations, stored record prefixes, stored access structures such as indexes, and so forth) are all hidden from the user.
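The separation of logical tables from physical storage described above can be sketched in a few lines. This is a minimal illustration, not anything from the book: the class names `RowStore` and `ColumnStore`, the function `high_earners`, and the sample employee data are all hypothetical. Both stores keep the data differently, yet each maps its stored representation to the same logical table, so a query written against the table cannot tell them apart.

```python
from typing import FrozenSet, List, Tuple

Row = Tuple[str, str, int]  # (emp_no, name, salary); hypothetical heading


class RowStore:
    """Physical design 1: whole records stored one after another."""

    def __init__(self, rows: List[Row]) -> None:
        self._records = list(rows)

    def table(self) -> FrozenSet[Row]:
        # Map the stored representation to the logical table (a set of rows).
        return frozenset(self._records)


class ColumnStore:
    """Physical design 2: each column stored separately."""

    def __init__(self, rows: List[Row]) -> None:
        self._emp_no = [r[0] for r in rows]
        self._name = [r[1] for r in rows]
        self._salary = [r[2] for r in rows]

    def table(self) -> FrozenSet[Row]:
        # Position i across the three lists forms one logical row.
        return frozenset(zip(self._emp_no, self._name, self._salary))


def high_earners(table: FrozenSet[Row], threshold: int) -> FrozenSet[str]:
    """A query written against the logical table only."""
    return frozenset(name for (_, name, salary) in table if salary > threshold)


rows = [("E1", "Lopez", 40000), ("E2", "Cheng", 42000), ("E3", "Finzi", 30000)]
row_store = RowStore(rows)
col_store = ColumnStore(list(reversed(rows)))  # a different stored sequence

# The user perceives the same table in both cases.
same_table = row_store.table() == col_store.table()
result_a = high_earners(row_store.table(), 35000)
result_b = high_earners(col_store.table(), 35000)
```

Stored record sequence and column layout differ between the two stores, but nothing at the logical level depends on either.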
... The Information Principle: The entire information content of the database is represented in one and only one way—namely, as explicit values in column positions in rows in tables. This method of representation is the only method available (at the logical level, that is) in a relational system. In particular, there are no pointers connecting one table to another.
The Golden Rule:
No update operation must ever assign to any database a value that causes its database predicate to evaluate to FALSE.
(a) the heading of any given relvar [(relation variable)] can be regarded as a predicate, and (b) the tuples [(rows)] appearing in that relvar at any given time can be regarded as true propositions, obtained from the predicate by substituting arguments of the appropriate type for the parameters of that predicate (“instantiating the predicate”).
We can say that the predicate corresponding to a given relvar is the intended interpretation, or meaning, for that relvar, and the propositions corresponding to tuples of that relvar are understood by convention to be true ones. In fact, the Closed World Assumption (also known as the Closed World Interpretation) says that if an otherwise valid tuple—that is, one that conforms to the relvar heading—does not appear in the body of the relvar, then we can assume the corresponding proposition is false. In other words, the body of the relvar at any given time contains all and only the tuples that correspond to true propositions at that time.
…the external predicate for a given relvar is the intended interpretation for that relvar. As such, it is important to the user, but not to the system. We can also say, again informally, that the external predicate for a given relvar is the criterion for acceptability of updates on the relvar in question—that is, it dictates, at least in principle, whether a requested INSERT, DELETE, or UPDATE operation on that relvar can be allowed to succeed. Ideally, therefore, the system would know the external predicate for every relvar, so that it could deal correctly with all possible attempts to update that relvar. As we have seen, however, this goal is unachievable; the system cannot know the external predicate for any given relvar. But it does know a good approximation: It knows the corresponding internal predicate—and that is what it will enforce.
Thus, the pragmatic “criterion for acceptability of updates” (as opposed to the ideal one) is the internal predicate, not the external one. Another way of saying the same thing is as follows: The system cannot enforce truth, only consistency. That is, the system cannot guarantee that the database contains only true propositions—all it can do is guarantee that it does not contain anything that causes any integrity constraint to be violated (i.e., it does not contain any inconsistencies). Sadly, truth and consistency are not the same thing! Indeed, we can observe that:
- If the database contains only true propositions, then it is consistent, but the converse is not necessarily so.
- If the database is inconsistent, then it contains at least one false proposition, but the converse is not necessarily so.
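The difference between truth and consistency can be sketched with a toy relation variable. Everything here is a hypothetical illustration rather than the book's notation: the `Relvar` class, the `Emp` heading, the sample constraint (unique employee numbers, positive salaries), and the data are all assumptions. The sketch rejects updates that would make the internal predicate false, and reads the body under the Closed World Assumption.

```python
from typing import Callable, NamedTuple, Set


class Emp(NamedTuple):
    """Heading, read as the predicate: employee emp_no is named name,
    works in department dept_no, and earns salary."""
    emp_no: str
    name: str
    dept_no: str
    salary: int


class Relvar:
    """A relation variable: its current value is a set of tuples."""

    def __init__(self, constraint: Callable[[Set[Emp]], bool]) -> None:
        self._constraint = constraint  # the internal predicate
        self._body: Set[Emp] = set()

    def insert(self, t: Emp) -> None:
        candidate = self._body | {t}
        # Golden Rule: refuse any update that would make the predicate FALSE.
        if not self._constraint(candidate):
            raise ValueError(f"integrity constraint violated by {t}")
        self._body = candidate

    def holds(self, t: Emp) -> bool:
        # Closed World Assumption: the proposition is taken to be true
        # if and only if its tuple appears in the body.
        return t in self._body


def internal_predicate(body: Set[Emp]) -> bool:
    emp_nos = [t.emp_no for t in body]
    return len(emp_nos) == len(set(emp_nos)) and all(t.salary > 0 for t in body)


emp = Relvar(internal_predicate)
emp.insert(Emp("E1", "Lopez", "D1", 40000))

# Suppose Cheng really earns 42000. This tuple is false but consistent,
# so the system has no grounds to reject it.
false_but_consistent = Emp("E2", "Cheng", "D1", 1)
emp.insert(false_but_consistent)

# An inconsistent tuple is rejected.
inconsistent = Emp("E3", "Finzi", "D2", -5)
try:
    emp.insert(inconsistent)
    rejected = False
except ValueError:
    rejected = True
```

The second insert shows why consistency does not imply correctness: the constraint check is the only criterion the system can apply.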
More succinctly: Correct implies consistent (but not the other way around), and inconsistent implies incorrect (but not the other way around)—where by correct we mean the database is correct if and only if it fully reflects the true state of affairs in the real world.
It follows that there must be no arbitrary and unnecessary distinctions between base and derived relvars [(relation variables)]. We refer to this fact as The Principle of Interchangeability (of base and derived relvars). Note in particular that this principle implies that we must be able to update views—the updatability of the database must not depend on the essentially arbitrary decision as to which relvars we decide should be base ones and which we decide should be views. …
Let us agree for the moment to refer to the set of all base relvars as “the real database.” But a typical user interacts (in general) not with that real database per se but with what might be called an “expressible” database, consisting (again in general) of some mixture of base relvars and views. Now, we can assume that none of the relvars in that expressible database can be derived from the rest, because such a relvar could be dropped without loss of information. Hence, from the user’s point of view, those relvars are all base relvars, by definition! Certainly they are all independent of one another (i.e., all autonomous…). And likewise for the database itself—that is, the choice of which database is the “real” one is arbitrary too, just as long as the choices are all information-equivalent. We refer to this fact as The Principle of Database Relativity.
…many of today’s products do unfortunately include certain optimization inhibitors, which users should at least be aware of (even though there is little they can do about them, in most cases). An optimization inhibitor is a feature of the system in question that prevents the optimizer from doing as good a job as it might do otherwise (i.e., in the absence of that feature). The inhibitors in question include duplicate rows…three-valued logic…and SQL’s implementation of three-valued logic…
- You will appreciate that we have merely scratched the surface of the problems that can arise from nulls and 3VL [three-valued logic]. However, we have tried to cover enough ground to make it clear that the “benefits” of the 3VL approach are more than a little doubtful.
- We should also make it clear that, even if you are not convinced regarding the problems of 3VL per se, it would still be advisable to avoid the corresponding features of SQL, because of the additional flaws already mentioned.
- Our recommendation to DBMS users would thus be to ignore the vendor’s 3VL support entirely, and to use a disciplined special values scheme instead (thereby staying firmly in two-valued logic)….
- Finally, we repeat the following fundamental point…: If—speaking very loosely—the value of a given attribute within a given tuple within a given relation “is null,” then that attribute position in fact contains nothing at all . . . which implies that the “attribute” is not an attribute, the “tuple” is not a tuple, the “relation” is not a relation, and the foundation for what we are doing (whatever else it might be) is no longer mathematical relation theory.
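One of the problems with nulls and three-valued logic can be reproduced with Python's standard `sqlite3` module. The sketch below is an illustration under stated assumptions, not an example from the book: the table names, the data, and the `'UNKNOWN'` special value are hypothetical. With a null present, the rows where `dept_no = 'D1'` and the rows where `dept_no <> 'D1'` no longer add up to the whole table; with a special value in a `NOT NULL` column, two-valued logic is preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1: the department of E3 is "missing" and recorded as NULL.
conn.execute("CREATE TABLE emp_null (emp_no TEXT PRIMARY KEY, dept_no TEXT)")
conn.executemany(
    "INSERT INTO emp_null VALUES (?, ?)",
    [("E1", "D1"), ("E2", "D2"), ("E3", None)],
)

# Design 2: a special value stands for "department unknown" (an assumption
# made for this sketch; any agreed-upon value of the column's type would do).
conn.execute(
    "CREATE TABLE emp_special (emp_no TEXT PRIMARY KEY, dept_no TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO emp_special VALUES (?, ?)",
    [("E1", "D1"), ("E2", "D2"), ("E3", "UNKNOWN")],
)


def count(table: str, where: str = "1 = 1") -> int:
    """Count the rows of a table that satisfy a WHERE condition."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where}"
    return conn.execute(sql).fetchone()[0]


null_total = count("emp_null")
null_in_d1 = count("emp_null", "dept_no = 'D1'")
null_not_in_d1 = count("emp_null", "dept_no <> 'D1'")

special_total = count("emp_special")
special_in_d1 = count("emp_special", "dept_no = 'D1'")
special_not_in_d1 = count("emp_special", "dept_no <> 'D1'")
```

In the first design, E3 satisfies neither condition because both comparisons evaluate to unknown, so the two counts miss a row. In the second design every row satisfies exactly one of the two conditions.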
IS A CIRCLE AN ELLIPSE?
…How can we resolve this dilemma? The way out is—as so often—to recognize and act upon the fact that there is a major logical difference between values and variables. When we say “every circle is an ellipse,” what we mean, more precisely, is that every circle value is an ellipse value. We certainly do not mean that every circle variable is an ellipse variable (a variable of declared type CIRCLE is not a variable of declared type ELLIPSE, and it cannot contain a value of most specific type ELLIPSE). In other words, inheritance applies to values, not variables. In the case of ellipses and circles, for example:
- As just noted, every circle value is an ellipse value.
- Therefore, all operations that apply to ellipse values apply to circle values too.
- But the one thing we cannot do to any value is change it!—if we could, it would be that value no longer. (Of course, we can “change the current value of” a variable, by updating that variable, but—to repeat—we cannot change the value as such.)
Now, the operations that apply to ellipse values are precisely all of the read-only operators defined for type ELLIPSE, while the operations that update ELLIPSE variables are, of course, the update operators defined for that type. Hence, our dictum that “inheritance applies to values, not variables” can be stated more precisely as follows:
- Read-only operators are inherited by values, and hence a fortiori by current values of variables (since read-only operators can obviously be applied—harmlessly—to those values that happen to be the current values of variables).
This more precise statement also serves to explain why the concepts of polymorphism and substitutability refer very specifically to values, not variables. For example (and just to remind you), substitutability says that wherever the system expects a value of type T, we can always substitute a value of type T' instead, where T' is a subtype of T (boldface added for emphasis). In fact, we specifically referred to this principle, when we first introduced it, as The Principle of Value Substitutability (again, note the emphasis).
What about update operators, then? By definition, such operators apply to variables, not values. So can we say that update operators that apply to variables of type ELLIPSE are inherited by variables of type CIRCLE?…
Of course, if an update operator is inherited, then we do have a kind of polymorphism and a kind of substitutability that apply to variables instead of values. For example, the update version of MOVE expects an argument that is a variable of declared type ELLIPSE, but we can invoke it with an argument that is a variable of declared type CIRCLE instead (though not with an argument that is a variable of declared type O_CIRCLE!). Thus, we can (and do) talk sensibly about another principle, The Principle of Variable Substitutability—but that principle is more restrictive than The Principle of Value Substitutability discussed previously.
So What Do Subtypes Really Mean?
…There is no way to obtain a colored circle from a circle via specialization by constraint.…
Again, therefore, it seems much more reasonable to regard type COLORED_CIRCLE and CIRCLE as completely different types, and to regard type COLORED_CIRCLE in particular as having a possible representation in which one component is of type CIRCLE and another is of type COLOR…
In fact, we are touching here on a much larger issue. The fact is, we believe that subtyping should always be via specialization by constraint! That is, we suggest that if T' is a subtype of T, there should always be a type constraint such that, if it is satisfied by some given value of type T, then the value in question is really a value of type T' (and should automatically be specialized to type T'). …
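Specialization by constraint, together with the earlier point that read-only operators are inherited by values, can be sketched as follows. This is an illustrative approximation only: the `Ellipse` and `Circle` classes, the semiaxis attributes `a` and `b`, and the `ellipse` selector function are assumptions, and Python's class mechanism is not the inheritance model the authors propose. Values are immutable, and an ellipse value that satisfies the constraint `a == b` is automatically produced as a circle value.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    """An ellipse value with semiaxes a >= b > 0. Values never change."""
    a: float
    b: float

    def __post_init__(self) -> None:
        if self.a < self.b or self.b <= 0:
            raise ValueError("require a >= b > 0")

    def area(self) -> float:
        # A read-only operator: inherited by every circle value.
        return math.pi * self.a * self.b


@dataclass(frozen=True)
class Circle(Ellipse):
    """A circle value: an ellipse value that satisfies a == b."""

    def __post_init__(self) -> None:
        super().__post_init__()
        if self.a != self.b:
            raise ValueError("a circle requires a == b")

    @property
    def radius(self) -> float:
        return self.a


def ellipse(a: float, b: float) -> Ellipse:
    """Selector that specializes by constraint: if the type constraint
    a == b is satisfied, the result is a Circle value."""
    return Circle(a, b) if a == b else Ellipse(a, b)


e = ellipse(5.0, 5.0)  # satisfies the constraint, so it is a circle value
f = ellipse(5.0, 3.0)  # just an ellipse value

# "Updating a variable" means assigning it a new value; no value is changed.
v = ellipse(5.0, 5.0)
v = ellipse(v.a, 3.0)
```

The variable `v` first holds a circle value and then an ellipse value, while the values themselves stay fixed, which is the distinction between values and variables that the passage relies on.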
Thus, we claim that specialization by constraint is the only conceptually valid means of defining subtypes. As a consequence, we reject examples like the one suggesting that COLORED_CIRCLE might be a subtype of CIRCLE.
Despite these potential advantages, however, we now observe that there does not seem to be any consensus on a formal, rigorous, and abstract model of type inheritance.
To quote reference [Andrew Taivalsaari: “On the Notion of Inheritance,” ACM Comp. Surv. 28, No. 3 (September 1996)]:
The basic idea of inheritance is quite simple … [and yet, despite] its central role in current ... systems, inheritance is still quite a controversial mechanism ... [A] comprehensive view of inheritance is still missing.
Now we can state what might be regarded as the fundamental principle of distributed database…: To the user, a distributed system should look exactly like a nondistributed system. In other words, users in a distributed system should be able to behave exactly as if the system were not distributed. All of the problems of distributed systems are—or should be—internal or implementation-level problems, not external or user-level problems.…
The foregoing fundamental principle leads to certain subsidiary rules or objectives, twelve in number…:
- Local autonomy
- No reliance on a central site
- Continuous operation
- Location independence
- Fragmentation independence
- Replication independence
- Distributed query processing
- Distributed transaction management
- Hardware independence
- Operating system independence
- Network independence
- DBMS independence
Common Design Errors
In this subsection, we comment briefly on some design practices that are common in the decision support environment and yet we feel are not a good idea.
- Duplicate rows: Decision support designers often claim that their data simply has no unique identifier and that they therefore have to permit duplicates. … here we just remark that the “requirement” typically arises because the physical schema is not derived from a logical schema (which was probably never created in the first place). We note too that in such a design the rows often have nonuniform meanings (especially if any nulls are present)—that is, they are not all instantiations of the same predicate. … Note: Duplicates are sometimes even regarded as a positive feature, especially if the designer has an object-oriented background.
- Denormalization and related practices: In a misguided effort to eliminate joins and reduce I/O, designers often prejoin tables, introduce derived columns of various kinds, and so on. Such practices might be acceptable at the physical level, but not if they are detectable at the logical level.
- Star schemas: “Star schemas” (also known as dimensional schemas) are most often the result of an attempt to short-circuit proper design technique. There is little to be gained from such shortcuts. Often both performance and flexibility suffer as the database grows, and resolving such difficulties via physical redesign forces changes in applications as well (because star schemas are really physical schemas, even though they are exposed to applications). The overall problem lies in the ad hoc nature of the design…
- Nulls: Designers often attempt to save space by permitting nulls in columns (this trick might work if the column in question is of some variable-length data type and the product in question represents nulls in such columns by empty strings at the physical level). Such attempts are generally a mistake, however. Not only is it possible (and desirable) to design in such a way as to avoid nulls in the first place, but the resulting schemas often provide better storage efficiency and better I/O performance.
- Design of summary tables: The question of logical design for summary tables is often ignored, leading to uncontrolled redundancy and difficulties in maintaining consistency. As a consequence, users can become confused as to the meaning of summary data and how to formulate queries involving it. To avoid such problems, all summary tables “at the same level of aggregation” … should be designed as if they constituted a database in their own right. Certain cyclic update problems can then be avoided by (a) prohibiting updates from spanning aggregation levels and (b) synchronizing the summary tables by always aggregating from the detail level up.
- “Multiple navigation paths”: Decision support designers and users often speak, incorrectly, of there being a “multiplicity of navigational paths” to some desired data, meaning the same data can be reached via several different relational expressions. …
It is clear that users can become confused in such cases and be unsure as to which expression to use and whether or not there will be any difference in the result. Part of this problem can only be solved by proper user education, of course. Another part can be solved if the optimizer does its job properly. However, yet another part is due to designers allowing redundancies in the logical schema and/or letting users access the physical schema directly, and that part of the problem can only be solved by proper design practice.
In sum, we believe that many of the design difficulties allegedly arising from decision support requirements can be addressed by following a disciplined approach. Indeed, many of those difficulties are caused by not following such an approach (though it is only fair to add that they are often aggravated by problems with SQL).
We begin with a quote from The Third Manifesto…: [Before] we can consider the question of [a rapprochement between] objects and relations in any detail, there is a crucial preliminary question that we need to address, and that is as follows: What concept is it in the relational world that is the counterpart to the concept object class in the object world?
The reason this question is so crucial is that object class really is the most fundamental concept of all in the object world—all other object concepts depend on it to a greater or lesser degree. And there are two equations that can be, and have been, proposed as answers to this question:
- domain = object class
- relvar [(relation variable)] = object class
We now proceed to argue, strongly, that the first of these equations is right and the second is wrong. In fact, the first equation is obviously right, since object classes and domains are both just types. Indeed, given that relvars are variables and classes are types, it should be immediately obvious too that the second equation is wrong (variables and types are not the same thing); for this very reason, … relvars are not domains. Nevertheless, many people, and some products, have in fact embraced the second equation—a mistake that we refer to as The First Great Blunder.
In this section, we examine The Second Great Blunder; as we will see, that second blunder is a logical consequence of the first, but it is also significant in its own right. In fact, it can be committed in its own right, too, even if The First Great Blunder is avoided; indeed, it is being committed by just about every object/relational product on the market, as well as by the SQL standard … The blunder consists of mixing pointers and relations…
The crux of our argument here is very simple. By definition, pointers point to variables, not values (because variables have addresses and values do not). By definition, therefore, if relvar [(relation variable)] R1 is allowed to have an attribute whose values are pointers “into” relvar R2, then those pointers point to tuple variables, not to tuple values. But there is no notion of a tuple variable in the relational model. The relational model deals with relation values, which are (loosely speaking) sets of tuple values, which are in turn (again loosely speaking) sets of scalar values. It also deals with relation variables, which are variables whose values are relations. However, it does not deal with tuple variables (which are variables whose values are tuples) or scalar variables (which are variables whose values are scalars). The only kind of variable included in the relational model—and the only kind of variable permitted in a relational database—is, very specifically, the relation variable. It follows that the idea of mixing pointers and relations constitutes a MAJOR departure from the relational model, introducing as it does an entirely new kind of variable (thereby violating The Information Principle, in fact). … in fact, we would argue that it seriously undermines the conceptual integrity of the relational model.
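The alternative to pointers is to connect tuples purely by comparing values. The sketch below is a hypothetical illustration: the `Dept` and `Emp` headings, the `join` function, and the data are assumptions, not the book's example. Each employee tuple carries a department number as an ordinary value, and the association with a department tuple is established only by an equality comparison, so a relation value can be replaced wholesale without invalidating anything.

```python
from typing import NamedTuple, Set, Tuple


class Dept(NamedTuple):
    dept_no: str
    dept_name: str


class Emp(NamedTuple):
    emp_no: str
    name: str
    dept_no: str  # an ordinary value compared with Dept.dept_no, not a pointer


def join(emps: Set[Emp], depts: Set[Dept]) -> Set[Tuple[str, str]]:
    """Pair each employee name with a department name by matching values."""
    return {
        (e.name, d.dept_name)
        for e in emps
        for d in depts
        if e.dept_no == d.dept_no
    }


depts = {Dept("D1", "Marketing"), Dept("D2", "Development")}
emps = {Emp("E1", "Lopez", "D1"), Emp("E2", "Cheng", "D1"), Emp("E3", "Finzi", "D2")}

before = join(emps, depts)

# Assign a completely new relation value to the variable depts. No tuple in
# emps refers to any storage location, so nothing needs to be fixed up.
depts = {Dept("D2", "Development"), Dept("D1", "Marketing"), Dept("D3", "Research")}
after = join(emps, depts)
```

Because the employee tuples hold values rather than addresses, the result of the join depends only on the current relation values, not on where or how any tuple is stored.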
Given the truth of the foregoing, it is sad to see that most (perhaps all) of the current crop of object/relational products—even those that do avoid The First Great Blunder—nevertheless seem to be mixing pointers and relations in exactly the manner discussed, and objected to, in the previous section. When Codd first defined the relational model, he very deliberately excluded pointers. To quote…:
It is safe to assume that all kinds of users [including end users in particular] understand the act of comparing values, but that relatively few understand the complexities of pointers. The relational model is based on this fundamental principle. ... [The] manipulation of pointers is more bug-prone than is the act of comparing values, even if the user happens to understand the complexities of pointers.
To be specific, pointers lead to pointer chasing, and pointer chasing is notoriously error-prone. … it is precisely this aspect of object systems that gives rise to the criticisms, sometimes heard, to the effect that such systems “look like CODASYL warmed over.”
It is hard to find any real justification in the literature for The Second Great Blunder (any technical justification, that is—but there is evidence that the justification is not technical at all but political). Given the fact that object systems and object languages do all include pointers in the form of object IDs, the idea of mixing pointers and relations almost certainly arises from a desire to make relational systems more “object-like.” However, this “justification” merely pushes the problem off to another level; we have already made it abundantly clear that—in our opinion—object systems expose pointers to the user precisely because they fail to distinguish properly between model and implementation.
We can only conjecture, therefore, that the reason why the idea of mixing pointers and relations is being so widely promulgated is because too few people understand why pointers were excluded from relations in the first place. As Santayana has it: Those who cannot remember the past are condemned to repeat it (usually quoted in the form “Those who don't know history are doomed to repeat it”). On such matters we agree strongly with Maurice Wilkes, when he writes:
I would like to see computer science teaching set deliberately in a historical framework...Students need to understand how the present situation has come about, what was tried, what worked and what did not, and how improvements in hardware made progress possible. The absence of this element in their training causes people to approach every problem from first principles. They are apt to propose solutions that have been found wanting in the past. Instead of standing on the shoulders of their precursors, they try to go it alone.
XML Databases
After all,…the relational model is both necessary and sufficient to represent any data whatsoever. We also know there is a huge investment in terms of research, development, and commercial products in what might be called relational infrastructure (i.e., support for recovery, concurrency, security, and optimization—not to mention integrity!—and all of the other topics we have been discussing in this book). In our opinion, therefore, it would be unwise to embark on the development of a totally new kind of database technology when there does not seem to be any overwhelming reason to do so—not to mention the fact that any such technology would obviously suffer from problems similar to those that hierarchic database technology already suffers from….
Jon Bosak and Tim Bray: “XML and the Second-Generation Web.” http://www.sciam.com (May 1999).
This paper includes an excellent, albeit presumably unintentional, argument for not using XML as the basis for a new database technology. To quote: “XML [documents have] the structure known in computer science as a tree . . . Trees cannot represent every kind of information, but they can represent most kinds that we need computers to understand. Trees, moreover, are extraordinarily convenient for programmers [sic]. If your bank statement is in the form of a tree, it is a simple matter to write a bit of software that will reorder the transactions or display just the cleared checks.” Well, yes; these remarks might be accurate as far as they go; but do they go far enough? A study of the history of trees (in other words, hierarchies) in the database context strongly suggests that the answer to this question is no. The fundamental point is that even when the data has a naturally hierarchic structure—as (it might be argued) is the case with, for example, departments and employees—it does not follow that it should be represented hierarchically, because the hierarchic representation is not suitable for all of the processing we might want to do on the data. And what about data that does not have a “naturally hierarchic” structure? For example, what is the best tree representation for propositions of the form “Supplier s supplies part p to project j”? Note: We raised these same objections…in connection with objects, which are also hierarchic, like XML documents.
In the field of scientific endeavor, an idea emerges from time to time that is so startlingly novel, and so dramatically better than anything that went before, that it can truly be described as a breakthrough. The relational model provides the obvious example in the database world; almost everything in this book stands as testament to the radical nature and impact of that one brilliant idea. And now we are witnessing the birth of what looks set to be another major breakthrough: The TransRelational Model.
In this writer’s opinion, the TransRelational Model—invented by Steve Tarin, and abbreviated hereinafter to just TR—is likely to prove the most significant development in this field since Codd gave us the relational model, nearly 35 years ago.
We should say immediately that TR is not intended as a replacement for the relational model; the “trans” in “transrelational” does not stand for beyond as it does in (e.g.) “translunar,” it stands for transform. It is true that TR and the relational model are both abstract models of data, but TR is at a lower level of abstraction (i.e., it is closer to physical storage); in fact, TR is designed to serve among other things as an implementation vehicle for the relational model. As you might recall, we said...that “a radically new approach to DBMS implementation has [recently] emerged, an approach that has the effect of invalidating many of the assumptions underlying” conventional approaches to implementation. TR is that new approach.
The crucial insight underlying the TR [TransRelational] model can be characterized as follows. Let r be a record within some file at the file level. Then:
The stored form of r involves two logically distinct pieces, a set of field values and a set of “linkage” information that ties those field values together, and there is a wide range of possibilities for physically storing each piece.
In direct-image systems, the two pieces are stored together; in other words, the linkage information in such systems is represented by physical contiguity. In TR, by contrast, the two pieces are kept separate—the field values are kept in the Field Values Table, and the linkage information is kept in the Record Reconstruction Table. And it is that separation that is the fundamental source of the numerous benefits that TR is able to provide.
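The separation of field values from linkage information can be sketched as follows. The quoted text names the two tables but does not say how they are built, so the construction here is an assumption made for illustration: each column of the Field Values Table is sorted independently, and each cell of the Record Reconstruction Table holds the row number, in the next column, of the field that belongs to the same record (the last column links back to the first). The function names `build` and `reconstruct` and the supplier data are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[object, ...]


def build(records: Sequence[Record]) -> Tuple[List[List[object]], List[List[int]]]:
    """Return (field_values_table, record_reconstruction_table)."""
    n_rows, n_cols = len(records), len(records[0])

    # perm[c][k]: index of the record whose field c sits in row k once
    # column c has been sorted on its own.
    perm = [
        sorted(range(n_rows), key=lambda r, c=c: records[r][c])
        for c in range(n_cols)
    ]

    # inv[c][r]: row of column c that holds the field value of record r.
    inv = []
    for c in range(n_cols):
        position = [0] * n_rows
        for k, r in enumerate(perm[c]):
            position[r] = k
        inv.append(position)

    fvt = [[records[perm[c][k]][c] for c in range(n_cols)] for k in range(n_rows)]
    # Cell (k, c) links to the row in the next column holding the same record.
    rrt = [
        [inv[(c + 1) % n_cols][perm[c][k]] for c in range(n_cols)]
        for k in range(n_rows)
    ]
    return fvt, rrt


def reconstruct(fvt: List[List[object]], rrt: List[List[int]], start_row: int) -> Record:
    """Rebuild one record by following the links, starting in column 0."""
    row, fields = start_row, []
    for c in range(len(fvt[0])):
        fields.append(fvt[row][c])
        row = rrt[row][c]
    return tuple(fields)


suppliers = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
fvt, rrt = build(suppliers)
rebuilt = {reconstruct(fvt, rrt, k) for k in range(len(suppliers))}
```

In this sketch no row of the Field Values Table corresponds to a stored record; the records exist only through the links, which is the contrast with direct-image storage drawn above.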
Reader review: The bible of databases. An excellent walkthrough with sensible examples and theoretical background.