Adding Semantics to SGML Databases
By Subhasish Mazumdar, Weifeng Bao, Zhengang Yuan, and Jonathan Price
New Mexico Institute of Mining and Technology
Presented at the Electronic Publishing 98 Conference, Saint Malo, France (April 1-4, 1998), and included in the proceedings volume, in Hersch, R., Andre, J., & Brown, H. (Eds.), Electronic Publishing, Artistic Imaging, and Digital Typography, Vol. 1375 in the Lecture Notes in Computer Science Series, Springer-Verlag, Berlin, 1998, pp. 563-574.
Digest (original full text encumbered with copyright by Springer-Verlag).
Technical writers who must maintain complex, delicately interconnected information often look to object-oriented SGML databases as a way of storing, retrieving, reusing, and reassembling the constituent objects of new documents, created on the fly to respond to a particular customer's needs. The SGML tags help identify structural packages such as procedures, illustrations, or glossary items; in a large database, then, writers can filter out unwanted material, locating only the structural pieces they need for the job in hand. For instance, to produce a quick reference, a writer might pull up the names of procedures and their steps, but not the introductions or explanations.
Similarly, a user could search for illustrations only. But illustrations of what? With no subject matter defined, such searches result in hundreds, even tens of thousands, of hits. To speed up access to the precise passages wanted, end users and writers need a way to narrow their searches by defining the precise subject matter (the meaning, or semantics) as well as the structural elements they seek.
We recommend using an attribute called Subject Matter for every object class. We suggest that whatever values we assign to the Subject attribute of the document as a whole should trickle down to every object within it, and that a writer should enter additional values in the Subject attribute for each chapter, values which would then apply to every object within that chapter.
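The trickle-down of Subject values can be illustrated with a small sketch. It is not the authors' implementation: the `DocObject` class, its field names, and the sample subjects are all hypothetical, and a real SGML database would store these values as attributes on its own object classes. The sketch only shows the rule described above, in which an object's effective subjects are its own values plus those of every enclosing object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocObject:
    """One object in the document database (hypothetical model)."""
    element: str                                   # structural type, e.g. "procedure"
    own_subjects: set = field(default_factory=set)  # values the writer entered here
    # parent is excluded from repr/compare to avoid parent <-> child recursion
    parent: Optional["DocObject"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, child: "DocObject") -> "DocObject":
        """Attach a child object and return it for chaining."""
        child.parent = self
        self.children.append(child)
        return child

    def effective_subjects(self) -> set:
        """Own values plus everything inherited from enclosing objects."""
        inherited = self.parent.effective_subjects() if self.parent else set()
        return inherited | self.own_subjects


# Assumed sample data: one document, one chapter, two objects in the chapter.
manual = DocObject("document", {"payroll"})
chapter = manual.add_child(DocObject("chapter", {"tax withholding"}))
procedure = chapter.add_child(DocObject("procedure"))
aside = chapter.add_child(DocObject("paragraph", {"pension plans"}))
```

Here `procedure` carries no values of its own yet is findable under both "payroll" and "tax withholding", while `aside` adds "pension plans" without losing what it inherits.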
No one wants to fill out a form identifying the subject matter of every paragraph. But occasionally a paragraph strays so far from the main topic of the chapter that a user would never discover it unless the writer added a few more values to that paragraph's Subject attribute.
Unfortunately, whenever writers are asked to create keywords for passages, the results are, at best, uneven. Many writers just echo the title of the page or section; few add synonyms; very few attempt to rethink the material from the point of view of a newcomer. The ability to pour values down from the larger package, such as a document or chapter, to all the objects within offers us some increase in consistency; and the ability to add values as we descend to lower-level objects gives us more precision.
If the organization has adopted an enterprise data model as part of an effort to re-engineer its processes, technical writers should be encouraged to draw on the concepts in that schema as values in the Subject field, so that the terms in the documentation match those in the business. In this way, a visitor may ask to look at the diagram of the enterprise data model as a way to locate the "correct" term for a subject, then request a search of the documentation database for that subject, in all structural elements or just a few. A writer may also want to create a thesaurus entry for major topics, entering synonyms or related terms as additional subject values.
In these ways, authors can add semantic information to speed up access to the particular chunks they, or a user, might need. But what happens when change occurs? Here are some examples of the kind of transformations that must be dealt with in this approach.
Running nightly checks on the pointers, and creating an error table, the system administrator must make sure that anyone using the current enterprise data model can locate the information, and that anyone who recalls earlier schemas can do so as well. If an entity is scratched from the enterprise data model, but has never referred to anything mentioned in the documentation database, it can be ignored. If every entity mentioned in a document has been eliminated from the enterprise data model, the administrator should consider archiving the document offline.
The downside of our approach, then, is that it involves regular maintenance. But we argue that increasingly the individual writer will be responsible for constant maintenance, and will be able to perform such routines after having been given instruction by the system administrator. The benefits of our approach are many:
But why not just do a full-text search? Most full-text searches fail to locate passages that happen not to mention the word being queried; ignore attribute values; know nothing of the enterprise data model; and provide, at best, a thousand points of light, rather than an overall picture, lit up with individual topics. The most powerful filter is the human mind, and our approach enables writers to provide meaningful access to their work, through the semantics of values in the attributes of each object in the database.
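The contrast with full-text search can be made concrete with a minimal sketch. Everything in it is assumed for illustration: the `Chunk` record, the `THESAURUS` entry, and the sample texts are hypothetical and do not come from the original system. It shows a search that filters on structural element and Subject values together, where the writer has added thesaurus synonyms as extra subject values, next to a naive substring search over the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A stored object: its structural type, Subject values, and text."""
    element: str
    subjects: frozenset
    text: str


# Hypothetical thesaurus entry: a major topic mapped to its synonyms.
THESAURUS = {"invoice": {"bill", "statement"}}


def with_synonyms(subjects, thesaurus):
    """Return the subjects plus any synonyms the thesaurus lists for them."""
    expanded = set(subjects)
    for subject in subjects:
        expanded |= thesaurus.get(subject, set())
    return frozenset(expanded)


def subject_search(chunks, subject, elements=None):
    """Match on Subject values, optionally restricted to some element types."""
    return [c for c in chunks
            if subject in c.subjects
            and (elements is None or c.element in elements)]


def full_text_search(chunks, word):
    """Naive baseline: match only chunks whose text mentions the word."""
    return [c for c in chunks if word.lower() in c.text.lower()]


chunks = [
    Chunk("illustration", with_synonyms({"invoice"}, THESAURUS),
          "Figure 3: layout of the monthly form"),
    Chunk("procedure", with_synonyms({"invoice"}, THESAURUS),
          "To print an invoice, choose Print."),
    Chunk("illustration", frozenset({"login"}),
          "Figure 1: the sign-in screen"),
]

# Illustrations about "bill", a word that appears in none of the texts.
hits = subject_search(chunks, "bill", elements={"illustration"})
```

In this sample, the subject search for "bill" restricted to illustrations finds Figure 3, while the full-text search for the same word finds nothing, because no passage happens to mention it.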
Copyright 1998-2001 Jonathan and Lisa Price, The Communication Circle