Version: March 18th 1996, Author: Heinrich C. Kuhn
The number of documents on the web that contain information relevant to one or several scholarly communities is rising rapidly. Documents of this type resemble journal articles more than monographs. For printed scholarly documents of both types it has long been necessary to use well-structured indexing information so that readers can retrieve the material they are looking for. With printed documents, librarians and library users differentiate between several types of authorship, and the indexers of databases such as Medline add controlled information for indexing. Bare author names and mere "author's keywords" have proven insufficient, and it is to be expected that this sort of "minimal information" will not be sufficient for scholarly documents on the web for a very long time to come. We should therefore seek ways to provide scholarly documents on the web with the type of structured indexing that has already proven necessary in the "World of Printed Documents".
Although I think Davide Musella's proposal is a good one, I feel it does not
(yet) permit the amount of structuring of meta-information that is necessary in
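some cases. Such cases are those that call for structured information about authors, keywords and related indexing, abstracts, or miscellaneous indexing information.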
I therefore propose a use of the META-tag that permits authors to give further, more structured information about their documents.
This could be done either by permitting named anchors as the content of
a META-item; the content of this named anchor would then be the
section of the document that contains the more structured indexing
information. E.g.:
<META NAME="IndexInfo" CONTENT="#IndexPartOfDoc">
... Diverse content ...
<a name="IndexPartOfDoc">
This here then would contain the structured indexing information:
- Information about authors
- Keywords and related indexing information
- abstracts
- miscellaneous indexing information
</a>
... Diverse content ...
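How a user agent might follow the META-item to the named anchor can be sketched in Python with the standard library's html.parser. This is only an illustration under assumptions: the class and function names are hypothetical, and the index anchor is assumed not to contain further nested anchors.

```python
from html.parser import HTMLParser


class AnchorIndexExtractor(HTMLParser):
    """Collect the text inside the named anchor that META "IndexInfo" points to."""

    def __init__(self):
        super().__init__()
        self.target = None   # anchor name taken from the META-item
        self.inside = False  # True while we are inside the target anchor
        self.chunks = []     # text pieces found inside the anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "indexinfo":
            content = attrs.get("content") or ""
            if content.startswith("#"):
                self.target = content[1:]
        elif tag == "a" and self.target and attrs.get("name") == self.target:
            self.inside = True

    def handle_endtag(self, tag):
        # Assumption: the index anchor holds no nested <a> elements.
        if tag == "a" and self.inside:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def extract_index_text(html):
    """Return the whitespace-normalised index text, or "" if none is found."""
    parser = AnchorIndexExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

A document without the META-item simply yields an empty string, so a robot could fall back to ordinary full-text indexing.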
Or this could be done by introducing a new pair of tags (<index> and
</index>) and a new boolean value for the META-tag, that says
"Yes" if the document contains such a section with structured
indexing information. E.g.:
<META NAME="IndexInfo" CONTENT="Yes">
... Diverse content ...
<index>
This here then would contain the structured indexing information:
- Information about authors
- Keywords and related indexing information
- abstracts
- miscellaneous indexing information
</index>
... Diverse content ...
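For this second variant, a user agent would first check the boolean META value and only then look for the index section. A minimal Python sketch, assuming hypothetical names (IndexSectionExtractor, index_section) and assuming that a document without the "Yes" declaration is to be treated as having no index:

```python
from html.parser import HTMLParser


class IndexSectionExtractor(HTMLParser):
    """Collect the text between <index> and </index>."""

    def __init__(self):
        super().__init__()
        self.declared = False  # META NAME="IndexInfo" CONTENT="Yes" was seen
        self.depth = 0         # greater than 0 while inside <index> ... </index>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "indexinfo":
            self.declared = (attrs.get("content") or "").strip().lower() == "yes"
        elif tag == "index":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "index" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def index_section(html):
    """Return the index text, or None if the document does not declare one."""
    parser = IndexSectionExtractor()
    parser.feed(html)
    parser.close()
    if not parser.declared:
        return None
    return " ".join("".join(parser.chunks).split())
```

The boolean lets a robot decide from the document head alone whether the body is worth scanning for an index section.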
See below for further information on what this might look like.
Many scientific disciplines and larger libraries cannot rely on mere "author's keywords" to index their information in a way that permits focused retrieval by readers. They have therefore introduced means of classification and indexing such as the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC), Medical Subject Headings (MeSH), controlled keywords and the like. More than one of these ways of indexing can be used for one and the same document.
An application for an electronic document could e.g. look like this:
<index>
<RSWK> "Schlagwort / Kette / Eins", "Kette / Schlagwort / Eins", "Schlagwort / Kette / Zwei", "Kette / Schlagwort / Zwei"</RSWK>
<LOC-SH> Subject-Heading 1, Subject-Heading 2 </LOC-SH>
<DDC>1.2.3.</DDC>
<MeSH> Meshterm_1, *Meshterm_2, Meshterm_3</MeSH>
<BiosisBioCode>Biocode1, *Biocode2, Biocode3, *Biocode4 </BiosisBioCode>
<BiosisConceptCode>ConceptCode1, ConceptCode2 </BiosisConceptCode>
<CARef> 123456, 123457, 123458 </CARef>
<AuthorsKeywords> Keyword1, Keyword2, Keyword3 </AuthorsKeywords>
<MPG-GV-AZ>25842, 2535 </MPG-GV-AZ>
</index>
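A program reading such an index section could turn it into a table of fields and terms. The Python sketch below is hypothetical (the names IndexFieldParser and parse_index are mine), and it rests on two assumptions not stated in the proposal: terms are separated by commas, and a leading asterisk flags a term as especially important, as is done for major headings in Medline.

```python
from html.parser import HTMLParser


class IndexFieldParser(HTMLParser):
    """Gather the raw text of every element found inside <index>."""

    def __init__(self):
        super().__init__()
        self.in_index = False
        self.current = None  # name of the field being read
        self.raw = {}        # field name (lower-cased by the parser) -> text

    def handle_starttag(self, tag, attrs):
        if tag == "index":
            self.in_index = True
        elif self.in_index:
            self.current = tag
            self.raw.setdefault(tag, "")

    def handle_endtag(self, tag):
        if tag == "index":
            self.in_index = False
        elif tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.raw[self.current] += data


def parse_index(html):
    """Return {field: [(term, starred), ...]} for one index section."""
    parser = IndexFieldParser()
    parser.feed(html)
    parser.close()
    result = {}
    for name, text in parser.raw.items():
        terms = []
        for piece in text.split(","):
            # Assumption: commas separate terms; quotes around RSWK chains are dropped.
            term = piece.strip().strip('"').strip()
            if term:
                terms.append((term.lstrip("*"), term.startswith("*")))
        result[name] = terms
    return result
```

Note that html.parser lower-cases tag names, so the MeSH field is found under the key "mesh".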
The abbreviations used for several types of classification are in many cases more or less standard and well known to the members of the scholarly community using them. For the few cases where this might not be true, it should be left to the community or communities in question to decide on an abbreviation; in my opinion an INTERNET DRAFT is not the place to do such a thing.
Structured information on authors permits differentiation between first and secondary authors and between different types of authors, and can record their institutional affiliation, etc. Structured information on authors might look like this:
<index>
<author-last-name>Kuhn</author-last-name>
<author-first-name>Heinrich C.</author-first-name>
<author-affilation>Max-Planck-Gesellschaft / Generalverwaltung, München </author-affilation>
<secondary-author-last-name>Meier </secondary-author-last-name>
<secondary-author-first-name>Martin </secondary-author-first-name>
<secondary-author-affilation> Institut für Bibliothekswesen, Kleinkarlbach </secondary-author-affilation>
<secondary-author-last-name>Müller </secondary-author-last-name>
<secondary-author-first-name>Manuel </secondary-author-first-name>
<secondary-author-affilation> Arbeitskreis für Bibliothekswesen, Untergiesing </secondary-author-affilation>
<secondary-author-last-name>Huber </secondary-author-last-name>
<secondary-author-first-name>Harald </secondary-author-first-name>
<secondary-author-affilation> Kolleg für Sacherschließung, Borghorst </secondary-author-affilation>
</index>
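Because the author elements are flat rather than nested, a reader of this markup has to decide where one author ends and the next begins. The Python sketch below assumes, purely for illustration, that every last-name element opens a new author record; the names AuthorParser and parse_authors are hypothetical, and the tag names are used exactly as spelled in the example.

```python
from html.parser import HTMLParser


class AuthorParser(HTMLParser):
    """Group the flat author elements into one record per author."""

    PREFIXES = (("secondary-author-", "secondary"), ("author-", "first"))

    def __init__(self):
        super().__init__()
        self.authors = []  # list of dicts, in document order
        self.field = None  # key of the element currently being read

    def handle_starttag(self, tag, attrs):
        for prefix, role in self.PREFIXES:
            if tag.startswith(prefix):
                key = tag[len(prefix):]
                # Assumption: a last-name element starts a new author record.
                if key == "last-name":
                    self.authors.append({"role": role})
                if self.authors:
                    self.authors[-1].setdefault(key, "")
                    self.field = key
                break

    def handle_endtag(self, tag):
        self.field = None

    def handle_data(self, data):
        if self.field and self.authors:
            self.authors[-1][self.field] += data


def parse_authors(html):
    """Return a list of author records with whitespace normalised."""
    parser = AuthorParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in a.items()} for a in parser.authors]
```

A retrieval program could then restrict a search to first authors by filtering on the "role" key.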
There are cases where a document comes with more than just one abstract, or with abstracts in "unexpected" languages. For such cases the possibility to give structured information about abstracts and their contents might be welcome.
An application of this could look like this:
<index>
<abstract>
<abstract-deutsch>Kurze Zusammenfassung des Dokumenten-Inhalts auf Deutsch </abstract-deutsch>
<abstract-english>Short summary of the document, in English </abstract-english>
</abstract>
</index>
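With abstracts marked up per language, a user agent could show the reader the abstract in the language she or he prefers. A minimal Python sketch, assuming that the language is given by the suffix of the tag name (as in abstract-deutsch and abstract-english); AbstractParser and pick_abstract are hypothetical names.

```python
from html.parser import HTMLParser


class AbstractParser(HTMLParser):
    """Collect abstracts keyed by the language suffix of their tag name."""

    def __init__(self):
        super().__init__()
        self.abstracts = {}  # language suffix -> abstract text
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag.startswith("abstract-"):
            self.lang = tag[len("abstract-"):]
            self.abstracts.setdefault(self.lang, "")

    def handle_endtag(self, tag):
        if tag.startswith("abstract-"):
            self.lang = None

    def handle_data(self, data):
        if self.lang:
            self.abstracts[self.lang] += data


def pick_abstract(html, preferred):
    """Return the first available abstract in order of preference, else None."""
    parser = AbstractParser()
    parser.feed(html)
    parser.close()
    for lang in preferred:
        text = parser.abstracts.get(lang)
        if text:
            return " ".join(text.split())
    return None
```

Calling pick_abstract(page, ["english", "deutsch"]) would prefer the English abstract and fall back to the German one.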
There might be cases where other, miscellaneous information is of interest as well, e.g. the full, long title of a baroque document made available on the net. You would not want to put something like the title of a book by Nicolaus Taurellus, which runs "Philosophiae Triumphus, hoc est Metaphysica Philosophandi Methodus, Qua Divinitus Inditis menti notitiis, humanae rationes eo deducuntur, ut firmissimis unde constructis demonstrationibus, aperte rei veritas elucescat, & quae diu Philosophorum sepulta fuit authoritate, Philosophia victrix erumpat: Quaestionibus enim vel sexcentis, ea quibus cum revelato nobis veritate Philosophia pugnare videtur, adeo vere conciliantur, ut non fidei solum servire dicenda sit, sed eius esse fundamentum", between <h1> and </h1> on the user's screen.
An application might look like this:
<index>
<full-title> Very long title of the document, which is a type of title very much en vogue in the late renaissance and in the baroque era, but which can be found in the case of contemporary German doctoral theses up to the present day. Displaying a type of title that often tends to make use of subtitles as well </full-title>
<date-creation>19951222</date-creation>
<date-update>19960226</date-update>
<technical-info> HTML document with search interface to database and links to ftp resources </technical-info>
</index>
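The two date fields lend themselves to machine processing, e.g. to find out whether a document has changed since a robot last visited it. The Python sketch below assumes that the eight digits mean year, month and day (YYYYMMDD), which the example values suggest but the proposal does not state; read_dates is a hypothetical helper.

```python
import re
from datetime import datetime


def read_dates(index_html):
    """Return the date-creation and date-update fields that are present as dates."""
    found = {}
    for name in ("date-creation", "date-update"):
        pattern = rf"<{name}>\s*(\d{{8}})\s*</{name}>"
        match = re.search(pattern, index_html, re.IGNORECASE)
        if match:
            # Assumption: the digits are YYYYMMDD; an impossible date raises ValueError.
            found[name] = datetime.strptime(match.group(1), "%Y%m%d").date()
    return found
```

Comparing found["date-update"] with the date of the last visit would tell a robot whether the document needs to be indexed again.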
User agents (search engines, programs collecting information gathered by
various robots, crawlers, and the like) could use structured indexing information
in the following way:
Somebody trying to find out what scholarly indexed information in the biomedical field is out there on the net would then just have to ask the search engine of her or his choice for all documents having certain contents within the indexing fields for MeSH and the relevant BiosisBioCodes.
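Such a fielded query could be served from a small inverted index kept per indexing field. The Python sketch below is an illustration only: FieldIndex and its methods are hypothetical, the example.org URLs are placeholders, and requiring every requested term to match (a logical AND) is my assumption about how the query should behave.

```python
from collections import defaultdict


class FieldIndex:
    """Inverted index from (field, term) pairs to document URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, fields):
        """Register the terms of one document, given as {field: [terms]}."""
        for field, terms in fields.items():
            for term in terms:
                self.postings[(field.lower(), term.lower())].add(url)

    def search(self, wanted):
        """Return the sorted URLs that carry every requested term in every field."""
        result = None
        for field, terms in wanted.items():
            for term in terms:
                hits = self.postings.get((field.lower(), term.lower()), set())
                result = set(hits) if result is None else result & hits
        return sorted(result or [])


engine = FieldIndex()
engine.add("http://example.org/a", {"MeSH": ["Meshterm_1", "Meshterm_2"],
                                    "BiosisBioCode": ["Biocode1"]})
engine.add("http://example.org/b", {"MeSH": ["Meshterm_1"],
                                    "BiosisBioCode": ["Biocode2"]})
```

A query such as engine.search({"MeSH": ["Meshterm_1"], "BiosisBioCode": ["Biocode2"]}) then only returns documents indexed with both terms.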
This type of information worked well for printed documents in cases where "author's keywords" alone did not work. It is to be hoped that it will work in the electronic context as well.
The main difference here is that in the electronic context it will in many cases be up to the authors to index their documents themselves, while in the case of printed documents such indexing was done by experts in indexing.
As most authors know rather well the types and terms of indexing relevant to their narrower field of research, and as good indexing helps authors to find the desired readers for their documents, it is to be hoped that relying on the authors to do (at least part of) the indexing is possible.
Comments and critique are always welcomed by Heinrich C. Kuhn. Thanks a lot in advance!