diff options
Diffstat (limited to 'doc/context/sources/general/manuals/publications/publications-database.tex')
-rw-r--r-- | doc/context/sources/general/manuals/publications/publications-database.tex | 553 |
1 files changed, 553 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/publications/publications-database.tex b/doc/context/sources/general/manuals/publications/publications-database.tex new file mode 100644 index 000000000..656ace56d --- /dev/null +++ b/doc/context/sources/general/manuals/publications/publications-database.tex @@ -0,0 +1,553 @@ +\environment publications-style + +\startcomponent publications-database + +\startchapter[title=The database] + +The bibliography subsystem uses a database (or a set of databases) to construct a +list of citations to be used in a scholarly work. However, it will be shown later +that the database system can be used (and abused) to many ends having little or +nothing at all to do with citations and bibliographies. Nevertheless, at first we +shall remain focused on the use of bibliography databases. + +The data to be used must have a source and a structure. In the next sections we +describe the possible input. + +\startsection[title=\BibTeX] + +The \BIBTEX\ format is rather popular in the \TEX\ community and even with its +shortcomings it will stay around for a while. Many publication websites can +export and many tools are available to work with this database format. It is +rather simple and looks a bit like \index [LUA table] {\LUA\ table}\LUA\ tables. +Indeed, it is said that the \BIBTEX\ format was one of the inspirations for the +constructor syntax in \LUA\ \cite [alternative=num, +righttext={\btxcomma Chapter\nbsp 12.}] [default::Ierusalimschy2006]. + +Unfortunately the content can be (and usually is) polluted with +non|-|standardized \TEX\ commands which complicates pre- or post|-|processing +outside \TEX. In that sense a \BIBTEX\ database is often not coded neutrally. +Some limitations, like the use of commands to encode accented characters root in +the \ASCII\ world and can be bypassed by using \index [UTF] {\UTF}\UTF\ instead +(as handled somewhat in \LATEX\ through extensions such as \Tindex {bibtex8}). + +The normal way to deal with a bibliography is to refer to entries using a unique +\Index {tag} or key. When a text containing a list of entries is typeset, this +reference can be used for linking purposes. The list can be processed and sorted +using the \Tindex {bibtex} program that converts the database into something more +\TEX\ friendly (a \Tindex {.bbl} file). + +In \CONTEXT\ we no longer use the (external) \goto {\Tindex {bibtex} program} +[url(https://www.ctan.org/pkg/bibtex)] at all: we simply parse the database files +in \LUA\ and deal with the necessary manipulations directly in \CONTEXT. One or +more such databases can be used and combined with additional entries defined +within the document. We can have several such datasets active at the same time. + +\startaside +\emphasis {On the name \Tindex {btx}:} many of the \CONTEXT\ commands that will be +used in the following contain the label \TEXcode {btx} in their name. This +identifier was retained despite the fact that \CONTEXT\ \MKIV\ is now completely +independent of \BIBTEX; it reflects the role still played by \BIBTEX\ data as a +preferred source format and serves as a handy, unique identifier, both internally +in the programming as well as for the user. This three|-|letter label is +systematically used in commands that otherwise attempt to avoid cryptic|-|styled +names. +\stopaside + +A \BIBTEX\ file entry looks like this: + +\startBTX +@Article {sometag, + author = "An Author and Another One", + title = "A hopefully meaningful title", + journal = maps, + volume = "25", + number = "2", + pages = "5--9", + month = mar, + year = "2013", + ISSN = "1234-5678", +} +\stopBTX + +Entries are of the form: \index {category}\BTXcode {@category{...}} + +Anything outside of a valid \BTXcode {@category{...}} construction is ignored and +is taken to be a comment. Within an entry, there are to be no comments but one +can prefix field names, for example, to have them ignored. + +There is a special entry type named \index {@comment}\BTXcode {@comment{...}}. +The main use of such an entry type is to comment a large part of the bibliography +easily, since anything outside an entry is already a comment, and commenting out +one entry may be achieved by just removing its initial~\BTXcode {@}. — The \index +{@comment}\BTXcode {@comment{...}} entry is perhaps of some use, although this is +not very elegant! As one can input multiple bibliography data files, as will be +seen below, it is much better practice to split datafiles for optional loading. + +Many \BIBTEX\ data management tools such as \Tindex {jabref} (see below) will +ignore and then throw|-|away all such handily|-|crafted comments and data entries +turned into comments. So one must beware! + +The field names are all cast to lowercase so capitalization is irrelevant; +Spacing is not important and should be used advantageously for readability. The +leading \Index {tag} (\BTXcode {sometag} in the example above) cannot contain +spaces and \emphasis {must} be followed by a comma. + +The entry \Index {tag} (\BTXcode {@category{sometag,...}}) is not to be confused +with the optional field \BTXcode {key=sortkey,} that may also be present. + +Normally a value is given between quotes (or curly brackets) but single words are +also valid (as there is no real benefit in not using quotes or curly brackets, we +advise to always use them, contrary to our example above). The order of the +fields in an entry is inconsequential and there can be many more fields than +those shown above. Instead of string values one can also use predefined +shortcuts. The title for example might quite often contain \TEX\ macros, and some +fields, like \BTXcode {pages} have funny characters such as the endash (typically +entered as \BTXcode {--}) so we have a mixture of data and typesetting +directives. Furthermore, if you are covering non||English references, you often +need characters that are not in the \ASCII\ subset. Note that \CONTEXT\ is quite +happy with \UTF, but if your database file uses old|-|fashioned \TEX\ accent +combinations then these will be internally converted automatically to \UTF. + +Commands (macros) found in a database file are converted to an indirect call, +which is quite robust. The use of commands in the database file will be described +in \in {section} [sec:Commands]. + +The \Tindex {author} (and \Tindex {editor}) fields are parsed separating multiple +authors identified by the conjunction \quote {and}. Each name is assumed to be in +the form: + +\definetyping + [NameSyntax] + [margin=1em] + +\startNameSyntax +Firstname(s) Lastname +\stopNameSyntax + +\seeindex {vons} {particule} + +where \type {Lastname} is a single word but may include an optional (nobility) +\Index {particule}: lower|-|case word(s) such as \quotation {von}, \quotation +{de}, \quotation {de la}, etc.) \emphasis {unless} specifically in the two- or +three|-|token form: + +\index {suffix} + +\startNameSyntax +Lastname(s), Firstname(s) +Lastnames(s), Suffix(es), Firstname(s) +\stopNameSyntax + +separated explicitly using comma(s) thus allowing multi|-|word \type {Lastnames}. + +\startaside +An \BTXcode {author} field is sometimes abused in traditional \BIBTEX\ usage to +hold not a name but rather an entity. Other fields, such as \BTXcode +{organization} or \BTXcode {collaboration}, for example, should be used in such +cases. +\stopaside + +\BIBTEX\ also (obscurely) supports the syntax: + +\seeindex {juniors}{suffix} +\index {suffix} + +\startNameSyntax +Firstname(s) \{Lastname(s), Suffix(es)\} +\stopNameSyntax + +we may (or may not) support this in the future, so don't use this! + +We extend \BIBTEX\ by optionally parsing each name in terms of four or five +tokens: + +\index {particule} \index {suffix} \index {initial} + +\startNameSyntax +Particule(s), Lastname(s), Suffix(es), Firstname(s) +Particule(s), Lastname(s), Suffix(es), Firstname(s), Initial(s) +\stopNameSyntax + +in order to allow a free form for the particules, irrespective of capitalization, +thus avoiding the need to resort to any sort of \TEX\ trickery \cite [num] +[default::Patashnik1988,Markey2009]. In fact, an optional sixth token is parsed +whose meaning is presently reserved for future directives describing how the name +is to be interpreted: + +\index {particule} \index {suffix} \index {initial} + +\startNameSyntax +Particule(s), Lastname(s), Suffix(es), Firstname(s), Initial(s), directives +\stopNameSyntax + +\BIBTEX\ additionally accepts the special token \Tindex {others} to be used +(sparingly) to indicate an incomplete author list. Note that most style +specifications will handle the truncation of long author lists in a systematic +fashion. The \index [others] {\tt and others}\BTXcode {and others} construction +finds its use when the complete author list is not well known or ill|-|defined. + +Sometimes, or even often, the database might contain variants of an author's +name that we would like to identify as a single, unique author. Indeed, certain +bibliographic styles (as will be seen later) as well as an index of authors, for +example, will depend on this identification. A command \Cindex {btxremapauthor} +allows establishing this identity: + +\startbuffer +\btxremapauthor [Donald Knuth] [Donald E. Knuth] +\btxremapauthor [Don Knuth] [Donald E. Knuth] +\stopbuffer +\getbuffer + +\cindex {btxremapauthor} +\typebuffer [option=TEX] + +Fields other than \Tindex {author} and \Tindex {editor}, for example \Tindex +{artist} or \Tindex {director} if one desires, can be declared to be of type +\quote {author} and thus interpreted as names, but this is a subject for +specialists. + +The \BTXcode {keywords} field can also be split into tokens separated by +semicolons (keyword; keyword; \unknown). This can be useful, as will be seen +later, in the creation of keyword indexes, for example. + +Other string values such as \BTXcode {title} are kept literally (except for an +internal automatic conversion to \UTF\ of certain \TEX\ strings such as accent +combinations, endash, quotations, etc.). Note that the bibliography rendering +style (see below) might specify a capitalization of the title (using the +\CONTEXT\ commands \TEXcode {\Word} or \TEXcode {\Words}, for example). +Capitalized Names and acronyms are respected removing a need for the \BIBTEX\ +practice of \quote {protecting} such words or letters with surrounding curly +brackets (which here are simply stripped off). (Furthermore, since \CONTEXT\ uses +\UTF, it does not suffer from all of the complicated \Index {sorting} issues that +plague \BIBTEX|/|\LATEX.) As some styles might not specify the capitalization of +words in the title whereas other styles might, it is recommended that strings be +written in lower case except where upper case is explicitly required so as to be +compatible with all such capitalization styles. + +\startaside +Some bibliographic database sources can be quite sloppy and return strings +(titles and even authors) in all capitals, for example. We have made the design +choice \emphasis {not} to follow the \BIBTEX\ practice/feature of explicitly +formatting all string values, as we did not want to require the protection +through enclosing curly brackets that would have been a necessary consequence. +Thus, some cleaning of these database files might be needed. Furthermore, we +attempt to use all the power of \CONTEXT\ and \LUA, thus making unnecessary much +(most?) of the \TEX-like encoding of the data. We encourage users to clean|-|up +their \Tindex {.bib} database files as much as possible so that they contain only +the necessary data, with a minimum of explicit formatting directives. +\stopaside + +String values, as described above, can be enclosed indifferently between matching +curly brackets: \BTXcode {{}} or pairs of quotation marks: \BTXcode {""}. +Multiple string values can be \index {string concatenation}concatenated using the +operator \BTXcode {\#}, as will be illustrated in \in {table} +[tab:mkiv-publications.bib]. + +Everything outside of a valid entry is ignored and treated as a \Index {comment}. +Syntactic errors (such as a missing comma or some unbalanced quotes or +parenthesis) are also skipped over, i.e. ignored. This is to attempt to continue +on to valid data but may lead to unexpected results. It is therefore the user's +responsibility to insure the correctness of the data files. Whereas some checks +and warnings are issued, the system is purposefully not too verbose. + +Data is handled on a \quote {first come, first served} basis: duplicate \index +{duplicate+fields}\emphasis {fields} in an entry are ignored \startfootnote Note +that some \BIBTEX\ practice allows for the concatenation of duplicate name \index +{duplicate+fields}fields (i.e. \BTXcode {author} and \BTXcode {editor}) through +\BTXcode {and}, but (silently) ignores duplicate other fields. We choose to have +a consistant behavior and disallow duplicate field occurrences. \stopfootnote +though duplicate \index {duplicate+entries}\emphasis {entries} (having the same +\index {duplicate+tags}tag) are retained, but the subsequent identical \Index +{tag}s will be modified by adding a suffix $-n$ for the $n$\high {th} duplicate. +The presence of duplicate \index {duplicate+fields}fields or \index +{duplicate+tags}tags will be flagged as such with warnings in the log file. +Duplicate \index {duplicate+entries}entries using different \Index {tag}s will +not be treated as duplicates. + +A special provision has been made to declare author \Index {synonyms}, that is +names that might occur with a variation of spellings or aliases. This shall be +discussed later. + +We have attempted to remain compatible with the \BIBTEX\ format, and any new +bibliography extensions that we introduce here were designed in a way to remain +compatible with \BIBTEX, being simply ignored rather than potentially generating +a \BIBTEX\ error. + +The \BIBTEX\ files are loaded in memory as \LUA\ table but can be converted to +\XML\ so that we can access them in a more flexible way, but that is another +subject for specialists. + +\stopsection + +\startsection [reference=sec:Commands,title=Commands in entries] + +One unfortunate aspect commonly found in \BIBTEX\ files is that they may contain +\TEX\ commands. Even worse is that there is no standard on what these commands +can be and what they mean, at least not formally, as \BIBTEX\ is a program +intended to be used with many variants of \TEX\ style: plain, \LATEX, and others. +This means that we need to define our use of these typesetting commands. (In +particular, one might need to redefine those that are too \LATEX|-|centric.) +However, in most cases, they are just abbreviations or font switches and these +are often well known. Therefore, \CONTEXT\ will try to resolve them before +reporting an issue. The log file will announce the commands that have been seen +in the loaded databases. For instance, loading \Tindex {tugboat.bib} (distributed +with \TEXLIVE) gives a long list of commands of which we show a small set of the +five most frequently encountered ones here: + +\startbuffer +\definebtxdataset[tugboat] +\usebtxdataset[tugboat][tugboat.bib] +\stopbuffer + +\getbuffer + +\starttyping +publications > tugboat tt 134 known +publications > tugboat Dash 136 unknown +publications > tugboat acro 137 known +publications > tugboat LaTeX 209 known +publications > tugboat TeX 856 known +\stoptyping + +Some are flagged as known and others as unknown. You can define unknown commands, +or overload existing definitions in the standard way (\emphasis {e.g.} \TEXcode +{\def\Dash{—}}), the \CONTEXT\ way (\TEXcode {\define\Dash{—}}) or, +alternatively, in the following way: + +\cindex {definebtxcommand} + +\startTEX +\definebtxcommand\TUB {TUGboat} +\definebtxcommand\MP {METAPOST} +\definebtxcommand\sltt{\tt} +\definebtxcommand\<#1>{\type{#1}} +\stopTEX + +\definebtxcommand\MP {METAPOST} % to be used silently below + +Custom commands created using \Cindex {definebtxcommand} have the advantage of +using a separate name space thus allowing \Index {isolation} from other \CONTEXT\ +commands. (The \Index {isolation} of \Cindex {btxcommand} allows the \Tindex +{.bib} files to safely contain \TEX\ and \LATEX\ idiosyncrasies that might +conflict with proper \CONTEXT\ syntax.) Unknown commands do not stall processing, +but their names are then typeset in a mono|-|spaced font so they probably stand +out for proofreading. You can access the commands using \index +{btxcommand}\TEXcode {\btxcommand{...}} (or \Cindex {btxcmd}), as in: + +\startbuffer +commands like \btxcommand{MySpecialCommand} are handled in an indirect way +\stopbuffer + +\cindex {btxcommand} + +\typeTEXbuffer + +As this is an undefined command we get: \quotation {\inlinebuffer}. + +Often, these embedded \TEX\ commands are present in \Tindex {.bib} files in order +to trick \BIBTEX\ into certain behavior. Since this will generally not be +necessary here, we strongly encourage users to clean|-|up such unnecessary +extras. Indeed, the idea is to keep the data clean, using styles and parameter +settings instead to handle rendering issues. Indeed, we don't see it as challenge +nor as a duty to support all kinds of messy definitions. Of course, we try to be +somewhat tolerant, but you will be sure to get better results if you use nicely +setup, consistent databases. + +Finally, the \BIBTEX\ entry \tindex {@string}\BTXcode {@String{}} is preprocessed +as expected. + +\tindex {@string} + +\startTEX +@String{j-TUGboat = "TUGboat"} +\stopTEX + +\startaside +Notice that \Tindex {tugboat.bib} also contains: \tindex {@preamble} +\startBTX +@Preamble{"\input tugboat.def"} +@Preamble{"\input path.sty"} +\stopBTX +These are silently ignored as many such commands are most likely not to be +compatible with \CONTEXT. Indeed, the examples shown here are not! +\stopaside + +\stopsection + +\startsection[title=\MKII\ definitions] + +In the old \MKII\ setup we have two kinds of entries: the ones that come from the +\BIBTEX\ run and additional user|-|supplied ones. We no longer rely on \BIBTEX\ +output but we do still support the user supplied definitions. These were in fact +prepared in a way that suits the processing of the \BIBTEX\ generated entries; +The next variant reflects the \CONTEXT\ recoding of the old \BIBTEX\ output. For +this reason, some users refer to this as \Tindex {.bbl} format. + +\cindex {startpublication} +\cindex {stoppublication} + +\startTEX +\startpublication[k=Hagen:Second,t=article,a={Hans Hagen},y=2013,s=HH01] + \artauthor[] {Hans}[H.]{}{Hagen} + \arttitle {Who knows more?} + \journal {MyJournal} + \pubyear {2013} + \month {8} + \volume {1} + \issue {3} + \issn {1234-5678} + \pages {123--126} +\stoppublication +\stopTEX + +The split \TEXcode {\artauthor} fields will be collapsed into a single \TEXcode +{author} field as we handle the splitting later when it gets parsed in \LUA. The +\TEXcode {\artauthor} syntax is only kept around for backward compatibility with +the previous use of \BIBTEX. + +In the new setup we support these variants: + +\cindex {startpublication} +\cindex {stoppublication} + +\startTEX +\startpublication[k=Hagen:Third,t=article] + \author{Hans Hagen} + \title {Who knows who?} + ... +\stoppublication +\stopTEX + +as well as + +\cindex {startpublication} +\cindex {stoppublication} + +\startTEX +\startpublication[tag=Hagen:Third,category=article] + \author{Hans Hagen} + \title {Who knows who?} + ... +\stoppublication +\stopTEX + +and + +\cindex {startpublication} +\cindex {stoppublication} + +\startTEX +\startpublication + \tag {Hagen:Third} + \category{article} + \author {Hans Hagen} + \title {Who knows who?} + ... +\stoppublication +\stopTEX + +The use of this format will be illustrated later a means to export the database +which may be of great use in converting collections of \MKII\ bibliography files. + +\showsetup[startpublication] + +\stopsection + +\startsection[title=\LUA\ tables] + +Because internally the entries are \index [LUA table] {\LUA\ table}\LUA\ tables, +we also support the loading of \LUA\ based definitions: + +\startLUA +return { + ["Hagen:First"] = { + author = "Hans Hagen", + category = "article", + issn = "1234-5678", + issue = "3", + journal = "MyJournal", + month = "8", + pages = "123--126", + tag = "Hagen:First", + title = "Who knows nothing?", + volume = "1", + year = "2013", + }, +} +\stopLUA + +Notice that the \Index {tag} is redundantly specified; it is \quote {pushed} into +the table so that one can access it without having to know the \Index {tag} of the +original table. + +\stopsection + +\startsection[title=\XML] + +The following \index [XML] {\XML}\XML\ input is rather close in structure, and is +also accepted as input. + +\startXML +<?xml version="2.0" standalone="yes" ?> +<bibtex> + <entry tag="Hagen:First" category="article"> + <field name="author">Hans Hagen</field> + <field name="category">article</field> + <field name="issn">1234-5678</field> + <field name="issue">3</field> + <field name="journal">MyJournal</field> + <field name="month">8</field> + <field name="pages">123--126</field> + <field name="tag">Hagen:First</field> + <field name="title">Who knows nothing?</field> + <field name="volume">1</field> + <field name="year">2013</field> + </entry> +</bibtex> +\stopXML + +We shall focus on the use of \BIBTEX\ \Tindex {.bib} files as the input data +format of reference. Keep in mind, however, that the \index [LUA table] {\LUA\ +table}\LUA\ table format and the \index [XML] {\XML}\XML\ format might prove to +be more flexible for future expansion of functionality. + +\stopsection + +\startsection[title=Other formats] + +Various other bibliographic data file formats are in common use, such as: + +\starttabulate [|Tl|p|] +\NC savedrecs.txt \NC Institute of Scientific Information (ISI) tagged format + (e.g. Thomson Reuters™ Web of Science™), \NC \NR +\NC filename.enw \NC Thomson Reuters™ Endnote™ export format + (there is also an Endnote \type {.xml} export), \NC \NR +\NC filename.ris \NC Research Information Systems, Incorporated, now + Thomson Reuters™ Reference Manager™, and \NC \NR +\NC pubmed_result.txt \NC The National Library of Medicine® (NLM®) + MEDLINE®|/|PubMed® data format \NC \NR +\stoptabulate + +just to name a few (amongst many more). Filters can be easily written in \LUA\ to +read these and other bibliography data formats, although no such filters are +provided. This is because the user has a choice of a certain number of +bibliography database management programs that can easily convert from these to +the \BIBTEX\ format. (Notable, open source examples are \index {jabref} \goto +{jabref} [url(http://jabref.sourceforge.net)] and \index {zotero} \goto {zotero} +[url(http://www.zotero.org)].) Indeed, it is not the vocation of the present +\CONTEXT\ bibliography subsystem to fully manage the bibliography data sources, +only to be able to use such data in the production of documents. + +\startaside +\emphasis {A note on database management programs:} these are very valuable tools +for the manipulation of bibliography database information, which is why the +\BIBTEX\ format has so much importance for us here. However, one must be aware +that these programs are not standards and many of them may introduce invalid +extensions that might not even be handled correctly by \BIBTEX\ itself. +\stopaside + +\stopsection + +\stopchapter + +\stopcomponent |