Diffstat (limited to 'doc/context/sources/general/manuals/publications/publications-database.tex')
-rw-r--r-- doc/context/sources/general/manuals/publications/publications-database.tex | 553
1 files changed, 553 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/publications/publications-database.tex b/doc/context/sources/general/manuals/publications/publications-database.tex
new file mode 100644
index 000000000..656ace56d
--- /dev/null
+++ b/doc/context/sources/general/manuals/publications/publications-database.tex
@@ -0,0 +1,553 @@
+\environment publications-style
+
+\startcomponent publications-database
+
+\startchapter[title=The database]
+
+The bibliography subsystem uses a database (or a set of databases) to construct a
+list of citations to be used in a scholarly work. However, it will be shown later
+that the database system can be used (and abused) to many ends having little or
+nothing at all to do with citations and bibliographies. Nevertheless, at first we
+shall remain focused on the use of bibliography databases.
+
+The data to be used must have a source and a structure. In the next sections we
+describe the possible input.
+
+\startsection[title=\BibTeX]
+
+The \BIBTEX\ format is rather popular in the \TEX\ community and even with its
+shortcomings it will stay around for a while. Many publication websites can
+export to it and many tools are available to work with this database format. It is
+rather simple and looks a bit like \index [LUA table] {\LUA\ table}\LUA\ tables.
+Indeed, it is said that the \BIBTEX\ format was one of the inspirations for the
+constructor syntax in \LUA\ \cite [alternative=num,
+righttext={\btxcomma Chapter\nbsp 12.}] [default::Ierusalimschy2006].
+
+Unfortunately the content can be (and usually is) polluted with
+non|-|standardized \TEX\ commands, which complicates pre- or post|-|processing
+outside \TEX. In that sense a \BIBTEX\ database is often not coded neutrally.
+Some limitations, like the use of commands to encode accented characters, are
+rooted in the \ASCII\ world and can be bypassed by using \index [UTF] {\UTF}\UTF\
+instead (as handled somewhat in \LATEX\ through extensions such as \Tindex
+{bibtex8}).
+
+The normal way to deal with a bibliography is to refer to entries using a unique
+\Index {tag} or key. When a text containing a list of entries is typeset, this
+reference can be used for linking purposes. The list can be processed and sorted
+using the \Tindex {bibtex} program that converts the database into something more
+\TEX\ friendly (a \Tindex {.bbl} file).
+
+In \CONTEXT\ we no longer use the (external) \goto {\Tindex {bibtex} program}
+[url(https://www.ctan.org/pkg/bibtex)] at all: we simply parse the database files
+in \LUA\ and deal with the necessary manipulations directly in \CONTEXT. One or
+more such databases can be used and combined with additional entries defined
+within the document. We can have several such datasets active at the same time.
+
+\startaside
+\emphasis {On the name \Tindex {btx}:} many of the \CONTEXT\ commands that will be
+used in the following contain the label \TEXcode {btx} in their name. This
+identifier was retained despite the fact that \CONTEXT\ \MKIV\ is now completely
+independent of \BIBTEX; it reflects the role still played by \BIBTEX\ data as a
+preferred source format and serves as a handy, unique identifier, both internally
+in the programming as well as for the user. This three|-|letter label is
+systematically used in commands that otherwise attempt to avoid cryptic|-|styled
+names.
+\stopaside
+
+A \BIBTEX\ file entry looks like this:
+
+\startBTX
+@Article {sometag,
+ author = "An Author and Another One",
+ title = "A hopefully meaningful title",
+ journal = maps,
+ volume = "25",
+ number = "2",
+ pages = "5--9",
+ month = mar,
+ year = "2013",
+ ISSN = "1234-5678",
+}
+\stopBTX
+
+Entries are of the form: \index {category}\BTXcode {@category{...}}
+
+Anything outside of a valid \BTXcode {@category{...}} construction is ignored and
+is taken to be a comment. Comments are not allowed within an entry, but one can,
+for example, prefix a field name so that the field is ignored.
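+
+As a minimal sketch of this trick, the following (made|-|up) entry disables its
+\BTXcode {note} field by renaming it to \BTXcode {xnote}. The prefix \BTXcode {x}
+is an arbitrary choice for illustration; the only assumption is that no rendering
+style in use refers to a field of that name:
+
+\startBTX
+@Article {sometag,
+ author = "An Author",
+ title = "A hopefully meaningful title",
+ xnote = "this field stays in the data but is not used",
+ year = "2013",
+}
+\stopBTX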
+
+There is a special entry type named \index {@comment}\BTXcode {@comment{...}}.
+Such an entry is mainly useful for commenting out a large part of the
+bibliography at once, since anything outside an entry is already a comment, and
+a single entry can be commented out by just removing its initial~\BTXcode {@}.
+This is not very elegant, however. As one can input multiple bibliography data
+files, as will be seen below, it is much better practice to split data files for
+optional loading.
+
+Many \BIBTEX\ data management tools such as \Tindex {jabref} (see below) will
+ignore and then discard all such carefully crafted comments and data entries
+turned into comments. So beware!
+
+The field names are all cast to lowercase, so capitalization is irrelevant.
+Spacing is not important and should be used advantageously for readability. The
+leading \Index {tag} (\BTXcode {sometag} in the example above) cannot contain
+spaces and \emphasis {must} be followed by a comma.
+
+The entry \Index {tag} (\BTXcode {@category{sometag,...}}) is not to be confused
+with the optional field \BTXcode {key=sortkey,} that may also be present.
+
+Normally a value is given between quotes (or curly brackets) but single words are
+also valid (as there is no real benefit in not using quotes or curly brackets, we
+advise always using them, contrary to our example above). The order of the
+fields in an entry is inconsequential and there can be many more fields than
+those shown above. Instead of string values one can also use predefined
+shortcuts. The title for example might quite often contain \TEX\ macros, and some
+fields, like \BTXcode {pages}, have funny characters such as the endash (typically
+entered as \BTXcode {--}) so we have a mixture of data and typesetting
+directives. Furthermore, if you are covering non||English references, you often
+need characters that are not in the \ASCII\ subset. Note that \CONTEXT\ is quite
+happy with \UTF, but if your database file uses old|-|fashioned \TEX\ accent
+combinations then these will be internally converted automatically to \UTF.
+
+Commands (macros) found in a database file are converted to an indirect call,
+which is quite robust. The use of commands in the database file will be described
+in \in {section} [sec:Commands].
+
+The \Tindex {author} (and \Tindex {editor}) fields are parsed separating multiple
+authors identified by the conjunction \quote {and}. Each name is assumed to be in
+the form:
+
+\definetyping
+ [NameSyntax]
+ [margin=1em]
+
+\startNameSyntax
+Firstname(s) Lastname
+\stopNameSyntax
+
+\seeindex {vons} {particule}
+
+where \type {Lastname} is a single word but may include an optional (nobility)
+\Index {particule} (lower|-|case word(s) such as \quotation {von}, \quotation
+{de}, \quotation {de la}, etc.) \emphasis {unless} specifically in the two- or
+three|-|token form:
+
+\index {suffix}
+
+\startNameSyntax
+Lastname(s), Firstname(s)
+Lastname(s), Suffix(es), Firstname(s)
+\stopNameSyntax
+
+separated explicitly using comma(s) thus allowing multi|-|word \type {Lastnames}.
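+
+To make these forms concrete, here is a sketch of an \BTXcode {author} field that
+mixes them. The names are chosen purely for illustration, and the reading given
+afterwards restates the rules above rather than reporting actual parser output:
+
+\startBTX
+author = "Ludwig van Beethoven and
+          Mendelssohn Bartholdy, Felix and
+          King, Jr., Martin Luther",
+\stopBTX
+
+The first name uses the plain form with the lower|-|case particule \quote {van},
+the second uses the two|-|token form to allow a multi|-|word last name, and the
+third uses the three|-|token form to carry the suffix \quote {Jr.}.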
+
+\startaside
+An \BTXcode {author} field is sometimes abused in traditional \BIBTEX\ usage to
+hold not a name but rather an entity. Other fields, such as \BTXcode
+{organization} or \BTXcode {collaboration}, for example, should be used in such
+cases.
+\stopaside
+
+\BIBTEX\ also (obscurely) supports the syntax:
+
+\seeindex {juniors}{suffix}
+\index {suffix}
+
+\startNameSyntax
+Firstname(s) \{Lastname(s), Suffix(es)\}
+\stopNameSyntax
+
+we may (or may not) support this in the future, so don't use this!
+
+We extend \BIBTEX\ by optionally parsing each name in terms of four or five
+tokens:
+
+\index {particule} \index {suffix} \index {initial}
+
+\startNameSyntax
+Particule(s), Lastname(s), Suffix(es), Firstname(s)
+Particule(s), Lastname(s), Suffix(es), Firstname(s), Initial(s)
+\stopNameSyntax
+
+in order to allow a free form for the particules, irrespective of capitalization,
+thus avoiding the need to resort to any sort of \TEX\ trickery \cite [num]
+[default::Patashnik1988,Markey2009]. In fact, an optional sixth token is parsed
+whose meaning is presently reserved for future directives describing how the name
+is to be interpreted:
+
+\index {particule} \index {suffix} \index {initial}
+
+\startNameSyntax
+Particule(s), Lastname(s), Suffix(es), Firstname(s), Initial(s), directives
+\stopNameSyntax
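+
+As a sketch of the extended forms, the following fields use an invented name
+whose particule is capitalized, so that the usual lower|-|case rule would not
+recognize it. The first field uses four tokens and the second adds explicit
+initials as a fifth token:
+
+\startBTX
+author = "De La, Cruz, Jr., Juan Carlos",
+editor = "De La, Cruz, Jr., Juan Carlos, J. C.",
+\stopBTX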
+
+\BIBTEX\ additionally accepts the special token \Tindex {others} to be used
+(sparingly) to indicate an incomplete author list. Note that most style
+specifications will handle the truncation of long author lists in a systematic
+fashion. The \index [others] {\tt and others}\BTXcode {and others} construction
+finds its use when the complete author list is not well known or ill|-|defined.
+
+Sometimes, or even often, the database might contain variants of an author's
+name that we would like to identify as a single, unique author. Indeed, certain
+bibliographic styles (as will be seen later) as well as an index of authors, for
+example, will depend on this identification. A command \Cindex {btxremapauthor}
+allows establishing this identity:
+
+\startbuffer
+\btxremapauthor [Donald Knuth] [Donald E. Knuth]
+\btxremapauthor [Don Knuth] [Donald E. Knuth]
+\stopbuffer
+\getbuffer
+
+\cindex {btxremapauthor}
+\typebuffer [option=TEX]
+
+Fields other than \Tindex {author} and \Tindex {editor}, for example \Tindex
+{artist} or \Tindex {director} if one desires, can be declared to be of type
+\quote {author} and thus interpreted as names, but this is a subject for
+specialists.
+
+The \BTXcode {keywords} field can also be split into tokens separated by
+semicolons (keyword; keyword; \unknown). This can be useful, as will be seen
+later, in the creation of keyword indexes, for example.
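+
+For example, an entry might carry the following field (the keywords themselves
+are invented for illustration):
+
+\startBTX
+keywords = "typesetting; bibliography; databases",
+\stopBTX
+
+which would be split into the three keywords \quote {typesetting}, \quote
+{bibliography} and \quote {databases}.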
+
+Other string values such as \BTXcode {title} are kept literally (except for an
+internal automatic conversion to \UTF\ of certain \TEX\ strings such as accent
+combinations, endash, quotations, etc.). Note that the bibliography rendering
+style (see below) might specify a capitalization of the title (using the
+\CONTEXT\ commands \TEXcode {\Word} or \TEXcode {\Words}, for example).
+Capitalized names and acronyms are respected, removing the need for the \BIBTEX\
+practice of \quote {protecting} such words or letters with surrounding curly
+brackets (which here are simply stripped off). (Furthermore, since \CONTEXT\ uses
+\UTF, it does not suffer from all of the complicated \Index {sorting} issues that
+plague \BIBTEX|/|\LATEX.) As some styles might not specify the capitalization of
+words in the title whereas other styles might, it is recommended that strings be
+written in lower case except where upper case is explicitly required so as to be
+compatible with all such capitalization styles.
+
+\startaside
+Some bibliographic database sources can be quite sloppy and return strings
+(titles and even authors) in all capitals, for example. We have made the design
+choice \emphasis {not} to follow the \BIBTEX\ practice/feature of explicitly
+formatting all string values, as we did not want to require the protection
+through enclosing curly brackets that would have been a necessary consequence.
+Thus, some cleaning of these database files might be needed. Furthermore, we
+attempt to use all the power of \CONTEXT\ and \LUA, thus making unnecessary much
+(most?) of the \TEX-like encoding of the data. We encourage users to clean up
+their \Tindex {.bib} database files as much as possible so that they contain only
+the necessary data, with a minimum of explicit formatting directives.
+\stopaside
+
+String values, as described above, can be enclosed indifferently between matching
+curly brackets: \BTXcode {{}} or pairs of quotation marks: \BTXcode {""}.
+Multiple string values can be \index {string concatenation}concatenated using the
+operator \BTXcode {\#}, as will be illustrated in \in {table}
+[tab:mkiv-publications.bib].
+
+Everything outside of a valid entry is ignored and treated as a \Index {comment}.
+Syntactic errors (such as a missing comma or some unbalanced quotes or
+parentheses) are also skipped over, i.e. ignored. This is an attempt to continue
+on to valid data but may lead to unexpected results. It is therefore the user's
+responsibility to ensure the correctness of the data files. Whereas some checks
+and warnings are issued, the system is purposefully not too verbose.
+
+Data is handled on a \quote {first come, first served} basis: duplicate \index
+{duplicate+fields}\emphasis {fields} in an entry are ignored \startfootnote Note
+that some \BIBTEX\ practice allows for the concatenation of duplicate name \index
+{duplicate+fields}fields (i.e. \BTXcode {author} and \BTXcode {editor}) through
+\BTXcode {and}, but (silently) ignores duplicate other fields. We choose to have
+a consistent behavior and disallow duplicate field occurrences. \stopfootnote
+though duplicate \index {duplicate+entries}\emphasis {entries} (having the same
+\index {duplicate+tags}tag) are retained, but the subsequent identical \Index
+{tag}s will be modified by adding a suffix $-n$ for the $n$\high {th} duplicate.
+The presence of duplicate \index {duplicate+fields}fields or \index
+{duplicate+tags}tags will be flagged as such with warnings in the log file.
+Duplicate \index {duplicate+entries}entries using different \Index {tag}s will
+not be treated as duplicates.
+
+A special provision has been made to declare author \Index {synonyms}, that is
+names that might occur with a variation of spellings or aliases. This shall be
+discussed later.
+
+We have attempted to remain compatible with the \BIBTEX\ format, and any new
+bibliography extensions that we introduce here were designed in a way to remain
+compatible with \BIBTEX, being simply ignored rather than potentially generating
+a \BIBTEX\ error.
+
+The \BIBTEX\ files are loaded in memory as \LUA\ tables but can be converted to
+\XML\ so that we can access them in a more flexible way, but that is another
+subject for specialists.
+
+\stopsection
+
+\startsection [reference=sec:Commands,title=Commands in entries]
+
+One unfortunate aspect commonly found in \BIBTEX\ files is that they may contain
+\TEX\ commands. Even worse is that there is no standard on what these commands
+can be and what they mean, at least not formally, as \BIBTEX\ is a program
+intended to be used with many variants of \TEX\ style: plain, \LATEX, and others.
+This means that we need to define our use of these typesetting commands. (In
+particular, one might need to redefine those that are too \LATEX|-|centric.)
+However, in most cases, they are just abbreviations or font switches and these
+are often well known. Therefore, \CONTEXT\ will try to resolve them before
+reporting an issue. The log file will announce the commands that have been seen
+in the loaded databases. For instance, loading \Tindex {tugboat.bib} (distributed
+with \TEXLIVE) gives a long list of commands of which we show a small set of the
+five most frequently encountered ones here:
+
+\startbuffer
+\definebtxdataset[tugboat]
+\usebtxdataset[tugboat][tugboat.bib]
+\stopbuffer
+
+\getbuffer
+
+\starttyping
+publications > tugboat tt 134 known
+publications > tugboat Dash 136 unknown
+publications > tugboat acro 137 known
+publications > tugboat LaTeX 209 known
+publications > tugboat TeX 856 known
+\stoptyping
+
+Some are flagged as known and others as unknown. You can define unknown commands,
+or overload existing definitions in the standard way (\emphasis {e.g.} \TEXcode
+{\def\Dash{—}}), the \CONTEXT\ way (\TEXcode {\define\Dash{—}}) or,
+alternatively, in the following way:
+
+\cindex {definebtxcommand}
+
+\startTEX
+\definebtxcommand\TUB {TUGboat}
+\definebtxcommand\MP {METAPOST}
+\definebtxcommand\sltt{\tt}
+\definebtxcommand\<#1>{\type{#1}}
+\stopTEX
+
+\definebtxcommand\MP {METAPOST} % to be used silently below
+
+Custom commands created using \Cindex {definebtxcommand} have the advantage of
+using a separate name space thus allowing \Index {isolation} from other \CONTEXT\
+commands. (The \Index {isolation} of \Cindex {btxcommand} allows the \Tindex
+{.bib} files to safely contain \TEX\ and \LATEX\ idiosyncrasies that might
+conflict with proper \CONTEXT\ syntax.) Unknown commands do not stall processing,
+but their names are then typeset in a mono|-|spaced font so they probably stand
+out for proofreading. You can access the commands using \index
+{btxcommand}\TEXcode {\btxcommand{...}} (or \Cindex {btxcmd}), as in:
+
+\startbuffer
+commands like \btxcommand{MySpecialCommand} are handled in an indirect way
+\stopbuffer
+
+\cindex {btxcommand}
+
+\typeTEXbuffer
+
+As this is an undefined command we get: \quotation {\inlinebuffer}.
+
+Often, these embedded \TEX\ commands are present in \Tindex {.bib} files in order
+to trick \BIBTEX\ into certain behavior. Since this will generally not be
+necessary here, we strongly encourage users to clean up such unnecessary
+extras. The idea is to keep the data clean, using styles and parameter
+settings instead to handle rendering issues. Indeed, we don't see it as a challenge
+nor as a duty to support all kinds of messy definitions. Of course, we try to be
+somewhat tolerant, but you will be sure to get better results if you use nicely
+set up, consistent databases.
+
+Finally, the \BIBTEX\ entry \tindex {@string}\BTXcode {@String{}} is preprocessed
+as expected.
+
+\tindex {@string}
+
+\startTEX
+@String{j-TUGboat = "TUGboat"}
+\stopTEX
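+
+As a sketch of how such an abbreviation is then used, the following (made|-|up)
+entry refers to it unquoted in the \BTXcode {journal} field and also combines it
+with a quoted string using the concatenation operator mentioned earlier. The tag
+and the field values are invented for illustration:
+
+\startBTX
+@String{j-TUGboat = "TUGboat"}
+
+@Article {sometag,
+ author = "An Author",
+ title = "A hopefully meaningful title",
+ journal = j-TUGboat,
+ note = "Published in " # j-TUGboat,
+ year = "2013",
+}
+\stopBTX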
+
+\startaside
+Notice that \Tindex {tugboat.bib} also contains: \tindex {@preamble}
+\startBTX
+@Preamble{"\input tugboat.def"}
+@Preamble{"\input path.sty"}
+\stopBTX
+These are silently ignored as many such commands are most likely not to be
+compatible with \CONTEXT. Indeed, the examples shown here are not!
+\stopaside
+
+\stopsection
+
+\startsection[title=\MKII\ definitions]
+
+In the old \MKII\ setup we have two kinds of entries: the ones that come from the
+\BIBTEX\ run and additional user|-|supplied ones. We no longer rely on \BIBTEX\
+output but we do still support the user|-|supplied definitions. These were in fact
+prepared in a way that suits the processing of the \BIBTEX\ generated entries.
+The next variant reflects the \CONTEXT\ recoding of the old \BIBTEX\ output. For
+this reason, some users refer to this as \Tindex {.bbl} format.
+
+\cindex {startpublication}
+\cindex {stoppublication}
+
+\startTEX
+\startpublication[k=Hagen:Second,t=article,a={Hans Hagen},y=2013,s=HH01]
+ \artauthor[] {Hans}[H.]{}{Hagen}
+ \arttitle {Who knows more?}
+ \journal {MyJournal}
+ \pubyear {2013}
+ \month {8}
+ \volume {1}
+ \issue {3}
+ \issn {1234-5678}
+ \pages {123--126}
+\stoppublication
+\stopTEX
+
+The split \TEXcode {\artauthor} fields will be collapsed into a single \TEXcode
+{author} field as we handle the splitting later when it gets parsed in \LUA. The
+\TEXcode {\artauthor} syntax is only kept around for backward compatibility with
+the previous use of \BIBTEX.
+
+In the new setup we support these variants:
+
+\cindex {startpublication}
+\cindex {stoppublication}
+
+\startTEX
+\startpublication[k=Hagen:Third,t=article]
+ \author{Hans Hagen}
+ \title {Who knows who?}
+ ...
+\stoppublication
+\stopTEX
+
+as well as
+
+\cindex {startpublication}
+\cindex {stoppublication}
+
+\startTEX
+\startpublication[tag=Hagen:Third,category=article]
+ \author{Hans Hagen}
+ \title {Who knows who?}
+ ...
+\stoppublication
+\stopTEX
+
+and
+
+\cindex {startpublication}
+\cindex {stoppublication}
+
+\startTEX
+\startpublication
+ \tag {Hagen:Third}
+ \category{article}
+ \author {Hans Hagen}
+ \title {Who knows who?}
+ ...
+\stoppublication
+\stopTEX
+
+The use of this format will be illustrated later as a means to export the
+database, which may be of great use in converting collections of \MKII\
+bibliography files.
+
+\showsetup[startpublication]
+
+\stopsection
+
+\startsection[title=\LUA\ tables]
+
+Because internally the entries are \index [LUA table] {\LUA\ table}\LUA\ tables,
+we also support the loading of \LUA\ based definitions:
+
+\startLUA
+return {
+ ["Hagen:First"] = {
+ author = "Hans Hagen",
+ category = "article",
+ issn = "1234-5678",
+ issue = "3",
+ journal = "MyJournal",
+ month = "8",
+ pages = "123--126",
+ tag = "Hagen:First",
+ title = "Who knows nothing?",
+ volume = "1",
+ year = "2013",
+ },
+}
+\stopLUA
+
+Notice that the \Index {tag} is redundantly specified; it is \quote {pushed} into
+the table so that one can access it without having to know the \Index {tag} of the
+original table.
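+
+As a sketch, and assuming the table above is saved in a file named \type
+{mypublications.lua} (a hypothetical name), it could be loaded with the same
+commands that were used for \type {tugboat.bib} earlier. That the file type is
+selected by the file suffix is an assumption here, as is the dataset name
+\type {example}:
+
+\startTEX
+\definebtxdataset[example]
+\usebtxdataset[example][mypublications.lua]
+\stopTEX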
+
+\stopsection
+
+\startsection[title=\XML]
+
+The following \index [XML] {\XML}\XML\ input is rather close in structure, and is
+also accepted as input.
+
+\startXML
+<?xml version="1.0" standalone="yes" ?>
+<bibtex>
+ <entry tag="Hagen:First" category="article">
+ <field name="author">Hans Hagen</field>
+ <field name="category">article</field>
+ <field name="issn">1234-5678</field>
+ <field name="issue">3</field>
+ <field name="journal">MyJournal</field>
+ <field name="month">8</field>
+ <field name="pages">123--126</field>
+ <field name="tag">Hagen:First</field>
+ <field name="title">Who knows nothing?</field>
+ <field name="volume">1</field>
+ <field name="year">2013</field>
+ </entry>
+</bibtex>
+\stopXML
+
+We shall focus on the use of \BIBTEX\ \Tindex {.bib} files as the input data
+format of reference. Keep in mind, however, that the \index [LUA table] {\LUA\
+table}\LUA\ table format and the \index [XML] {\XML}\XML\ format might prove to
+be more flexible for future expansion of functionality.
+
+\stopsection
+
+\startsection[title=Other formats]
+
+Various other bibliographic data file formats are in common use, such as:
+
+\starttabulate [|Tl|p|]
+\NC savedrecs.txt \NC Institute of Scientific Information (ISI) tagged format
+ (e.g. Thomson Reuters™ Web of Science™), \NC \NR
+\NC filename.enw \NC Thomson Reuters™ Endnote™ export format
+ (there is also an Endnote \type {.xml} export), \NC \NR
+\NC filename.ris \NC Research Information Systems, Incorporated, now
+ Thomson Reuters™ Reference Manager™, and \NC \NR
+\NC pubmed_result.txt \NC The National Library of Medicine® (NLM®)
+ MEDLINE®|/|PubMed® data format \NC \NR
+\stoptabulate
+
+just to name a few (amongst many more). Filters can be easily written in \LUA\ to
+read these and other bibliography data formats, although no such filters are
+provided. This is because the user has a choice of a certain number of
+bibliography database management programs that can easily convert from these to
+the \BIBTEX\ format. (Notable, open source examples are \index {jabref} \goto
+{jabref} [url(http://jabref.sourceforge.net)] and \index {zotero} \goto {zotero}
+[url(http://www.zotero.org)].) Indeed, it is not the vocation of the present
+\CONTEXT\ bibliography subsystem to fully manage the bibliography data sources,
+only to be able to use such data in the production of documents.
+
+\startaside
+\emphasis {A note on database management programs:} these are very valuable tools
+for the manipulation of bibliography database information, which is why the
+\BIBTEX\ format has so much importance for us here. However, one must be aware
+that these programs are not standards and many of them may introduce invalid
+extensions that might not even be handled correctly by \BIBTEX\ itself.
+\stopaside
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent