<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2822 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2822.xml">
<!ENTITY RFC3629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml">
<!ENTITY RFC5013 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5013.xml">

]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-kunze-thump-03" ipr="trust200902">

 <!-- ***** FRONT MATTER ***** -->

 <front>

   <title abbrev="THUMP">THUMP:  The HTTP URL Mapping Protocol</title>


   <author fullname="John Kunze" initials="J.A." surname="Kunze">
     <organization>California Digital Library</organization>
     <address>
       <postal>
         <street>415 20th St, #406</street>
         <city>Oakland</city>
         <region>CA</region>
         <code>94612</code>
         <country>USA</country>
       </postal>
       <email>jak@ucop.edu</email>
     </address>
   </author>

  <author fullname="Nassib Nassar" initials="N." surname="Nassar"> 
    <organization>Index Data ApS</organization>
    <address>
     <postal>
      <street>Njalsgade 76, 13</street>
      <city>Copenhagen</city>
      <code>2300</code>
      <country>Denmark</country>
     </postal>
     <email>nassib@indexdata.com</email>
    </address>
  </author>

  <date year="2017" />


   <!-- Meta-data Declarations -->

   <area>General</area>

   <workgroup></workgroup>


   <keyword>URL query language, inflections</keyword>


  <abstract>
<t>
The HTTP URL Mapping Protocol (THUMP) is a set of URL-based conventions
for retrieving information and conducting searches.
THUMP can be used for focused retrievals or for broad database queries.
A THUMP request is a URL containing a query string that starts with a "?",
and can contain one or more THUMP commands.  Returned records contain
Dublin Core Kernel metadata formatted as Electronic Resource Citations
(ERCs), which are similar to blocks of email headers.
</t>
  </abstract>

 </front>

 <middle>

<!--
.\" xxx get(), if used, should complement put()
.\" XXXXXX  MAKE next example THUMP version 0.6 as per new ARK spec (5/22/08)
.\" DELETE this (deleted already from ARK spec)
.\"     S: set-start: California Digital Library | THUMP 0.5 | 20060606161407
.\"     S:         | http://ark.cdlib.org/ark:/13030/ft167nb0vq?
.\" 10  S:         | http://dublincore.org/groups/kernel/erc
.\"     S: here: 1 | 1 | 1
.\"     S:
-->

<section title="Overview">

<t>
This document specifies the HTTP URL Mapping Protocol (THUMP), a set of
URL-based conventions for retrieving information and conducting searches.
THUMP can be used for focused retrievals; e.g., for a given known item,
asking that a specifically formatted subset of information about it be
returned.  It can also be used for broad database queries, such as
finding all records matching the word "monitor".
</t>

<t>
A THUMP request is a URL containing a query string that starts with a "?",
and can contain one or more THUMP commands.  A request is passed to a
server with HTTP GET (or POST if desired).  The shortest request is a URL
ending in "?", as in

<figure>
 <artwork>
    http://example.com/object321?
 </artwork>
</figure>

which asks the server to return a metadata record describing the
information item identified by the URL.  This is shorthand for the
common request for a brief description of a known item; the fully
spelled-out equivalent in this case is

<figure>
 <artwork>
    http://example.com/object321?show(brief)as(anvl/erc)
 </artwork>
</figure>

An example of a broad database search is,

<figure>
 <artwork>
    http://example.com/?in(books)find(war and peace)show(full)
 </artwork>
</figure>

Query strings and responses are UTF-8 encoded
<xref target="RFC3629"/>.
A THUMP response is an HTTP message body containing one or more records.
Records contain the Kernel metadata
<xref target="Kernel"/>
elements h1-h4 (who, what, when, and where), formatted as Electronic
Resource Citations (ERCs), which are similar to blocks of email headers;
the currently defined element names are summarized in a later section.
In an ERC each element consists of a label, colon, and value; long values
are continued on indented lines, and empty lines separate records.  A
future version of THUMP is expected to allow ERC records to be requested
in XML.
</t>
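<t>
Because the ERC layout is line-oriented, a stream of records can be
parsed with very little code.  The following Python sketch is
illustrative rather than normative: the function name parse_records is
hypothetical, continuation lines are assumed to be joined to the
previous value with a single space, and ANVL features not described in
this document are ignored.
</t>

<figure>
 <artwork><![CDATA[
# Sketch only: parse_records is a hypothetical helper, not part
# of THUMP.  Continuation lines are joined with a single space.
def parse_records(text):
    """Split ERC/ANVL-style text into a list of records.

    Each record is a list of (label, value) pairs in input order.
    A line that starts with whitespace continues the previous
    value, and a blank line ends the current record.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the record being built, if any.
            if current:
                records.append(current)
                current = []
        elif line[0] in " \t" and current:
            # Indented line: append to the previous element's value.
            label, value = current[-1]
            joined = value + " " + line.strip()
            current[-1] = (label, joined.strip())
        else:
            # New element: split on the first colon only, so values
            # such as URLs may themselves contain colons.
            label, _, value = line.partition(":")
            current.append((label.strip(), value.strip()))
    if current:
        records.append(current)
    return records
]]></artwork>
</figure>

<t>
Applied to the record in the sample session (without the "S: "
prefixes), this sketch would yield one record whose "what" pair has the
value "Tobacco War: Inside the California Battles".
</t>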

</section>

<section title="A Sample THUMP Session">

<t>
THUMP is very simple and follows the classical stateless HTTP communication
model.  This section contains a complete annotated example of a request and
response exchange.  To summarize, the requester sets up a TCP/HTTP session
with the server system, sends a THUMP request inside an HTTP request,
receives an answer inside an HTTP response, and closes the session.
</t>

<t>
In the following example THUMP session, each line has been annotated to
include a line number and whether it was the client or server that sent
it.  Without going into depth, the session has three pieces separated by
blank lines:  the client's piece (lines 1-3), the server's HTTP/THUMP
response headers (4-7), and the body of the server's response (8-13).
The first and last lines (1 and 13) correspond to the client's steps to
start the TCP session and the server's steps to end it, respectively.
The heart of the request is the known-item metadata request indicated by
the URL ending in a single "?" on line 2.

<figure>
 <artwork>
 1  C: [opens session]
    C: GET http://ark.cdlib.org/ark:/13030/ft167nb0vq? HTTP/1.1
    C: 
    S: HTTP/1.1 200 OK
 5  S: Content-Type: text/plain
    S: THUMP-Status: 0.6 200 OK
    S: 
    S: erc:
    S: who:   Stanton A. Glantz and Edith D. Balbach
10  S: what:  Tobacco War: Inside the California Battles
    S: when:  20000510
    S: where: http://ark.cdlib.org/ark:/13030/ft167nb0vq
    S: [closes session]
 </artwork>
</figure>

The first two server response lines (4-5) above are typical of HTTP.
The next line (6) is peculiar to THUMP, and indicates the THUMP version
and a normal return status.  The balance of the response consists of a
single metadata record (8-12) that comprises the service response.
</t>

<!--
    S: set-start: California Digital Library | THUMP 0.6 | 20060606161407
    S:         | http://ark.cdlib.org/ark:/13030/ft167nb0vq?
10  S:         | http://dublincore.org/groups/kernel/erc
    S: here: 1 | 1 | 1
    S: 
The record set header identifies (8-11) who created the set, what created
it, when it was created, where an automated process can re-access the set,
and where to look up the meaning of metadata elements; it ends in a line
(11) whose respective sub-elements indicate that here in this communication
the recipient can expect to find 1 record, starting at the record numbered
1, from a set consisting of a total of 1 record (i.e., here is the entire
set, consisting of exactly one record).
-->

<t>
The returned record (8-12) in this case is in the ERC format (other
formats are possible).  It contains four elements that answer high
priority questions regarding an expression of the object:  who played a
major role in expressing it, what the expression was called, when it was
created, and where the expression may be found.
</t>

</section>

<section title="Keys and Citations">

<t>
A THUMP request is a command sequence operating on a Key, which is a base
URL for a service point that supports THUMP.  It is expected, however,
that the Key may generalize to service points in client-server computation
contexts other than today's WWW.
</t>

<t>
The Key uses a "citation-centered" system of reference.  This means
that data elements are addressed relative to an abstract object surrogate,
or "citation".
<!--
.\"To reference data elements of a document (or other item)
.\"linked to from a citation element, the target of the link element is first
.\"obtained with the "get" command in order that subsequent references 
.\"would be relative to the linked item.
-->
</t>

<t>
While some systems have stored metadata-based surrogates (e.g., library
catalog records for books), many other systems do not.  This is not an
obstacle to using THUMP, because systems without stored surrogates
usually support the display or delivery of dynamically generated object
citations, each consisting of such things as an access URL, a size, a
date, a title, a snippet of relevant text (e.g., matching a query), plus
links to related materials.

<t>
Non-surrogate information objects in this model are, loosely speaking, the
priority objects for end users, and include documents, articles, books,
films, recordings, etc.  Surrogates, whether static or dynamically
generated, are important temporary stand-ins during discovery, filtering,
and selection processes.  They are easy to manipulate in large numbers
because they are much more homogeneous than the objects they represent.
Those objects are often too large, unwieldy, or rights-encumbered to be
dealt with directly during discovery.  Surrogates are also valuable in
preservation since they can provide useful information about the original
context, dependencies, and provenance of an object.
</t>

</section>

<section title="Key-Request Dualism">

<t>
Although THUMP does not specify anything about the structure of the Key,
it is possible for a given Key string to express, often in an ad hoc
manner, information similar to that expressed in the Request query string.
The more intuitive the Key structure, the greater the chance for it to
carry information that might appear to repeat or even contradict commands
in the Request.  For example, one server might require

<figure>
 <artwork>
    http://example.org/?in(books)find(war and peace)show(full)
 </artwork>
</figure>

while another server might require

<figure>
 <artwork>
    http://example.com/in=books/find=war+and+peace?show(full)
 </artwork>
</figure>

and a third server might require

<figure>
 <artwork>
    http://example.net/books/full/war_and_peace
 </artwork>
</figure>

There is a natural dualism that servers may exploit by permitting or
proposing (e.g., by returning) such semantically-laden Keys.  Any
conventions for re-expressing THUMP commands within the Key or for
resolving apparent contradictions, however, are up to individual servers
and are out of scope for this document.
</t>

<t>
This document recognizes the dualism but does not constrain it except to
say that for a given Key, a server that declares THUMP support MUST
respond to the "help" command by listing all the commands (methods) valid
for that Key.  Because support for it is required, the "help" command is
also a common way to ping a THUMP server to see if it is alive.  As an
edge case, a THUMP response might be returned even for a URL that has no
request at all (not even a "?"); this might make sense, for example, when
the URL serves as the base Key for an entire service.
</t>

<t>
There are cases when a server may wish to generate a temporary Key as a
stand-in for a long or complex request and return it along with a subset
of found records.  For example, the request,

<figure>
 <artwork>
    http://example.com/?in(books)find(war and peace)list(10|1)
 </artwork>
</figure>

might return the first 10 records along with a Key that could be used in
subsequent requests to return the next 10 records:

<figure>
 <artwork>
    http://example.com/req98765?list(10|11)
 </artwork>
</figure>

Note that this document makes no assumption about the dynamicity
of queries, whether expressed partially or entirely in the Key or in
the request.  In either form, returned records might come from cached
results or from results freshly computed upon each access.  THUMP
support does not constrain servers in this regard.
</t>

</section>

<section title="Request Summary">

<t>
There are several request forms described below, with output formats
listed in a later section.  Spaces have been inserted for readability
in the forms below; usually, inter-command spaces would not be present.
It is normal to formulate THUMP queries using only a subset of the
commands specified.  With a few important exceptions, this document
is silent on how servers supply defaults or whether they signal errors
for missing commands.  All default actions and server-side request
modifications SHOULD be reported back to the client.
</t>
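<t>
Because every command is a name optionally followed by a parenthesized
argument list, a query string can be split into commands with a small
scanner.  The Python sketch below shows one possible approach and is not
part of THUMP: parse_commands is a hypothetical name, the query string is
assumed to be URL-decoded already, optional inter-command spaces are
skipped, arguments are split on a top-level "|", and parentheses inside
double quotes are treated as ordinary characters, in line with the QUERY
syntax described later.
</t>

<figure>
 <artwork><![CDATA[
# Sketch only: parse_commands is a hypothetical helper.  The query
# string is assumed to be URL-decoded already.
def parse_commands(query):
    """Split a THUMP query string into (name, args) pairs.

    "in(books)find(war and peace)show(full)" yields
    [("in", ["books"]), ("find", ["war and peace"]),
     ("show", ["full"])].  A bare word such as "help" yields
    ("help", []); "list()" yields ("list", [""]).
    """
    commands, i, n = [], 0, len(query)
    while i < n:
        if query[i].isspace():
            i += 1                      # skip inter-command spaces
            continue
        start = i
        while i < n and query[i] != "(" and not query[i].isspace():
            i += 1
        name = query[start:i]
        if i >= n or query[i] != "(":
            commands.append((name, []))
            continue
        # Scan to the matching ")", honoring nesting and quotes.
        depth, quoted, args, buf = 0, False, [], ""
        i += 1
        while i < n:
            ch = query[i]
            if ch == '"':
                quoted = not quoted
            elif not quoted and ch == "(":
                depth += 1
            elif not quoted and ch == ")":
                if depth == 0:
                    break
                depth -= 1
            elif not quoted and depth == 0 and ch == "|":
                args.append(buf)        # top-level "|" separates
                buf, i = "", i + 1
                continue
            buf += ch
            i += 1
        else:
            raise ValueError("unbalanced parentheses after " + name)
        args.append(buf)
        commands.append((name, args))
        i += 1                          # step past the closing ")"
    return commands
]]></artwork>
</figure>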

<section title="Key ? help">

<t>
This form is required.  A server that declares THUMP support MUST
respond to the "help" command by listing all the commands (methods)
valid for that Key.  Because support for it is required, the "help"
command is also a common way to ping a THUMP server to see if it is alive.
</t>

</section>

<section title="Key ? was(DESCRIPTION) when(DATE) resync">

<t>
This "metadata" command form provides nothing more than a way to carry a
Key along with its description.
The form is a "no-op" (except when "resync" is present) in
the sense that the Key is treated as an adorned URL (as if no THUMP request
were present).  This form is designed as a passive data structure that
pairs a hyperlink with its metadata so that a formatted description might
be surfaced by a client-side trigger event such as a "mouse-over".  It is
passive in the sense that selecting ("clicking on") the URL should result
in ordinary access via the Key-as-pure-link as if no THUMP request were
present.  The form is effectively a metadata cache, and the DATE of last
extraction tells how fresh it is.
</t>

<t>
The "was" pseudo-command takes multiple arguments separated by "|", the
first argument identifying the kind of DESCRIPTION that follows, e.g.,

<figure>
 <artwork>
was(erc|Tolstoy, L|War and Peace|1863|http://example.org/etext/2600)
 </artwork>
</figure>

The "when" pseudo-command (optional) takes one argument: the date on
which the immediately preceding DESCRIPTION was extracted.  The date,
conforming to the [TEMPER] specification, looks like YYYYMMDDhhmmss.  The
"was" and "when" pseudo-commands can harmlessly accompany any THUMP request.
</t>
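<t>
A client that wants to carry a Key together with a cached description
might assemble the metadata form as follows.  This Python sketch is
illustrative only: metadata_form is a hypothetical name, no
percent-encoding is applied (a real client would normally encode spaces
and similar characters before use), description fields containing "|",
"(", or ")" are simply rejected because this document defines no escaping
for them, and only the full YYYYMMDDhhmmss date shape mentioned above is
accepted, not every form that [TEMPER] may allow.
</t>

<figure>
 <artwork><![CDATA[
from datetime import datetime


# Sketch only: metadata_form is a hypothetical helper.  No
# percent-encoding is applied, and only the full YYYYMMDDhhmmss
# date shape is accepted.
def metadata_form(key, kind, fields, when=None, resync=False):
    """Build "Key?was(KIND|FIELD|...)when(DATE)[resync]"."""
    for field in fields:
        if "|" in field or "(" in field or ")" in field:
            # This sketch has no escaping rule for these characters.
            raise ValueError("cannot embed: " + field)
    request = key + "?was(" + "|".join([kind, *fields]) + ")"
    if when is not None:
        # Raises ValueError unless when is YYYYMMDDhhmmss.
        datetime.strptime(when, "%Y%m%d%H%M%S")
        request += "when(" + when + ")"
    if resync:
        request += "resync"
    return request
]]></artwork>
</figure>

<t>
With the Tolstoy description shown above and a hypothetical extraction
date of 20080522161407, the sketch produces a Key followed by
"?was(erc|Tolstoy, L|War and Peace|1863|http://example.org/etext/2600)when(20080522161407)".
</t>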

<t>
The "resync" command, however, is a request to update the metadata.
It returns a "metadata" form similar to the one submitted, but with
refreshed metadata and no "resync" at the end.
</t>

</section>

<section title="Key ? in(DB) find(QUERY) sort([!]ELEMS) list(RANGE) show(ELEMS)
as(FORMAT)">

<t>
This form is used for generalized queries.  The server is permitted to
modify commands, such as by supplying missing commands (defaults),
but SHOULD report the resulting filled-out command xxx.
</t>

<t>
<spanx style="strong">in(DB)</spanx><vspace/>
The "in" command specifies one or more dataset names separated by "|".
If no "in" command is present, the server picks a suitable default
dataset or returns an error.  If no other commands are present, the
server may treat the dataset as a result set or return an error.
Dataset names originating in relational databases are assumed to name
a table in a default database, but may be structured into database,
schema, and table names using the reserved characters '/' and '.' as per
the following forms:

<figure>
 <artwork>
    database/schema.table
    database/table
    schema.table
    table
 </artwork>
</figure>
</t>
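<t>
Since "/" and "." are reserved in dataset names, the four forms above can
be told apart mechanically.  A minimal Python sketch, assuming that the
individual database, schema, and table names never contain those
characters themselves; the function names are hypothetical, and None
marks a part for which the server would supply its own default.
</t>

<figure>
 <artwork><![CDATA[
# Sketch only: split_dataset is a hypothetical helper, and parts
# are assumed not to contain "/" or "." themselves.
def split_dataset(name):
    """Return (database, schema, table) for an in() dataset name.

    Accepts database/schema.table, database/table, schema.table
    and table.  Absent parts are None, leaving the choice of a
    default to the server.
    """
    database = schema = None
    if "/" in name:
        database, name = name.split("/", 1)
    if "." in name:
        schema, name = name.split(".", 1)
    return database, schema, name


def split_in_argument(arg):
    """Split an in() argument such as "books|lib/films"."""
    return [split_dataset(part) for part in arg.split("|")]
]]></artwork>
</figure>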

<t>
<spanx style="strong">find(QUERY)</spanx><vspace/>
The "find" command specifies a QUERY that should produce a result set
of matching records or an error.  The result set is modeled as a
numbered sequence of records that is returned "by reference" with a
generated Key (see the "results" tag later) or as one or more returned
subsequences of records, known as returned sets.  If no "find" command
is present, the Key is expected to imply either a single record or a set of
records.  THUMP distinguishes between a result set and a returned set,
which is a subsequence of the result set included in a given response.
</t>

<t>
The QUERY consists of free text words separated by spaces.  Reserved
words begin with a ":" (colon), such as the :and, :or, and :not
boolean operators.  Parentheses can be used for grouping.  Prepending
"+" (or "-") to a word indicates that the requester wants the word to be
present in (or absent from) search results.  The double-quote character
can be used to join words in a phrase or to turn off the special meanings
of parentheses or ":+-" in front of words.
</t>
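<t>
The QUERY conventions above map naturally onto a small tokenizer.  The
Python sketch below illustrates one reading of them rather than a
normative grammar: the function tokenize_query and the token kinds
(WORD, PHRASE, OP, RESERVED, REQUIRE, EXCLUDE, LPAREN, RPAREN) are
hypothetical, only :and, :or, and :not are mapped to operators, and any
other colon-prefixed word is passed through as a RESERVED token.
</t>

<figure>
 <artwork><![CDATA[
import re

# Sketch only: token kinds and names here are hypothetical.
# Alternatives are tried in order at each position.
TOKEN_RE = re.compile(r'''
    (?P<space>\s+)
  | (?P<lparen>\()
  | (?P<rparen>\))
  | (?P<phrase>"[^"]*")
  | (?P<word>[^\s()"]+)
''', re.VERBOSE)

OPERATORS = {":and": "AND", ":or": "OR", ":not": "NOT"}


def tokenize_query(query):
    """Turn a find() QUERY into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError("unterminated quote at %d" % pos)
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        elif kind == "lparen":
            tokens.append(("LPAREN", text))
        elif kind == "rparen":
            tokens.append(("RPAREN", text))
        elif kind == "phrase":
            # Quotes join words and switch off ( ) : + - meanings.
            tokens.append(("PHRASE", text[1:-1]))
        elif text.lower() in OPERATORS:
            tokens.append(("OP", OPERATORS[text.lower()]))
        elif text.startswith(":"):
            tokens.append(("RESERVED", text))
        elif text[0] in "+-" and len(text) > 1:
            kind = "REQUIRE" if text[0] == "+" else "EXCLUDE"
            tokens.append((kind, text[1:]))
        else:
            tokens.append(("WORD", text))
    return tokens
]]></artwork>
</figure>

<t>
A parser for grouping and operator precedence could then be built on
these tokens; that step is left open here because this document does not
define precedence among :and, :or, and :not.
</t>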

<t>
<spanx style="strong">sort([!]ELEMS)</spanx><vspace/>
The "sort" command is used to request ordering according to the ELEMS
specification (descending order if preceded by '!').  If no "sort"
command is present, it is up to the server to determine record ordering.
ELEMS is one or more element or element subset names separated by "|".
</t>

<t>
<spanx style="strong">list(RANGE)</spanx><vspace/>
The "list" command is used to request that a specific subsequence or
RANGE of records be returned.  The server should always use the starting
point of the requested RANGE, but is free to return fewer records (or a
partial record).  In all cases the server must report what records or
record fragment it has returned.  If no "list" command is present, it is
up to the server whether to return records, and if so, which records.
</t>

<t>
RANGE is a pair of arguments, "LENGTH|START", indicating the number of
records and starting record in the requested sequence.  For example, a
RANGE of "10|81" requests 10 records beginning with result set record 81.
If both arguments are missing, as in "list()", it is considered a request
for all records.  If given as just "list(0)", it is a request that no
records be returned directly, but that the result set be returned by
reference to a generated Key listed in the "results" tag of the returned
set header.  If LENGTH is positive and START is 0, the server should send
LENGTH randomly selected result set records.  If START is missing it
defaults to 1; if LENGTH is missing, it is considered a request for all
records starting from START.
</t>

<t>
RANGE may also be used to request record fragments.  A returned record set
consists of either one or more entire (whole) records, or of exactly one
fragment of one record.  When a fragment is returned, the start position
in the set header (described later) is indicated with S_F, where S is the
record number and F is the fragment sequence number.  To request the next
fragment, a START is formulated by adding 1 to F.  For example, "10|45_3"
requests 10 records starting at fragment 3 of record 45 (only one fragment
can be returned).
</t>
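<t>
The RANGE rules above, including fragment starts, can be collected into
one small parser.  The following Python sketch is only one reading of
the text, with hypothetical names (Range, parse_range, describe,
next_fragment_start): a length of None stands for "all records from
START onward", a START of 0 with a positive LENGTH marks a request for
random records, and a START written as S_F is split into a record number
and a fragment number.
</t>

<figure>
 <artwork><![CDATA[
from typing import NamedTuple, Optional


class Range(NamedTuple):
    length: Optional[int]    # None: all records from start onward
    start: int               # 0: random selection (if length > 0)
    fragment: Optional[int]  # set only when START is written S_F


# Sketch only: parse_range and friends are hypothetical helpers.
def parse_range(text):
    """Parse a list() argument such as "10|81" or "10|45_3"."""
    length_s, _, start_s = text.partition("|")
    length = int(length_s) if length_s else None
    fragment = None
    if not start_s:
        start = 1                        # START defaults to 1
    elif "_" in start_s:
        record, frag = start_s.split("_", 1)
        start, fragment = int(record), int(frag)
    else:
        start = int(start_s)
    return Range(length, start, fragment)


def describe(r):
    """Say in words what a server is being asked to return."""
    if r.length == 0:
        return "no records; result set by reference"
    if r.start == 0 and r.length:
        return "%d random records" % r.length
    count = "all" if r.length is None else str(r.length)
    where = "record %d" % r.start
    if r.fragment is not None:
        where += " fragment %d" % r.fragment
    return "%s records from %s" % (count, where)


def next_fragment_start(reported):
    """Turn a set-header start "S_F" into the START "S_(F+1)"."""
    record, frag = reported.split("_", 1)
    return "%s_%d" % (record, int(frag) + 1)
]]></artwork>
</figure>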

<!--
.\"XXX
.\"If START is non-numeric, it is a "resumption token" that the
.\"server returned in a previous response; the client mirrors it back to the
.\"server to request either the next record fragment (when a record is too
.\"large to return in one response) or the next group of records (when it is
.\"inconvenient to track which records have been returned thus far).
.\"
.\"The "get" command is used to establish a new reference origin by
.\"following the LINK element from the current reference system.  Servers
.\"need not support the "get" command at all, or may support only one or a
.\"few LINK elements.  Servers should return an error for unsupported "get"
.\"requests.  If no "get" command is present, all element and element subset
.\"names reference the object surrogate.  Common arguments for "get" are
.\""thumbnail", "master", "preferred", and "toc".  XXX no arg?  what about
.\"requesting the object itself?? xxx
-->

<t>
<spanx style="strong">show(ELEMS)</spanx><vspace/>
The "show" command is used to request that returned records be constituted
with ELEMS elements.  ELEMS is one or more element or element subset
names separated by "|".  It lets users define the
composition and element order of a returned record set; element names
are discovered by XXX.
</t>

<t>
Element subset names can also be used.  Common subset names are "brief",
"full", and "support" (a record that is complete enough to show the
server's commitment to the object).  If no "show" command is present, it
is up to the server which elements to return.
</t>

<t>
<spanx style="strong">as(FORMAT)</spanx><vspace/>
The "as" command is used to request that returned records be formatted
according to FORMAT.  Common format names are "anvl/erc", "anvl/qdc",
and "xml/marc".  If no "as" command is present, the default format is
usually "anvl/erc" (a plain text format that is eye-readable and
machine-readable), although a service may define defaults in its own way.
</t>

</section>

<section title="Key ?">

<t>
This is a shorthand for

<figure>
 <artwork>
    Key ? show(brief) as(anvl/erc)
 </artwork>
</figure>

which returns a brief description of the object identified by Key.  Support for
this shorthand is required.
</t>

</section>

<section title="Key ??">

<t>
This is a shorthand for

<figure>
 <artwork>
    Key ? show(support) as(anvl/erc)
 </artwork>
</figure>

which returns an object description full enough to contain the server
provider's commitment statement.  Support for this shorthand is required.
</t>
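<t>
A server might normalize the two required shorthands, "Key ?" and
"Key ??", before any other processing.  In the Python sketch below,
expand_shorthand is a hypothetical name; it splits the request URL at its
first "?" and returns None when the request is not one of the two
shorthands, including a URL with no "?" at all.
</t>

<figure>
 <artwork><![CDATA[
# Sketch only: expand_shorthand is a hypothetical helper.
SHORTHANDS = {
    "": "show(brief)as(anvl/erc)",     # Key ?
    "?": "show(support)as(anvl/erc)",  # Key ??
}


def expand_shorthand(url):
    """Return (key, spelled-out query), or None if the request
    is not one of the two required shorthands."""
    if "?" not in url:
        return None          # no "?" at all: not a shorthand
    key, _, query = url.partition("?")
    expanded = SHORTHANDS.get(query.strip())
    return None if expanded is None else (key, expanded)
]]></artwork>
</figure>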

<!--
.\".Nh 2 "Key"
.\"Key
.\"
.\"For semantic completeness, THUMP defines the degenerate case (no query
.\"string or `?' at all) to be equivalent to
.\".Cs
.\"	Key ? get()
.\".Ce
.\"which is a way within THUMP explicitly to reference the same result that
.\"the server would produce were it to respond to the Key with no query
.\"string appended.
.\"
-->
</section>

<section title="Key ? get() put() group() apply()">

<t>
These commands are currently undefined and reserved by THUMP for future use.
</t>

</section>

</section>

<section title="Response Summary">

<t>
A THUMP response consists of a block of HTTP and extension headers, a
blank line, and, if the THUMP-Status extension header reports status code
200, a returned set of records.  The Content-Type HTTP header is normally
returned as

<figure>
 <artwork>
    Content-Type: text/plain
 </artwork>
</figure>

so that the results will display correctly in a web browser.
The content types "text/xml" and "text/html" are being considered for
THUMP responses.
</t>

<t>
The rest of this section describes the THUMP extension headers and the
structure of the returned record set.  Extension headers are inserted
in the block of HTTP response headers, usually near the end.  Currently,
one extension header, THUMP-Status, is defined, and it is required:

<figure>
 <artwork>
    THUMP-Status: THUMPVersion StatusCode ReasonPhrase
 </artwork>
</figure>

It includes the THUMP version, a 3-digit integer result code indicating
the status of the attempt to execute the request, and a short
human-readable phrase.  Defined StatusCodes and ReasonPhrases for THUMP are:

<figure>
 <artwork>
    200: OK
    400: Bad Request
    402: Payment Required
    403: Forbidden
    404: Not Found
    405: Method Not Allowed
    408: Request Time-out
 </artwork>
</figure>

If the status code is other than 200, no record set should be sent.  If the
server wishes to convey any more detailed diagnostic or error information
than may be expressed by the above status codes, it MUST set the code to
200 and use "error" or "warning" element tags within the returned record set.
<!--
.\"XXX Should there also be a THUMP-Type to hold "anvl/erc"?
-->
</t>
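<t>
On the client side, the THUMP-Status header can be split into its three
parts and checked against the status rule above.  This Python sketch is
illustrative only: the function names are hypothetical, CRLF line endings
are normalized to LF before splitting, and the header value is assumed to
have the "THUMPVersion StatusCode ReasonPhrase" shape shown above, with a
multi-word ReasonPhrase kept intact.
</t>

<figure>
 <artwork><![CDATA[
# Sketch only: these helpers are hypothetical and assume the
# header layout "THUMPVersion StatusCode ReasonPhrase".
DEFINED_CODES = {
    200: "OK", 400: "Bad Request", 402: "Payment Required",
    403: "Forbidden", 404: "Not Found", 405: "Method Not Allowed",
    408: "Request Time-out",
}


def parse_thump_status(value):
    """Split "0.6 200 OK" into ("0.6", 200, "OK")."""
    version, code, reason = value.strip().split(None, 2)
    if int(code) not in DEFINED_CODES:
        raise ValueError("undefined THUMP status " + code)
    return version, int(code), reason


def split_response(raw):
    """Split a raw HTTP response into (headers, body).

    Header names are lower-cased; the HTTP status line is skipped.
    """
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    headers = {}
    for line in head.splitlines()[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


def check_response(raw):
    """Return (status code, body), enforcing the rule that a
    non-200 THUMP status comes without a record set."""
    headers, body = split_response(raw)
    _, code, _ = parse_thump_status(headers["thump-status"])
    if code != 200 and body.strip():
        raise ValueError("record set sent with status %d" % code)
    return code, body
]]></artwork>
</figure>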

<t>
A blank line separates the HTTP response and THUMP-Status headers from the
returned set that is the body of the response.  The returned record set
consists of a set-start header record followed by a sequence of records,
each separated by one or more blank lines, until end of stream (file) is
reached.  A set-end header record is optional.
</t>

<t>
The format of the records is normally "anvl/erc", which specifies a
serialization syntax [ANVL] with ERC semantics <xref target="Kernel"/>.
In a future
version of THUMP it will be possible to request ERC semantics with
"xml/erc".  The next sections describe the special ANVL record used
to introduce a record set and then the ERC records.
</t>

</section>

<section title="Returned Records">

<t>
This section describes how a record in the sequence of returned records
is encoded in the anvl/erc format.  ANVL (A Name Value Language) defines
the syntax and the ERC (Electronic Resource Citation) defines semantics.
The URI for the ERC <xref target="Kernel"/> reference should be included
in the record set
header.  While a comprehensive description of the ERC record is out of
scope for this document, some details are given below that may suffice
for simple implementations.
</t>

<t>
An ERC record is a sequence of tagged elements.  It has the form,

<figure>
 <artwork><![CDATA[
    erc:
    who:   WHO_EXPRESSED_THIS_ITEM
    what:  WHAT_THE_EXPRESSION_WAS_CALLED
    when:  WHEN_IT_WAS_EXPRESSED
    where: WHERE_THE_EXPRESSION_CAN_BE_FOUND
    how:   DESCRIPTION_OR_SUMMARY_OF_ITEM              <optional>
    why:   COPYRIGHT_DISCLAIMER_AUDIENCE_STATEMENT     <optional>
    note:  ANY_TEXT                                    <optional>
           .......
    <any other tagged elements>                        <optional>
 ]]></artwork>
</figure>

The first five tagged elements are required.  The required elements may be
thought of as answering questions about an "expression" of a resource (an item).
</t>

<t>
All other elements are optional.  The next ERC element shown above ("how")
is concerned with the content of an item and the element after that ("why")
with any high priority information that comes from the lawyerly domain --
the really hard questions.
</t>

<t>
A short form of the ERC is also possible that relies on the above ordering
of the first 6 elements.  It has the form,

<figure>
 <artwork><![CDATA[
    erc: WHO | WHAT | WHEN
         | WHERE
         | HOW                                         <optional>
         | WHY                                         <optional>
    note:  ANY_TEXT                                    <optional>
           .......
    <any other tagged elements>                        <optional>
 ]]></artwork>
</figure>

The line breaks among the first 6 elements are arbitrary.  Together they
are considered to be part of one long value for the "erc:" as long as they
are continued on indented lines.  In either form of the ERC, arbitrary
additional elements are possible.
</t>
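<t>
Because the short form relies purely on position, it can be expanded
mechanically into the long form.  A minimal Python sketch, assuming that
the "erc:" value has already been unfolded onto one line (continuation
lines joined) and that "|" never occurs inside a value; the function
names expand_short_erc and format_long are hypothetical.
</t>

<figure>
 <artwork><![CDATA[
# Sketch only: expand_short_erc and format_long are hypothetical.
# The "erc:" value is assumed to be unfolded onto one line already,
# and "|" is assumed never to occur inside a value.
LABELS = ["who", "what", "when", "where", "how", "why"]


def expand_short_erc(value):
    """Map "WHO | WHAT | WHEN | WHERE [| HOW [| WHY]]" to pairs."""
    parts = [p.strip() for p in value.split("|")]
    if not 4 <= len(parts) <= len(LABELS):
        raise ValueError("expected 4 to 6 elements, got %d"
                         % len(parts))
    return list(zip(LABELS, parts))


def format_long(pairs):
    """Render (label, value) pairs in the long ERC form."""
    lines = ["erc:"]
    for label, value in pairs:
        lines.append("%-6s %s" % (label + ":", value))
    return "\n".join(lines)
]]></artwork>
</figure>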

<section title="Empty values for required elements">

<t>
If no suitable value can be found for a required element, a
controlled code value for "empty" of the form

<figure>
 <artwork>
    (:ccode)
 </artwork>
</figure>

should be used, drawing from the following reserved values:
</t>

<t>
<list style="hanging">
<t>(:unac)   temporarily inaccessible</t>
<t>(:unal)   unallowed, suppressed intentionally</t>
<t>(:unap)   not applicable, makes no sense</t>
<t>(:unas)   value unassigned (e.g., Untitled)</t>
<t>(:unav)   value unavailable, possibly unknown</t>
<t>(:unkn)   known to be unknown (e.g., Anonymous, Inconnue)</t>
<t>(:none)   never had a value, never will</t>
<t>(:null)   explicitly and meaningfully empty</t>
<t>(:tba)    to be assigned or announced later</t>
<t>  </t>
<t>(:etal)   too numerous to list (et alia).</t>
<t>(:at)     the real value is at the given URL or identifier.</t>
</list>
</t>
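<t>
A consumer can recognize these reserved codes with a simple lookup.  The
Python sketch below is illustrative only: empty_code is a hypothetical
name, a value of the form (:ccode) whose code is not in the list above is
treated as an ordinary value rather than an error, and any text following
the code is returned separately, which is an assumption this document does
not spell out.
</t>

<figure>
 <artwork><![CDATA[
import re

# Sketch only: empty_code is a hypothetical helper.
CONTROLLED_CODES = {
    "unac": "temporarily inaccessible",
    "unal": "unallowed, suppressed intentionally",
    "unap": "not applicable, makes no sense",
    "unas": "value unassigned",
    "unav": "value unavailable, possibly unknown",
    "unkn": "known to be unknown",
    "none": "never had a value, never will",
    "null": "explicitly and meaningfully empty",
    "tba": "to be assigned or announced later",
    "etal": "too numerous to list",
    "at": "the real value is at the given URL or identifier",
}

CODE_RE = re.compile(r"\(:([a-z]+)\)\s*(.*)", re.DOTALL)


def empty_code(value):
    """Return (code, rest) if value starts with a reserved
    (:ccode), else None."""
    m = CODE_RE.match(value.strip())
    if m and m.group(1) in CONTROLLED_CODES:
        return m.group(1), m.group(2)
    return None
]]></artwork>
</figure>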

</section>

</section>

<section title="FAQ -- Frequently Asked Questions">

<section title="What's the difference between THUMP, OpenSearch, SRU/SRW,
and OpenURL?">

<t>
All of these protocols are capable of expressing a parameter package
on the right-hand side of a URL, and all of them reserve specific
parameter names as having defined meanings.  In theory, these packages
can be extended arbitrarily to express any functionality with any level
of complexity.  There's no syntactic limitation to these protocols'
expressiveness.  The difference lies in how.
</t>

<t>
THUMP uses a classic parenthesized argument list syntax while the others
use the flat argument-value list syntax traditional on the web since 1995.
OpenSearch and SRU/SRW are logical descendants of the complex Z39.50
search and retrieve protocol, but with restricted functionality and a
text-based syntax.  SRW and OpenURL define an XML encoding for request
parameters.  OpenURL tends to be used for known-item linking.  THUMP
aims to be a more concise specification for key-based requests.
</t>

</section>

</section>

<!--
<section title="Appendix - Motivation for Electronic Resource Citations
(ERCs)">

<t>
An Electronic Resource Citation (or ERC, pronounced e-r-c) is a
simple, compact, and printable record designed to hold data associated
with an information resource.  By design, the ERC is a metadata format
that balances the needs for expressive power, very simple machine
processing, and direct human manipulation.
</t>

<t>
A founding principle of the ERC is that direct human contact with metadata
will be a necessary and sufficient condition for the near term rapid
development of metadata standards, systems, and services.  Thus the
machine-processable ERC format must only minimally strain people's ability
to read, understand, change, and transmit ERCs without their relying on
intermediation with specialized software tools.  The basic ERC needs to be
succinct, transparent, and trivially parseable by software.
</t>

<t>
Borrowing from the data structuring format that underlies the successful
spread of email and web services, the ERC format uses
<xref target="ANVL"/>,
which is based on email and HTTP headers
<xref target="RFC2822"/>.
There is a naturalness to
ANVL's label-colon-value format (illustrated below) that
barely needs explanation to a person beginning to enter ERC metadata.
</t>

<t>
Besides simplicity of ERC system implementation and data entry mechanics,
ERC semantics (what the record and its constituent parts mean) must also
be easy to explain.  ERC semantics are based on a reformulation and
extension of the Dublin Core (DC) <xref target="RFC5013"/>
hypothesis, which suggests that the
fifteen Dublin Core metadata elements have a key role to play in
cross-domain resource description.  The ERC design recognizes that the
Dublin Core's primary contribution is the international,
interdisciplinary consensus that identified fifteen semantic buckets
(element categories), regardless of how they are labeled.  The ERC then
adds a definition for a record and some minimal compliance rules.  In
pursuing the limits of simplicity, the ERC design combines and relabels
some Dublin Core buckets to isolate a tiny kernel (subset) of four
elements for basic cross-domain resource description.
</t>

<t>
For the cross-domain kernel, the ERC uses four basic elements
(who, what, when, and where) to pretend that every object in the universe
can have a uniform minimal description: each object has some responsible
person or party, a name or other identifier, a date, and a location.
It doesn't matter what type of object it is, or whether one plans to read
it, interact with it, smoke it, wear it, or navigate it.  Of course, this
approach is flawed because uniformity of description for some object
types requires more semantic contortion and sacrifice than for others.
This approach is best suited to objects that lend themselves to reasonably
regular electronic description.
</t>

<t>
While insisting on uniformity at the most basic level provides powerful
cross-domain leverage, the semantic sacrifice is great for many
applications.  So the ERC also permits a semantically rich and nuanced
description to co-exist in a record along with a basic description.
In that way both sophisticated and naive recipients of the record can
extract the level of meaning from it that best suits their needs and
abilities.  Key to unlocking the richer description is a controlled
vocabulary of ERC record types (not explained in this document) that
permit knowledgeable recipients to apply defined sets of additional
assumptions to the record.
</t>

<section title="ERC Syntax">

<t>
An ERC record is a sequence of metadata elements ending in a blank line.
An element consists of a label, a colon, and an optional value.  Here is
an example of a record with five elements.

<figure>
 <artwork>
    erc:
    who: Gibbon, Edward
    what: The Decline and Fall of the Roman Empire
    when: 1781
    where: http://www.ccel.org/g/gibbon/decline/
 </artwork>
</figure>

A long value may be folded (continued) onto the next line by inserting
a newline and indenting the next line.  A value can be thus folded across
multiple lines.  Here are two example elements, each folded across four
lines.

<figure>
 <artwork>
    who/created: University of California, San Francisco, AIDS
         Program at San Francisco General Hospital | University
         of California, San Francisco, Center for AIDS Prevention
         Studies
    what/Topic:
          Heart Attack | Heart Failure
         | Heart
                          Diseases
 </artwork>
</figure>

An element value folded across several lines is treated as if the
lines were joined together on one long line.  For example, the second
element from the previous example is considered equivalent to:

<figure>
 <artwork>
    what/Topic: Heart Attack | Heart Failure | Heart Diseases
 </artwork>
</figure>

An element value may contain multiple values, each one separated from the
next by a `|' (pipe) character.  The element from the previous example
contains three values.
</t>

<t>
For annotation purposes, any line beginning with
a `#' (hash) character is treated as if it were not present; this is a
"comment" line (a feature not available in email or HTTP headers).  For
example, the following element is spread across four lines
and contains two values:

<figure>
 <artwork>
    what/Topic:
         Heart Attack
    #    | Heart Failure  - hold off until next review cycle
         | Heart Diseases
 </artwork>
</figure>

</t>
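<t>
The record, folding, multi-value, and comment rules above are simple
enough to implement in a few lines.  The following is a minimal,
non-normative sketch in Python.  The function name parse_erc_record, the
choice to join folded lines with a single space, and the handling of
blank lines before a record and of continuation lines with no preceding
element are assumptions made for illustration, not rules taken from the
ERC or ANVL specifications.
</t>

```python
from typing import List, Tuple


def parse_erc_record(text: str) -> List[Tuple[str, List[str]]]:
    """Parse one ERC record into (label, values) pairs.

    A minimal sketch of the rules described above; details marked
    "assumption" are illustrative choices, not normative ERC/ANVL rules.
    """
    elements: List[List[str]] = []  # [label, unsplit value] pairs
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line: treated as if it were not present
        if not line.strip():
            if elements:
                break  # a blank line ends the record
            continue  # assumption: skip blank lines before the first element
        if line[0] in " \t":
            # Continuation (folded) line: join onto the previous value with
            # a single space (assumption about whitespace handling). A
            # continuation before any element is ignored (assumption).
            if elements:
                elements[-1][1] += " " + line.strip()
            continue
        # Element line: label, colon, optional value (split at first colon).
        label, _, value = line.partition(":")
        elements.append([label.strip(), value.strip()])

    record = []
    for label, raw in elements:
        raw = raw.strip()
        # Folding is undone first, then '|' separates multiple values.
        values = [v.strip() for v in raw.split("|")] if raw else []
        record.append((label, values))
    return record


if __name__ == "__main__":
    sample = "\n".join([
        "what/Topic:",
        "     Heart Attack",
        "#    | Heart Failure  - hold off until next review cycle",
        "     | Heart Diseases",
    ])
    print(parse_erc_record(sample))
    # prints [('what/Topic', ['Heart Attack', 'Heart Diseases'])]
```

<t>
Because folded lines are joined before the value is split on `|', a pipe
that begins a continuation line, as in the Heart Diseases example, still
separates values.
</t>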

</section>

</section>
-->

<section title="Security Considerations">

<t>
The THUMP protocol poses no direct risk to computers and networks.
Implementors of THUMP services need to be aware of security issues when
querying networks and filesystems, and of the concomitant risks of spoofing
and of obtaining incorrect information.  These risks are no greater for
THUMP than for any other kind of HTTP-based application.  For example,
recipients of a URL with embedded THUMP commands should treat it as they
would any other URL and be aware that the identified service may no longer
be operational.
</t>

<t>
THUMP clients and servers subject themselves to all the risks that
accompany normal operation of the protocols underlying mapping services
(e.g., HTTP, Z39.50).  Because THUMP services are specializations of such
protocols, they may limit exposure to the usual risks.  Indeed, THUMP
services may even enhance a kind of security by helping users identify
long-term reliable references to information objects.
</t>

</section>

 </middle>

 <back>

   <references>

    <reference anchor="ANVL"
      target="http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf">
     <front>
      <title>A Name-Value Language</title>
      <author initials="J." surname="Kunze" fullname="John A. Kunze" />
      <author initials="B." surname="Kahle" fullname="Brewster Kahle" />
      <author initials="J." surname="Masanes" fullname="Julien Masanes" />
      <author initials="G." surname="Mohr" fullname="Gordon Mohr" />
      <date month="August" year="2005" />
     </front>
     <format type="PDF"
       target="http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf" />
    </reference>

    <reference anchor="ARK"
      target="http://www.cdlib.org/inside/diglib/ark/arkspec.pdf">
     <front>
      <title>The ARK Persistent Identifier Scheme</title>
      <author initials="J." surname="Kunze" fullname="John Kunze" />
      <author initials="R." surname="Rodgers" fullname="R.P.C. Rodgers" />
      <date month="July" year="2007" />
     </front>
     <format type="PDF"
       target="http://www.cdlib.org/inside/diglib/ark/arkspec.pdf" />
    </reference>

    <reference anchor="Kernel"
      target="http://www.cdlib.org/inside/diglib/ark/ercspec.html">
     <front>
      <title>Kernel Metadata and Electronic Resource Citations (ERCs)</title>
      <author initials="J." surname="Kunze" fullname="John Kunze" />
      <author initials="A." surname="Turner" fullname="Adrian Turner" />
      <date month="October" year="2007" />
     </front>
     <format type="HTML"
       target="http://www.cdlib.org/inside/diglib/ark/ercspec.html" />
    </reference>

    &RFC2822; <!-- email headers -->
    &RFC3629; <!-- UTF-8 -->
    &RFC5013; <!-- Dublin Core -->

  </references>

 </back>

</rfc>
