		  DB822 Data Format
		  -----------------

1. Introduction and Motivation
------------------------------

DB822 is a data format designed to be easy to maintain manually, and
easy to parse from a program as well.  It is motivated by the RFC822
format for email messages.  For example, headers of an email message
may look as follows:

Received: by mail.cs.dal.ca (Postfix, from userid 580)
	id 29049B040; Wed, 25 Jan 2006 13:38:18 -0400 (AST)
From: "John Smith" <jsm@ai.dnlp.ca>
Subject: [Dbworld] [CFP] WS: From Wiki To Semantics

The main principle is that each line starts with an attribute, e.g.,
"Received", which ends with a colon ':' and is followed by the value
of this attribute.  A value that continues on the next physical line
is marked by a space or a tab as the first character of that line.
One or more empty lines mark the end of a record.

To make this format more usable for database storage, several
additional rules are introduced, such as line comments and line
continuation with the backslash ("\") character.

2. Rules
--------

2.1 Record Separation
Records are separated by one or more empty lines.  To prevent some
hard-to-catch errors, a line is considered empty even if it contains
some whitespace characters, such as ' ' (space), \t (tab), or \r
(carriage return).
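
As an illustration of rule 2.1, the following is a minimal sketch of
splitting a text into raw records.  The helper name split_records is
hypothetical and is not part of the implementations in Section 4; the
sketch ignores comments and line continuation.

```perl
use strict;
use warnings;

# Split DB822 text into raw record strings.  A separator is a run of one
# or more lines holding nothing but spaces, tabs, or carriage returns.
# split_records is a hypothetical helper name used only for this sketch.
sub split_records {
  my ($text) = @_;
  # Drop blank lines at the very start so the first chunk is a record.
  $text =~ s/\A(?:[ \t\r]*\n)+//;
  # A record ends at a newline followed by at least one blank line.
  my @records = split /\n(?:[ \t\r]*\n)+/, $text;
  # Remove trailing whitespace and the final newline from each record.
  s/[ \t\r]*\n?\z// for @records;
  return grep { length } @records;
}

my @recs = split_records("id:1\nname: A\n \t\r\n\nid:2\nname: B\n");
print scalar(@recs), " records\n";   # prints "2 records"
```

The line " \t\r" between the two records contains only whitespace, so
it counts as empty and the text is still split into two records.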

2.2 Comments
A comment line consists of optional white space followed by the '#'
symbol and the rest of the line.  Comments may appear only at the
beginning of a record.  If a complete record is commented out, it
becomes part of the separator; i.e., the record is not counted.
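
A minimal sketch of rule 2.2, assuming comments are removed together
with the blank lines that precede a record.  The helper name
skip_prologue is hypothetical.

```perl
use strict;
use warnings;

# Remove the blank lines and comment lines that precede a record.
# skip_prologue is a hypothetical name; the pattern matches a run of
# lines that are either whitespace-only or whitespace followed by '#...'.
sub skip_prologue {
  my ($text) = @_;
  $text =~ s/\A(?:[ \t\r]*(?:#.*)?\n)+//;
  return $text;
}

my $text = "# people database\n\n  # id:0\n  # name: Nobody\n\nid:1\nname: A\n";
print skip_prologue($text);   # prints the two lines of record id:1
```

The record with id:0 is completely commented out, so it is skipped as
part of the separator and the first record seen is id:1.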

2.3 Line Continuation
A record line can be continued on the next line in one of two ways:
by a leading space or tab on the next line (2.3.1) or by a trailing
backslash (2.3.2).  These rules can be applied repeatedly to form a
multi-line record line.
2.3.1 A line can be continued if the next line starts with a space
      (' ') or tab ('\t').
2.3.2 A line can be continued if the line ends with a backslash
      character ('\').
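
A minimal sketch of both continuation rules, applied to the text of a
single record.  The helper name unfold_record is hypothetical, and
replacing each continuation by a single space is an assumption of this
sketch; the rules above do not fix the joining text.

```perl
use strict;
use warnings;

# Join the physical lines of one record into logical lines.
# unfold_record is a hypothetical helper.  Each continuation is replaced
# by one space, which is an assumption made for this illustration.
sub unfold_record {
  my ($record) = @_;
  $record =~ s/\\\r?\n[ \t]*/ /g;   # 2.3.2: line ends with a backslash
  $record =~ s/\r?\n[ \t]+/ /g;     # 2.3.1: next line starts with space/tab
  return split /\r?\n/, $record;
}

my $rec = "Received: by mail.cs.dal.ca\n\tid 29049B040\n"
        . "note: first part\\\nsecond part\n";
print "$_\n" for unfold_record($rec);
# prints:
#   Received: by mail.cs.dal.ca id 29049B040
#   note: first part second part
```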

2.4 Attribute Value Separation
Each line of a record is broken into the attribute (or key) part and
the value part.  The attribute is the part of the line from its start
up to the first colon character (':'); to better catch errors, it must
not be broken across line continuations.  The value is the string
after that colon character.
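
A minimal sketch of rule 2.4 for one logical line, splitting at the
first colon.  The helper name split_attr is hypothetical.  Trimming
spaces and tabs around the attribute and the value is an assumption
borrowed from the implementations in Section 4, since the rule itself
leaves whitespace handling open.

```perl
use strict;
use warnings;

# Split one logical line into (attribute, value) at the first colon.
# split_attr is a hypothetical helper; the trimming of surrounding
# whitespace is an assumption, not something rule 2.4 requires.
sub split_attr {
  my ($line) = @_;
  $line =~ /\A[ \t]*([^:\n]*?)[ \t]*:[ \t]*(.*?)[ \t\r]*\z/s
    or die "db8: no attribute in line: $line\n";
  return ($1, $2);
}

my ($k, $v) = split_attr('Subject: [Dbworld] [CFP] WS: From Wiki To Semantics');
print "$k => $v\n";
# prints "Subject => [Dbworld] [CFP] WS: From Wiki To Semantics"
```

Only the first colon separates the attribute from the value, so the
colon after "WS" stays in the value.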


3. Extended Rules
-----------------

For practical reasons, some optional extended rules may be used to
make the format more convenient for the user, or to determine the
exact content of attributes and values more precisely.  The main rules
do not specify, for example, how to encode an attribute that contains
a new-line character, or whether a space at the start of an attribute
is ignored.  Such cases can be handled by additional rules, or by an
additional level of encoding and decoding that translates attributes
and values into non-space strings or sequences of strings.

...(extended rules to be added)


4. Implementation
-----------------

4.1 Reading a Database (db8_read)

The basic function for reading a database is db8_read; several Perl
implementations are given here.

...(comments to be added)

4.1.1 A Simple Implementation (db8_read_simple)

# db8_read_simple - Perl function for reading records in the DB822 format
# A very simple implementation
# 2000-2017 Vlado Keselj, version 1.4
sub db8_read_simple {
  my $arg = shift; my $db = [];
  while ($arg) {
    if ($arg =~ /^([ \t\r]*(#.*)?\n)+/) { $arg = $'; }
    last if $arg eq ''; my $record;
    if ($arg =~ /([ \t\r]*\n){2,}/) { $record = "$`\n"; $arg = $'; }
    else { $record = $arg; $arg = '';
           $record .= "\n" unless $record =~ /\n$/; } # last line may lack "\n"
    my $r = {};
    while ($record) {
      $record =~ /^[ \t]*([^\n:]*?)[ \t]*:/ or die "db8: no attribute";
      my $k = $1; $record = $';
      while ($record =~ /^(.*)(\\\r?\n|\r?\n[ \t]+)(\S.*)/)
      { $record = "$1 $3$'" }
      $record =~ /^[ \t]*(.*?)[ \t\r]*\n/ or die;
      my $v = $1; $record = $';
      $r->{$k} = $v; # no check for duplicate $k!
    }
    push @{ $db }, $r;
  }
  return $db;
}


4.1.2 Another Implementation (db8_read)

# db8_read - Perl function for reading records in the DB822 format
# 2000-2017 Vlado Keselj, version 1.4
sub db8_read {
  my $arg = shift;
  if ($arg =~ /^file=/) {
    my $f = $'; open(my $fh, '<', $f) or die "cannot open $f: $!";
    $arg = join('', <$fh>);
    close($fh);
  }

  my $db = [];
  while ($arg) {
    my $prologue;
    if ($arg =~ /^([ \t\r]*(#.*)?\n)+/) { $prologue = $&; $arg = $'; }
    last if $arg eq ''; my $record;
    if ($arg =~ /([ \t\r]*\n){2,}/) { $record = "$`\n"; $arg = $'; }
    else { $record = $arg; $arg = '';
           $record .= "\n" unless $record =~ /\n$/; } # last line may lack "\n"
    my $r = {};
    while ($record) {
      $record =~ /^[ \t]*([^\n:]*?)[ \t]*:/ or die "db8: no attribute";
      my $k = $1; $record = $';
      while ($record =~ /^(.*)(\\\r?\n|\r?\n[ \t]+)(\S.*)/)
      { $record = "$1 $3$'" }
      $record =~ /^[ \t]*(.*?)[ \t\r]*\n/ or die;
      my $v = $1; $record = $';
      if (exists($r->{$k})) {
	my $c = 0;
	while (exists($r->{"$k-$c"})) { ++$c }
	$k = "$k-$c";
      }
      $r->{$k} = $v;
    }
    push @{ $db }, $r;
  }
  return $db;
}

This function accepts a string in DB822 format.  If the string
argument starts with 'file=', the rest of the argument is taken as a
file name and the contents are read from that file.
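
Unlike db8_read_simple, the db8_read listing does not overwrite a
repeated attribute within a record; it stores it under a suffixed key.
The following sketch isolates that renaming step so it can be tried
on its own.  The helper name store_attr is hypothetical.

```perl
use strict;
use warnings;

# Store $v under $k, renaming the key to "$k-0", "$k-1", ... when $k is
# already present.  This mirrors the renaming loop in db8_read;
# store_attr itself is a hypothetical name used for this illustration.
sub store_attr {
  my ($r, $k, $v) = @_;
  if (exists $r->{$k}) {
    my $c = 0;
    ++$c while exists $r->{"$k-$c"};
    $k = "$k-$c";
  }
  $r->{$k} = $v;
  return $k;
}

my $r = {};
store_attr($r, 'phone', '000-111');
store_attr($r, 'phone', '222-333');
store_attr($r, 'phone', '444-555');
print "$_: $r->{$_}\n" for sort keys %$r;
# prints:
#   phone: 000-111
#   phone-0: 222-333
#   phone-1: 444-555
```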

Example.  If a string $s has the following contents:
id:1
name: J. Public
phone: 000-111

id:2
name: Other Name
phone: 123-4567

then we can use the following code to interpret it:
  @a = @{ &db8_read($s) };
and, for example, the following code:
  print $a[0]->{id}, " ", $a[0]->{name}, "\n";
will give the following output:
1 J. Public
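
The returned value is a reference to an array of hash references, one
per record.  As a sketch of walking all records, the fragment below
uses a hand-built copy of that structure, so it runs without db8_read;
the variable names $db and @lines are chosen only for this example.

```perl
use strict;
use warnings;

# Hand-built copy of what db8_read returns for the example above:
# a reference to an array of hash references, one per record.
my $db = [
  { id => '1', name => 'J. Public',  phone => '000-111'  },
  { id => '2', name => 'Other Name', phone => '123-4567' },
];

# Build one line per record, with attributes in sorted order.
my @lines;
for my $rec (@$db) {
  push @lines, join(', ', map { "$_=$rec->{$_}" } sort keys %$rec);
}
print "$_\n" for @lines;
# prints:
#   id=1, name=J. Public, phone=000-111
#   id=2, name=Other Name, phone=123-4567
```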


...(more content to be added)