DB822 Data Format
-----------------

1. Introduction and Motivation
------------------------------

DB822 is a data format designed to be easy to maintain manually and
easy to parse from a program.  It is inspired by the RFC822 format for
email messages.  For example, the headers of an email message may look
as follows:

Received: by mail.cs.dal.ca (Postfix, from userid 580)
        id 29049B040; Wed, 25 Jan 2006 13:38:18 -0400 (AST)
From: "John Smith"
Subject: [Dbworld] [CFP] WS: From Wiki To Semantics

The main principle is that each line starts with an attribute, e.g.,
"Received", which ends with a colon ':' and is followed by the value of
that attribute.  A line that needs to continue onto the next physical
line is marked by a space or a tab as the first character of that next
line.  One or more empty lines mark the end of a record.

To make the format more usable for database storage, several
additional rules are added, such as line comments and line continuation
with the backslash ("\") character.

2. Rules
--------

2.1 Record Separation

Records are separated by one or more empty lines.  To prevent some
hard-to-catch errors, a line is considered empty even if it contains
whitespace characters, such as ' ' (space), \t (tab), or \r (carriage
return).

2.2 Comments

A line comment starts with optional whitespace followed by the '#'
symbol.  Comments may appear only at the beginning of a record.  If a
complete record is commented out, it becomes part of the separator;
i.e., the record is not counted.

2.3 Line Continuation

A record line can be continued on the next line in one of two ways: by
a leading space (2.3.1) or by a trailing backslash (2.3.2).  These
rules can be applied repeatedly to build a record line that spans
several physical lines.

  2.3.1 A line is continued if the next line starts with a space (' ')
        or a tab ('\t').

  2.3.2 A line is continued if it ends with a backslash character
        ('\').
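The two continuation rules of 2.3 can be sketched in Perl as follows.
This is only an illustration under stated assumptions, not part of the
implementation given in Section 4: the function name db8_unfold is
hypothetical, and replacing the line break with a single space is an
assumption, since the rules above do not say what the joined text
should contain at that point.

```perl
use strict;
use warnings;

# db8_unfold - hypothetical helper, for illustration only.
# Takes the text of one record and returns its logical lines, with
# continued physical lines (rules 2.3.1 and 2.3.2) joined together.
# Assumption: the line break is replaced by a single space.
sub db8_unfold {
    my ($text) = @_;
    my @logical;
    my $pending = 0;    # true if the previous line ended with a backslash
    for my $line (split /\r?\n/, $text) {
        # rule 2.3.1: a leading space or tab continues the previous line
        my $continues = $pending || (@logical && $line =~ /^[ \t]/);
        # rule 2.3.2: a trailing backslash continues onto the next line
        $pending = ($line =~ s/\\$//) ? 1 : 0;
        $line =~ s/^[ \t]+|[ \t]+$//g;    # trim surrounding blanks
        if ($continues) { $logical[-1] .= " $line"; }
        else            { push @logical, $line; }
    }
    return @logical;
}

my @lines = db8_unfold(
    "title: A long\n title\nnote: first part \\\nsecond part\nid: 7\n");
print "$_\n" for @lines;
```

Run as a script, this should print three logical lines: "title: A long
title", "note: first part second part", and "id: 7".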
2.4 Attribute Value Separation

Each line of a record is split into an attribute (or key) part and a
value part.  The attribute is the part of the line from its start up to
the first colon character (':'); to help catch errors, it must not be
split across a line continuation.  The value is the string after that
colon.

3. Extended Rules
-----------------

For practical reasons, some optional extended rules are used.  They may
make the format more convenient from a user's point of view, or
determine the exact content of attributes and values more precisely.
The main rules do not specify, for example, how to encode an attribute
that contains a new-line character, or what happens if an attribute
starts with a space (is this space ignored or not?).  Such cases can be
handled by additional rules, or by an additional level of
encoding/decoding that translates attributes and values into non-space
strings or sequences of strings.

...(extended rules to be added)

4. Implementation
-----------------

4.1 Reading a Database (db8_read)

The basic function for reading a database is db8_read.  Several Perl
implementations are given here.

...(comments to be added)

4.1.1 A Simple Implementation (db8_read_simple)

# db8_read_simple - Perl function for reading records in the DB822 format
# A very simple implementation
# 2000-2017 Vlado Keselj, version 1.4
sub db8_read_simple {
    my $arg = shift;
    my $db = [];
    while ($arg) {
        # skip empty lines and comment lines in front of a record
        if ($arg =~ /^([ \t\r]*(#.*)?\n)+/) { $arg = $'; }
        last if $arg eq '';
        # a record ends at the first empty line
        my $record;
        if ($arg =~ /([ \t\r]*\n){2,}/) { $record = "$`\n"; $arg = $'; }
        else {
            $record = $arg; $arg = '';
            # make sure the last record ends with a newline
            $record .= "\n" unless $record =~ /\n\z/;
        }
        my $r = {};
        while ($record) {
            # attribute: everything up to the first colon
            $record =~ /^[ \t]*([^\n:]*?)[ \t]*:/ or die "db8: no attribute";
            my $k = $1; $record = $';
            # join continuation lines (2.3.1 and 2.3.2) with one space
            while ($record =~ /^(.*)(\\\r?\n|\r?\n[ \t]+)(\S.*)/)
            { $record = "$1 $3$'" }
            $record =~ /^[ \t]*(.*?)[ \t\r]*\n/ or die;
            my $v = $1; $record = $';
            $r->{$k} = $v; # no check for duplicate $k!
        }
        push @{ $db }, $r;
    }
    return $db;
}

4.1.2 Another Implementation (db8_read)

# db8_read - Perl function for reading records in the DB822 format
# 2000-2017 Vlado Keselj, version 1.4
sub db8_read {
    my $arg = shift;
    # an argument of the form 'file=NAME' is read from the file NAME
    if ($arg =~ /^file=/) {
        my $f = $'; local *F; open(F, $f) or die "cannot open $f:$!";
        $arg = join('', <F>); close(F);
    }

    my $db = [];
    while ($arg) {
        # skip empty lines and comment lines in front of a record
        my $prologue;
        if ($arg =~ /^([ \t\r]*(#.*)?\n)+/) { $prologue = $&; $arg = $'; }
        last if $arg eq '';
        # a record ends at the first empty line
        my $record;
        if ($arg =~ /([ \t\r]*\n){2,}/) { $record = "$`\n"; $arg = $'; }
        else {
            $record = $arg; $arg = '';
            # make sure the last record ends with a newline
            $record .= "\n" unless $record =~ /\n\z/;
        }
        my $r = {};
        while ($record) {
            # attribute: everything up to the first colon
            $record =~ /^[ \t]*([^\n:]*?)[ \t]*:/ or die "db8: no attribute";
            my $k = $1; $record = $';
            # join continuation lines (2.3.1 and 2.3.2) with one space
            while ($record =~ /^(.*)(\\\r?\n|\r?\n[ \t]+)(\S.*)/)
            { $record = "$1 $3$'" }
            $record =~ /^[ \t]*(.*?)[ \t\r]*\n/ or die;
            my $v = $1; $record = $';
            # a repeated attribute $k is stored as "$k-0", "$k-1", ...
            if (exists($r->{$k})) {
                my $c = 0;
                while (exists($r->{"$k-$c"})) { ++$c }
                $k = "$k-$c";
            }
            $r->{$k} = $v;
        }
        push @{ $db }, $r;
    }
    return $db;
}

This function accepts a string in DB822 format.  If the string argument
starts with 'file=', the rest of the argument is taken as a file name
and the contents are read from that file.

Example.  If a string $s has the following contents:

id:1
name: J. Public
phone: 000-111

id:2
name: Other Name
phone: 123-4567

then we can use the following code to interpret it:

@a = @{ &db8_read($s) };

and, for example, the following code:

print $a[0]->{id}, " ", $a[0]->{name}, "\n";

will give the following output:

1 J. Public

...(more content to be added)
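To make the shape of the returned value concrete, the sketch below
builds by hand the structure that db8_read would be expected to return
for the example string (a reference to an array of hash references) and
walks over it.  It also replays, in a hypothetical helper named
store_attr, the renaming that db8_read applies when an attribute is
repeated within one record.  The helper name and the repeated "phone"
values are assumptions made for illustration; they are not part of the
DB822 code.

```perl
use strict;
use warnings;

# Hand-built copy of the structure expected for the example string:
# a reference to an array holding one hash reference per record.
my $db = [
    { id => '1', name => 'J. Public',  phone => '000-111'  },
    { id => '2', name => 'Other Name', phone => '123-4567' },
];

# Walk over all records and print their attributes in sorted key order.
for my $r (@{ $db }) {
    print join(', ', map { $_ . '=' . $r->{$_} } sort keys %{ $r }), "\n";
}

# store_attr - hypothetical helper that mirrors the duplicate handling
# inside db8_read: a repeated key $k is stored as "$k-0", "$k-1", ...
sub store_attr {
    my ($r, $k, $v) = @_;
    if (exists $r->{$k}) {
        my $c = 0;
        while (exists $r->{"$k-$c"}) { ++$c }
        $k = "$k-$c";
    }
    $r->{$k} = $v;
    return $k;    # the key that was actually used
}

my $rec = {};
store_attr($rec, 'phone', '000-111');
store_attr($rec, 'phone', '000-222');
store_attr($rec, 'phone', '000-333');
print join(', ', map { $_ . '=' . $rec->{$_} } sort keys %{ $rec }), "\n";
```

In this sketch the three phone values end up under the keys "phone",
"phone-0", and "phone-1", so code that reads a record needs to look for
the suffixed keys when an attribute may occur more than once.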