
1. History of XML::Edifact

My first contact with UN/EDIFACT came through a source code exchange with USR-Tuebingen. I had shown them how to use a Linux box to access the Verzeichnis Lieferbarer Buecher CD-ROM from other Unix sites. In return they gave me ediview, an interactive UN/EDIFACT browser written in C and used for the Frankfurt EDITEUR project. The parser was table driven, and to my horror they told me that they had retyped the printed EDITEUR draft to build those tables.

My first attempt to re-engineer this C source started with:

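        # ^A, ^B and ^C stand for literal Ctrl-A, Ctrl-B and Ctrl-C characters:
        # sed hides the released sequences ??, ?+ and ?' behind them, then tr
        # restores ?, + and ' while turning the remaining + separators into
        # tabs and the ' segment terminators into newlines.
        sed -e 's/??/^A/g' \
            -e 's/?+/^B/g' \
            -e 's/?'"'"'/^C/g' |
        tr "^A^B^C+'" "?+'\t\n"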

This gave me a tabular view of UN/EDIFACT messages, one segment per line with tab-separated data elements, suitable for loading into a Postgres database or viewing with less.
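The pipeline above depends on the placeholder control characters being typed literally into the script. A minimal portable sketch of the same idea generates them with printf instead and wraps the pipeline in a function; the name edi2tab is hypothetical. Like the original, it assumes the default service characters (' ends a segment, + separates data elements, ? is the release character) and leaves component separators (:) untouched.

```shell
# edi2tab: hypothetical name for a portable variant of the sed/tr pipeline.
# Reads UN/EDIFACT on stdin and writes one segment per line with
# tab-separated data elements. Assumes the default service characters.
edi2tab() {
    # Placeholders for released characters, generated rather than typed.
    ctl_a=$(printf '\001') ctl_b=$(printf '\002') ctl_c=$(printf '\003')
    # 1. Hide ??, ?+ and ?' behind the placeholders. ?? must be handled
    #    first so that "??+" still ends with a real element separator.
    # 2. tr maps the placeholders back to ? + ' and, in the same pass,
    #    turns remaining separators into tabs and terminators into newlines.
    sed -e "s/??/$ctl_a/g" \
        -e "s/?+/$ctl_b/g" \
        -e "s/?'/$ctl_c/g" |
    tr "$ctl_a$ctl_b$ctl_c+'" "?+'\t\n"
}

# Example: two segments from the book order used in this chapter.
printf "LIN+1'PIA+5+0471949000:IB'" | edi2tab
```

Because tr translates every character exactly once, a restored + is never turned into a tab afterwards; that is what makes the two-stage placeholder trick safe.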

Soon thereafter I found the UN/EDIFACT batch directory at Premenos and wrote a 200-line GAWK script to translate EDIFACT messages into a human-readable form like this:

        LINE ITEM NUMBER               : 1
        Product identification         : 0471949000 ISBN
        Name                           : Cherry
        Vorname                        : Gordon E
        Titel                          : Birmingham
        Untertitel                     : a study in geography, hislanning
        Ort                            : Chichester
        Verlag                         : Wiley(John)(W Sussex)
        Erscheinungsjahr               : 1994
        Seiten                         : 254p
        Ausstattung                    : ?  ill ; 24cm. - Bibl.?  P.237-244.
        Subject (topical)              : 39100200? Urban studies
        Ordered quantity               : 1
        Suggested retail price         : YYY 37.5 Catalogue
        Reference qualifier            : QNB 00023302 9
        Reference date/time            : 19960208 CCYYMMDD
        Line item reference number     : 8217
        Reference qualifier            : BFN S.KON.39

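You may notice the mixture of German and English labels: the EDITEUR code list extension I had was the German version typed in by USR (Vorname = first name, Titel = title, Untertitel = subtitle, Ort = place of publication, Verlag = publisher, Erscheinungsjahr = year of publication, Seiten = pages, Ausstattung = physical description).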

The EDITEUR project stopped. IBU and others kept routing book orders with their home-grown formats and horrible MS-DOS applications.

I had started thinking about SGML for a report system when I found Martin Bryan's home page about XML/EDI. I wrote the first Edi2SGML in Perl during a single night shift, and with it I could process EDIFACT messages using nsgmls or Jade. It produced:

<!-- *** LIN+1 -->
<line.item>
  <line.item.number>1</line.item.number>
</line.item>
<!-- *** PIA+5+0471949000:IB -->
<additional.product.id>
  <product.id.function.qualifier coded="5">Product identification</product.id.function.qualifier>
  <item.number.identification>
    <item.number>0471949000</item.number>
    <item.number.type coded="IB">ISBN (International Standard Book Number)</item.number.type>
  </item.number.identification>
</additional.product.id>
<!-- *** IMD+F+010+:::Cherry -->
<item.description>
  <item.description.type coded="F">Free-form</item.description.type>
  <item.characteristic coded="010">Author Name</item.characteristic>
Cherry
</item.description>

This SGML, and the later XML from XML::Edifact-0.2, suffered from name clashes between segment, composite and element definitions in the original UN/EDIFACT batch directory, which caused trouble when validating the SGML/XML. As an example, look at this entry from the composite definition file trcd:

      C080  PARTY NAME

      Desc: Identification of a transaction party by name, one to five
            lines. Party name may be formatted.

010   3036   Party name                                    M  an..35
020   3036   Party name                                    C  an..35
030   3036   Party name                                    C  an..35
040   3036   Party name                                    C  an..35
050   3036   Party name                                    C  an..35
060   3045   Party name format, coded                      C  an..3

Here we have a composite called PARTY NAME and elements also called Party name. My first idea, using the case sensitivity of XML to distinguish between them, lost its appeal when it came to the PNA segment, which is also called PARTY NAME. XML offers namespaces for exactly this situation, so a possible XML::Edifact translation of the EDITEUR book order line item above is:

<?xml version="1.0"?>
<!DOCTYPE editeur:message SYSTEM "./editeur.dtd">
<!-- XML message produced by edi2xml.pl (c) Kraehe@Bakunin.North.De -->

<editeur:message
        xmlns:editeur='./editeur.rdf'
        xmlns:edifact='./edifact.rdf' 
        xmlns:trsd='./edifact_trsd.rdf'
        xmlns:trcd='./edifact_trcd.rdf'
        xmlns:tred='./edifact_tred.rdf'
        xmlns:uncl='./edifact_uncl.rdf'
        xmlns:anxs='./edifact_anxs.rdf'
        xmlns:anxc='./edifact_anxc.rdf'
        xmlns:anxe='./edifact_anxe.rdf'
        xmlns:unsl='./edifact_unsl.rdf'
        >

<!-- SEGMENT UNB+UNOC:2+STUB+BLA+960209:0843+72 -->

  <anxs:interchange.header>
    <anxc:syntax.identifier>
      <anxe:syntax.identifier unsl:code="0001:UNOC">UN/ECE level C</anxe:syntax.identifier>
      <anxe:syntax.version.number>2</anxe:syntax.version.number>
    </anxc:syntax.identifier>
    <anxc:interchange.sender>
      <anxe:sender.identification>STUB</anxe:sender.identification>
    </anxc:interchange.sender>
    <anxc:interchange.recipient>
      <anxe:recipient.identification>BLA</anxe:recipient.identification>
    </anxc:interchange.recipient>
    <anxc:date.time.of.preparation>
      <anxe:date>960209</anxe:date>
      <anxe:time>0843</anxe:time>
    </anxc:date.time.of.preparation>
      <anxe:interchange.control.reference>72</anxe:interchange.control.reference>
  </anxs:interchange.header>

<!-- ... lots of segments deleted ... -->

<!-- SEGMENT LIN+1 -->

  <trsd:line.item>
      <tred:line.item.number>1</tred:line.item.number>
  </trsd:line.item>

<!-- SEGMENT PIA+5+0471949000:IB -->

  <trsd:additional.product.id>
      <tred:product.id.function.qualifier uncl:code="4347:5">Product identification</tred:product.id.function.qualifier>
    <trcd:item.number.identification>
      <tred:item.number>0471949000</tred:item.number>
      <tred:item.number.type.coded uncl:code="7143:IB">ISBN (International Standard Book Number)</tred:item.number.type.coded>
    </trcd:item.number.identification>
  </trsd:additional.product.id>

<!-- SEGMENT IMD+F+010+:::Cherry -->

  <editeur:item.description>
      <tred:item.description.type.coded uncl:code="7077:F">Free-form</tred:item.description.type.coded>
      <editeur:item.characteristic.coded editeur:code="7081:010">Author Name</editeur:item.characteristic.coded>
    <trcd:item.description>
      <tred:item.description>Cherry</tred:item.description>
    </trcd:item.description>
  </editeur:item.description>
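To make the PARTY NAME clash concrete: if XML names are derived from the directory titles, the PNA segment, composite C080 and element 3036 all collapse to the same local name, and only the namespace prefix keeps them apart. The sketch below assumes a naming rule inferred from the example (lowercase, each run of spaces or punctuation replaced by one dot); the helper xml_qname is hypothetical and not part of XML::Edifact.

```shell
# xml_qname PREFIX TITLE: hypothetical helper, not part of XML::Edifact.
# Naming rule inferred from the example: lowercase the title, replace each
# run of non-alphanumeric characters with a single dot, trim leading and
# trailing dots, then prepend the prefix of the directory (trsd, trcd, tred).
xml_qname() {
    local_name=$(printf '%s\n' "$2" |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/[^a-z0-9][^a-z0-9]*/./g' -e 's/^\.//' -e 's/\.$//')
    printf '%s:%s\n' "$1" "$local_name"
}

# Segment, composite and element share one title; the prefix separates them.
xml_qname trsd 'PARTY NAME'    # trsd:party.name
xml_qname trcd 'PARTY NAME'    # trcd:party.name
xml_qname tred 'Party name'    # tred:party.name
```

Lowercasing deliberately throws away the case distinction that the first idea relied on, so uniqueness comes from the prefix alone.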

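Namespaces not only make it possible to define a working DTD for plain EDIFACT; they also offer a clean way to translate code list extensions, as in the EDITEUR example above, where the extension code is marked editeur:code="7081:010" while standard codes carry uncl:code.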
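The code list side can be sketched the same way. Assuming a single lookup table keyed by element number and code (the helper code_lookup is hypothetical, and its table holds only the four codes that appear in the XML example), an EDITEUR extension code is emitted with the editeur prefix and a standard code with the uncl prefix:

```shell
# code_lookup ELEMENT CODE: hypothetical helper, not part of XML::Edifact.
# Prints the namespaced code attribute and its description, tab-separated,
# or returns 1 for a code that is not in the table.
code_lookup() {
    case "$1:$2" in
        # EDITEUR code list extension
        7081:010) ns=editeur desc='Author Name' ;;
        # Standard UN/EDIFACT code lists (UNCL)
        4347:5)   ns=uncl desc='Product identification' ;;
        7143:IB)  ns=uncl desc='ISBN (International Standard Book Number)' ;;
        7077:F)   ns=uncl desc='Free-form' ;;
        *)        return 1 ;;
    esac
    printf '%s:code="%s:%s"\t%s\n' "$ns" "$1" "$2" "$desc"
}

code_lookup 7081 010    # editeur:code="7081:010"<TAB>Author Name
code_lookup 7077 F      # uncl:code="7077:F"<TAB>Free-form
```

Keeping the element number in the attribute value (7081:010 rather than just 010) means the same code value can appear in different code lists without ambiguity.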

In the XML example above, each xmlns declaration references an RDF file as its namespace URI. Those files do not exist yet; they are proposed for XML::Edifact-0.5.

