Using GXML: Other Features

by Josh Carter <josh@multipart-mixed.com>
Feb 24, 2002

Overview

This page covers most of "those other features" in GXML. I put the features common to GXML and gxml2html at the top. I've also tried to rank them in rough order of usefulness, but since I have no idea what your particular needs may be, you should at least scan the headings on this page to see if you find something interesting.

HTML Mode

NOTE: this is always on in gxml2html

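GXML was originally used to convert XML to HTML, but I later decided it was useful for other generic XML transformations. Thus, in GXML the "convert to HTML" mode is off by default, but you can easily flip it on. As the example below shows, HTML mode strips the tags of entities that have a template (only the template's output remains), and it writes single-tag entities in HTML syntax (<hr>) rather than XML syntax (<hr/>).

As always, an example to demonstrate: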

    # source XML:
    
    <!-- Let's say "photo" has a template -->
    <photo>
      <source>dog.jpg</source>
      <caption>Ugly Dog</caption>
    </photo>
    
    <!-- here's a single-tag entity, in XML syntax -->
    <hr/>
    
    # output with HTML mode on:
    
    <!-- (note that "photo" tag got stripped) -->
    <p>
      <img src="dog.jpg"><br>
      Ugly Dog
    </p>
    
    <!-- here's a single-tag entity (now in HTML syntax) -->
    <hr>

Tag Remappings

Say you want to map <heading> in your source file to <h2> in the output (i.e. a standard HTML heading). You could do this via a trivial template, but it's easier to use a remappings hash passed into GXML, or a gxml2html config file. First, for those using the module directly, here's the format:

    my %remappings = ( 'heading' => 'h2',
                       'subhead' => 'h3' );

    my $gxml = new XML::GXML({'remappings' => \%remappings});

The hash format is simply the source tag as key, remapped tag as value. You can provide as many remappings as you want. Now, if you're using gxml2html, you can do the following in your .gxml2html config file:

    # sample remaps in .gxml2html
    <heading>    <h2>
    <subhead>    <h3>

Note that remappings occur before template substitution, so if you provide both a template and a remapping for a given tag, the remap will run first and the template will therefore not be applied. Also note that if you remap a tag to something that does have a template -- e.g. if you had templates for h2 or h3 in the above code -- the template for the new tag will be applied. Finally, you can set the replacement to nothing, in which case the tag (and any enclosed content) is dropped.
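
The ordering rules above can be modeled in a few lines of plain Perl. This is only a toy sketch, not GXML's own code: resolve_tag, the %templates hash, and the 'draft' tag are made up for illustration, and using an empty string to mean "replacement set to nothing" is an assumption.

```perl
use strict;
use warnings;

# Toy model of the ordering described above; this is not GXML's own code.
# %remappings mirrors the hash format from the example. %templates only
# records which tags have a template. 'draft' => '' stands in for a
# replacement set to nothing (an assumed spelling).
my %remappings = ('heading' => 'h2', 'subhead' => 'h3', 'draft' => '');
my %templates  = ('heading' => 1, 'h3' => 1, 'photo' => 1);

sub resolve_tag
{
    my ($tag) = @_;

    if (exists $remappings{$tag})
    {
        my $new = $remappings{$tag};
        return 'dropped' if $new eq '';   # tag and content are dropped
        $tag = $new;                      # the remap runs first
    }

    # the template lookup only ever sees the (possibly remapped) tag name
    return $templates{$tag} ? "template for $tag" : "plain <$tag>";
}

print "$_ => ", resolve_tag($_), "\n"
    for qw(heading subhead draft photo para);
```

In this model "heading" ends up as a plain <h2> even though a template for "heading" exists, while "subhead" picks up the template for h3.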

Attribute Collection

NOTE: All features from here down are only accessible from the GXML module; gxml2html does not have an interface for them.

I'm not sure if this feature is brilliant or totally gratuitous. (Maybe both?) There's a stripped-down class in GXML which just collects attributes using GXML's attribute engine. Given how XML allows you to have two different styles of "attributes," i.e. real ones in entity start tags and "logical" ones (for lack of a better term) which are sub-entities, I figured people may find a GXML-based attribute collector useful. If nothing else, I've found it useful.

The concept is this: you specify the kind of entity you're looking for (i.e. its tag), and the attributes you want to collect. GXML's AttributeCollector class returns a hash containing everything it found, keyed by an attribute you specify. For example, let's say you wanted to find all the "person" entities in the following XML file, get their quest and favorite color, and key the result hash by their name:

    <person>
      <name>Bob</name>
      <age>42</age>
      <quest>turn the moon into green cheese</quest>
      <color>blue</color>
    </person>
    
    <person name="Ned">
      <quest>make lots of money</quest>
      <color>purple</color>
      <haircolor>brown</haircolor>
    </person>
    
    <container quest-default="eat potatoes">
      <person color="red">
      <name>Fred</name>
      </person>
    </container>

The source code for using the collector would be:

    use XML::GXML;

    # create a collector    
    my $collector = new XML::GXML::AttributeCollector('person', 'name', 
                                                     ['quest', 'color']);

    # collect from a scalar
    $collector->Collect($your_xml_here);

    # ...or from a file
    $collector->Clear(); # clear old data first
    $collector->CollectFromFile('your-filename-here.xml');

The parameters, as above, are the entity you're looking for, the key attribute, and an array ref to the other attributes you want. If you want to run the collector over several sources, call Clear() in between to clear out the previous data. (Your config settings will be preserved.)

The resulting hash for the above XML and code would contain:

    Bob =>
      quest => turn the moon into green cheese
      color => blue
    Ned =>
      quest => make lots of money
      color => purple
    Fred =>
      quest => eat potatoes
      color => red

As demonstrated, all features of GXML's attribute engine are supported. Both styles of attributes can be used in the same source, and you can use default attributes. Note that templates will not be applied by AttributeCollector. This class was designed to run fast, so it doesn't use the template engine. If you need templates applied while collecting, I'd recommend running GXML with a callback on the end tag of the entity you're looking for, and calling Attribute() in that callback.
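
Once you have the result, walking it is ordinary Perl. The sketch below assumes the result is a hash of hash refs shaped like the listing above. How the hash is fetched from the collector isn't covered on this page, so a literal copy stands in for it; %people and describe_person are names made up for illustration.

```perl
use strict;
use warnings;

# Literal copy of the result listing above. With the real module this
# data would come from the collector instead (assumed shape: a hash of
# hash refs, keyed by the "name" attribute).
my %people = (
    'Bob'  => { 'quest' => 'turn the moon into green cheese',
                'color' => 'blue' },
    'Ned'  => { 'quest' => 'make lots of money',
                'color' => 'purple' },
    'Fred' => { 'quest' => 'eat potatoes',
                'color' => 'red' },
);

# Build a one-line summary for one person; returns nothing if unknown.
sub describe_person
{
    my ($name) = @_;
    my $attrs = $people{$name} or return;
    return sprintf('%s (favorite color %s) wants to %s',
                   $name, $attrs->{'color'}, $attrs->{'quest'});
}

foreach my $name (sort keys %people)
{
    print describe_person($name), "\n";
}
```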

Callbacks

Callbacks allow you to run code at the start or end tag of any entity you specify. The end tag callbacks also allow you to control some aspects of how GXML handles the entity. Let's start with a simple example:

    my %callbacks = ( 'start:thing' => \&thingStart,
                      'end:thing'   => \&thingEnd);
    
    my $gxml = new XML::GXML({'callbacks' => \%callbacks});

The above code would call the subroutines thingStart and thingEnd whenever it saw the respective start and end tags of thing entities. To attach a callback to a start or end tag, prepend "start:" or "end:" to the name of the tag.
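
To see how the "start:" and "end:" keys select a handler, here is a toy dispatcher in plain Perl. It is not how GXML is implemented, just a sketch of the lookup: fire and @log are hypothetical names, and passing an empty hash ref to each handler mirrors the scratch hash described later on this page.

```perl
use strict;
use warnings;

my @log;

# Same key format as above: "start:" or "end:" plus the tag name.
my %callbacks = ( 'start:thing' => sub { push @log, 'thing opened' },
                  'end:thing'   => sub { push @log, 'thing closed' } );

# Toy dispatcher: build the "<phase>:<tag>" key and call the handler
# registered under it, if any. Returns 1 if a handler ran, else 0.
sub fire
{
    my ($phase, $tag) = @_;
    my $handler = $callbacks{"$phase:$tag"};
    return 0 unless $handler;
    $handler->({});    # empty hash ref for the handler's own storage
    return 1;
}

fire('start', 'thing');
fire('start', 'other');   # no callback registered, nothing happens
fire('end',   'thing');

print join(', ', @log), "\n";
```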

So what are callbacks good for? Start tag callbacks could be used to keep track of entities, e.g. counting the number of a given entity. The end tag callbacks are somewhat more useful, since at that point GXML has parsed the content of the entity, so you can call Attribute() to find attributes. End tag handlers can also do some neat tricks based on their return value.

End tag handlers can return a combination of these values:

Since you can return multiple commands, you need to return an array reference containing your values, even if you only return one thing in it. For example, let's say you want to add a callback for "thing" entities which only allows three instances of a "thing" in your output. You can easily do this with the following callback attached to "end:thing":

    sub OnlyTakeThree
    {
        if (XML::GXML::NumAttributes('thing') > 3)
        { return ['discard']; }
    }
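
If you want to try the array-reference convention without running GXML, the following toy driver may help. It is a sketch under assumptions: $seen is a plain counter standing in for XML::GXML::NumAttributes, keep_entity is a made-up stand-in for GXML's handling of the return value, and returning an empty array ref for "no commands" is this sketch's choice, not something this page specifies.

```perl
use strict;
use warnings;

my $seen = 0;   # plain counter; stands in for NumAttributes('thing')

# End tag handler in the style of OnlyTakeThree above.
sub only_take_three
{
    $seen++;
    return ['discard'] if $seen > 3;
    return [];                # no commands: keep the entity
}

# Toy driver: an entity is kept unless the returned list of commands
# contains 'discard'.
sub keep_entity
{
    my ($handler) = @_;
    my $commands = $handler->();
    return !grep { $_ eq 'discard' } @$commands;
}

# Pretend five "thing" entities go by; only the first three survive.
my @kept = grep { keep_entity(\&only_take_three) } 1 .. 5;
print "kept: @kept\n";
```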

Callbacks are passed a reference to an empty hash they can store things in if needed. This is only useful if you're using the 'repeat' feature; it lets you save your current iteration state, for example. Again, you're getting into pretty heavy wizardry here, so study GXML's ForEachEnd closely.

Dynamic Attributes (addlAttrs param)

You can specify a callback for variables which GXML can't match. When GXML looks up a variable, it will search upwards through the attribute chain in your XML document, then call this handler if you specified one, and if the variable is still undefined, it will search for the "-default" value.

The syntax is simple: you get passed the variable name, and if you know the value, you return it. For example:

    use XML::GXML;
    
    my $gxml = new XML::GXML({'addlAttrs' => \&Boberize});
    
    [...mainline code here...]
    
    sub Boberize
    {
        my $attr = shift;
    
        if ($attr eq 'name')
        { return 'Bob'; }
    }

Now all %%%name%%% variables will be "Bob" if a name can't be found in the XML source.
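
The lookup order described at the top of this section can be sketched as a small stand-alone function. This is a toy model and not GXML's code: lookup and the @chain layout (attribute scopes listed from the innermost entity outward) are assumptions made for illustration, and Boberize is rewritten with explicit returns.

```perl
use strict;
use warnings;

# Toy model of the lookup order; not GXML's own code.
# @$chain lists attribute scopes from innermost entity to outermost.
sub lookup
{
    my ($name, $chain, $handler) = @_;

    # 1. search upward through the attribute chain
    foreach my $scope (@$chain)
    {
        return $scope->{$name} if defined $scope->{$name};
    }

    # 2. ask the dynamic attribute handler, if one was supplied
    if ($handler)
    {
        my $value = $handler->($name);
        return $value if defined $value;
    }

    # 3. fall back to a "<name>-default" attribute
    my $default = "${name}-default";
    foreach my $scope (@$chain)
    {
        return $scope->{$default} if defined $scope->{$default};
    }

    return;
}

sub Boberize
{
    my $attr = shift;
    return 'Bob' if $attr eq 'name';
    return;
}

my @chain = ( { 'color' => 'red' },
              { 'quest-default' => 'eat potatoes' } );

print lookup('color', \@chain, \&Boberize), "\n";   # found in the chain
print lookup('name',  \@chain, \&Boberize), "\n";   # from the handler
print lookup('quest', \@chain, \&Boberize), "\n";   # from quest-default
```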

Dynamic Templates (addlTemplates param)

Dynamic variables are neat, but you can also go crazy with dynamic templates. These are extremely handy if you have some data that doesn't change, but some data inside that does change. For example, you have a news page on your web site, and the overall layout is always the same, but the headlines are pulled from a database and constantly change. Using a dynamic template for the headlines list is a perfect solution.

To add dynamic templates, simply pass in a hash that maps your dynamic template names to the subroutines that create the content. (The old style way of doing it, from GXML 2.0, is documented later.)

Here's some sample code:

    use XML::GXML;
    
    my %templates = ('headline-list' => \&HeadlineListTemplate);

    my $gxml = new XML::GXML({'addlTemplates' => \%templates});
    
    [...mainline code here...]
    
    sub HeadlineListTemplate
    {
        my $template = '<headlines>'; # start with root tag

        [...get headlines into @headlines array here...]

        # create XML for each headline
        foreach my $headline (@headlines)
        {
            # append headline to $template
            $template .= sprintf('<headline><link>%s</link>' .
                                 '<title>%s</title></headline>',
                                 $headline->{'link'},
                                 $headline->{'title'});
        }
        
        # close root tag of $template
        $template .= '</headlines>';
        
        # always return a scalar reference
        return \$template;
    }

Then of course you can make a normal (static) template for <headline> that formats the title and the link as you like it. Voilà! -- now you have a dynamic, database-driven news page.
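
One thing to watch with database-driven content: a title or link containing "&" or "<" would make the generated XML malformed, assuming the dynamic template is parsed as XML like any other template. A minimal sketch of escaping the values first follows; xml_escape and the sample @headlines data are made up for illustration and are not part of GXML.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape
{
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical data standing in for the database query.
my @headlines = (
    { 'link' => 'news.cgi?id=1&lang=en', 'title' => 'Cats & Dogs' },
    { 'link' => 'news.cgi?id=2',         'title' => 'Is 1 < 2?'   },
);

sub HeadlineListTemplate
{
    my $template = '<headlines>';

    foreach my $headline (@headlines)
    {
        $template .= sprintf('<headline><link>%s</link>' .
                             '<title>%s</title></headline>',
                             xml_escape($headline->{'link'}),
                             xml_escape($headline->{'title'}));
    }

    $template .= '</headlines>';
    return \$template;    # scalar reference, as in the sample above
}

print ${ HeadlineListTemplate() }, "\n";
```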

Old-Style Dynamic Templates

This style of dynamic templates was in GXML 2.0. It's still around in case anyone's already using it. This usually gets ugly if you have more than a couple of dynamic templates, so I recommend using the new style (above) instead.

There are two subroutines required: one to check if a given template exists, and another to get it. The check routine should be fast -- it'll get called often -- but the other will only get called once per substitution where there isn't a template already on disk.

Here's some sample code:

    use XML::GXML;
    
    my $gxml = new XML::GXML(
            {'addlTempExists'  => \&CheckAddlTemplate,
             'addlTemplate'    => \&AddlTemplate});
    
    [...mainline code here...]
    
    sub CheckAddlTemplate
    {
        my $name = shift;
    
        if ($name eq 'dyna-template') { return 1; }
        else                          { return 0; }
    }
    
    sub AddlTemplate
    {
        my $name = shift;
    
        if ($name eq 'dyna-template')
            # return value is a reference
            { return \'<p>hello there</p>'; }
        else
            { return undef; }
    }

The theory here is that your AddlTemplate subroutine may be complicated; e.g. printing a bunch of hidden form data in a CGI app, where the data you print changes depending on your source XML. Thus addlTemplate is only called when strictly necessary.
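
If you're stuck with the old style and have several dynamic templates, one way to keep the two subroutines from growing long if/else chains is a dispatch table. This is a sketch of that idea, not something GXML requires; %dynamic and the 'dyna-footer' template are made up for illustration.

```perl
use strict;
use warnings;

# One table maps each dynamic template name to the code that builds it.
# 'dyna-footer' is a made-up second template for illustration.
my %dynamic = (
    'dyna-template' => sub { return '<p>hello there</p>'; },
    'dyna-footer'   => sub { return '<p>generated '
                                    . scalar(localtime) . '</p>'; },
);

# Fast check: a single hash lookup, no content is generated.
sub CheckAddlTemplate
{
    my $name = shift;
    return exists $dynamic{$name} ? 1 : 0;
}

# Build the content only when it is actually requested.
sub AddlTemplate
{
    my $name = shift;
    my $builder = $dynamic{$name} or return undef;
    my $content = $builder->();
    return \$content;    # return value is a reference
}
```

The two subs would then be passed as 'addlTempExists' and 'addlTemplate' exactly as in the sample above.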


Copyright (c) 2001-2002 Josh Carter