| Data Sources
                          Contents or URLs can be searched for using the <contents>,
                          <templates> or <media> tags, which allow
                          you to search a data source (directory, delimiter-separated-values file,
                          database etc.) for a pattern.
                          
                         
                          <contents> and <media> tags can also pick
                          up metadata from metatable files while searching for content or media items,
                          using the metatable attribute.
                          Currently two data source protocols are defined: file: and svfile:.
                          
                         
                           
                            Attributes Supported By Datasource Tags 
                              
                                src
                              
                                
                                  All datasources require this attribute, which
                                   specifies a protocol and path in a URL-style syntax:
                                   protocol:path. If no protocol is specified, file: is
                                   used by default.
                                  
                                
                                name
                              
                                
                                  This attribute is used to specify the pattern of data,
                                   under this path, which will be converted into content or media items.
                                   The part of the data's location which matches this name pattern will
                                   become the name of the item. Typically, WebMake glob patterns, such as "*.txt" or ".../*.html" are used.
                                  
                                
                                skip
                              
                                
                                  A pattern which should match filenames that should be
                                   skipped. Files that match this pattern will not be included as content
                                   or media items, or as metatables. Glob patterns, again, are
                                   used here.
                                  
                                
                                prefix
                              
                                
                                  The items' names can be further modified by specifying
                                   a prefix and/or suffix; these strings are prepended or
                                   appended to the raw name to make the name the content is given.
                                  
                                
                                suffix
                              
                                
                                  See prefix, above.
                                  
                                
                                namesubst
                              
                                
                                  A Perl-formatted s// substitution, which is used to
                                   convert source filenames to content names. See the example under
                                   The file: Protocol, below.
                                  
                                
                                nametr
                              
                                
                                  A Perl tr// translation, which is used to convert
                                   source filenames to content names.
                                  
                                
                                listname
                              
                                
                                  The name of a content item. This content item will be
                                   created, and will contain the names of all content items picked up by
                                   the <contents> or <media> search.
                                  
                                
                                metatable
                              
                                
                                  A search pattern, similar to name above, which
                                   provides filenames from which metadata will be loaded.
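
                              The name-related attributes above can be pictured as a small pipeline.
                              The following is only a sketch in plain Perl, not WebMake's own code:
                              content_name is a hypothetical helper, the lowercasing translation is
                              just an illustration of a tr// operation, and the order in which
                              prefix and suffix are applied relative to the substitution is an
                              assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not WebMake's own code: derive a content name from
# a filename given relative to the src directory.
sub content_name {
    my ($file, %opt) = @_;
    my $name = $file;
    $name =~ s/.txt//;       # the namesubst example used on this page; note
                             # that the unescaped "." matches any character
    $name =~ tr/A-Z/a-z/;    # an illustrative tr// translation (lowercasing)
    # Assumption: prefix and suffix are added after the name is rewritten.
    return ($opt{prefix} // '') . $name . ($opt{suffix} // '');
}

print content_name('Intro.txt', prefix => 'story_'), "\n";
```

                              With these assumptions, Intro.txt becomes a content item that would
                              be referenced as ${story_intro}.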
                                  
                                 
                              In addition, the attributes supported by the <content> tag can
                              be specified as attributes to <contents>, including
                              format, up, map, etc.
                              
                             
                              Also, the attributes supported by the <metatable> tag
                              can be used if you've specified a metatable attribute.
                              
                             
                              The content blocks picked up from a <contents> search can
                              also carry metadata, such as headlines, visibility dates and workflow
                              approval statuses, by embedding that metadata in the content.
                              
                             
                              
                             
                            The file: Protocol 
                              The file: protocol loads content from a directory; each file is made into one
                              content chunk. The src attribute indicates the source directory, the
                              name attribute indicates the glob pattern that will pick up the
                              content items in question.
                              
                             
                              
                                <contents src="stories" name="*.txt" />
                                
                               
                              The file's name will be used as the content chunk's name, unless
                              you use the namesubst attribute; see below for details.
                              
                             
                              Note that, for efficiency, the files in question are not actually opened until
                              their content chunks are referenced using ${name} or
                              get_content("name").
                              
                             Searching Recursively Through A Directory Tree
                              Normally only the files at the top level of the src directory are added to
                              the content set. However, if the name pattern starts with .../, the
                              directory will be searched recursively:
                              
                             
                              
                                <contents src="stories" name=".../*.txt" />
                                
                               
                              The names of the resulting content items will contain the full path from
                              that directory down; e.g. if the file stories/dir1/foo/bar.txt exists, the
                              example above would define a content item called ${dir1/foo/bar.txt}.
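
                              To make the naming rule concrete, here is a minimal sketch of a
                              recursive search written with Perl's standard File::Find and File::Spec
                              modules. It is not how WebMake implements the .../ pattern;
                              recursive_names is a hypothetical helper, and the *.txt filter is
                              hard-coded for brevity.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;

# Hypothetical helper, not part of WebMake: return the *.txt files below
# $dir, each named by its path relative to $dir.
sub recursive_names {
    my ($dir) = @_;
    my @names;
    find({
        no_chdir => 1,    # keep $_ set to the full path of each file
        wanted   => sub {
            return unless -f $_ && /\.txt\z/;
            push @names, File::Spec->abs2rel($_, $dir);
        },
    }, $dir);
    my @sorted = sort @names;
    return @sorted;
}

# List the names found below "stories", if that directory exists.
if (-d 'stories') {
    print "$_\n" for recursive_names('stories');
}
```

                              For the file stories/dir1/foo/bar.txt this yields the relative name
                              dir1/foo/bar.txt, matching the content item name described above.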
                              
                             The namesubst Option
                              If you use the namesubst attribute, the filename will be modified using that
                              substitution to give the content item's name. So, for example, this contents
                              tag:
                              
                             
                              
                                <contents src="stories" name="*.txt" namesubst="s/.txt//" />
                                
                               
                              will load these example files as follows (files in subdirectories are
                              only picked up when the name pattern is recursive, such as .../*.txt):
                              
                             
                               
                                
                                  
                                    | Filename | Content Name |
                                    | stories/index.txt | ${index} |
                                    | stories/foo.txt | ${foo} |
                                    | stories/directory/bar.txt | ${directory/bar} |
                                    | stories/zz/gum/baz.txt | ${zz/gum/baz} |

                              Loading Metadata Using the Metatable Attribute
                              You can now load metadata from external files while searching a directory tree
                              for content items or media files. This allows you to load image titles, etc.
                              from files which match the filename pattern you specify in the metatable
                              attribute.
                              
                             
                              The attributes supported by the <metatable> tag can be
                              used in the datasource tag's attribute set, if you've specified a
                              metatable attribute, allowing you to define the format of the
                              metatable files you expect to find.
                              
                             
                              There's one major difference between normal metatables and metatables
                              found via a data source; the names in this kind of metatable refer to
                              the content or media object's filename, not its content name.
                              
                             
                              In other words, the names of any content items referred to in the metatable
                              files will be modified, as follows:
                              
                             
                              
                                
                                  If the name attribute contains .../, then the content items
                                   could be deep in a subdirectory. The metatable file does not have
                                   to contain the full path to the content item's name; it can just
                                   contain the item's filename relative to the metatable itself.
                                
                                  If a namesubst or nametr attribute is specified, the content
                                   names in the metatable will be processed with it. Again, this
                                   means that the metatable data just has to provide the filename,
                                   not whatever the resulting content item will be called.
                                  
                                 
                              These features will hopefully make the operation a little more intuitive, as
                              users who add files to a media or contents directory will not have to figure
                              out what the resulting content item will be called; they can just refer to
                              them by their filename, when tagging them with metadata.
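
                              The two rules above can be sketched as a single name-resolution step.
                              This is an illustration only, under stated assumptions: the helper
                              meta_target_name and the metatable filename titles.meta are
                              hypothetical, and the substitution is the s/.txt// example used
                              earlier on this page.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper, not WebMake's own code. $metatable is the metatable
# file's path relative to src; $entry is a filename listed inside it.
sub meta_target_name {
    my ($metatable, $entry) = @_;
    my $dir  = dirname($metatable);                    # "." at the top level
    my $name = $dir eq '.' ? $entry : "$dir/$entry";   # path relative to src
    $name =~ s/.txt//;                                 # assumed namesubst="s/.txt//"
    return $name;
}

print meta_target_name('dir1/foo/titles.meta', 'bar.txt'), "\n";
```

                              Here an entry for bar.txt in a metatable stored in dir1/foo is
                              attached to the content item ${dir1/foo/bar}, without the metatable
                              author needing to know that name.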
                              
                             
                              
                             
                            The svfile: Protocol 
                              The svfile: protocol loads content from a delimiter-separated-values file. The
                              src attribute is the name of the file, and the name attribute is the glob
                              pattern used to catch the relevant content items. The namefield
                              attribute specifies the field number (counting from 1) which the name
                              pattern is matched against, and the valuefield attribute specifies the
                              number of the field from which the content chunk is read. The delimiter
                              attribute specifies the delimiter used to separate values in the file.
                              
                             
                              
                                <contents src="svfile:stories.csv" name="*"
                                 namefield=1 valuefield=2 delimiter="," />
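
                              The roles of namefield, valuefield and delimiter can be illustrated
                              with a short sketch. It is not WebMake's parser: sv_items is a
                              hypothetical helper, the sample lines are invented, quoting of values
                              is not handled, and matching against the name glob is left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not WebMake's parser: map the name field of each
# line to its value field.
sub sv_items {
    my ($lines, $namefield, $valuefield, $delimiter) = @_;
    my %items;
    for my $line (@$lines) {
        (my $text = $line) =~ s/\r?\n\z//;    # strip the line ending
        next if $text eq '';
        my @fields = split /\Q$delimiter\E/, $text, -1;
        next if @fields < $namefield || @fields < $valuefield;
        # Field numbers count from 1, so subtract 1 for Perl's 0-based arrays.
        $items{ $fields[$namefield - 1] } = $fields[$valuefield - 1];
    }
    return \%items;
}

my $items = sv_items(
    ["intro,Welcome to the site\n", "about,Who we are\n"],
    1, 2, ',',
);
print "$_ => $items->{$_}\n" for sort keys %$items;
```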
                                
                               
                              
                             
                            Adding New Protocols 
                              New data sources for <contents> and <media> tags are added by
                              writing an implementation of the DataSourceBase.pm module, in the
                              HTML::WebMake::DataSources package space (the
                              lib/HTML/WebMake/DataSources directory of the distribution).
                              
                             
                              Every data source needs a protocol, an alphanumeric lowercase identifier
                              to use at the start of the src attribute to indicate that a data source is
                              of that type.
                              
                             
                              Each implementation of this module should implement these methods:
                              
                             
                              
                                new ($parent)
                                
                                  instantiate the object, as usual.
                                  
                                
                                add ()
                                
                                  add all the items in that data source as content
                                   chunks. (See below!)
                                  
                                
                                get_location_url ($location)
                                
                                  get the location (in URL format) of a content chunk
                                   loaded by add().
                                  
                                
                                get_location_contents ($location)
                                
                                  get the contents of the location. The location, again,
                                   is the string provided by add().
                                  
                                
                                get_location_mod_time ($location)
                                
                                  get the current modification date of a location for
                                   dependency checking. The location, again, is in the format of
                                   the string provided by add().
                                  
                                
                              Notes:
                              
                             
                              
                                
                                  If you want add() to read the content immediately, call
                                   $self->{parent}->add_text ($name, $text, $self->{src}, $modtime).
                                
                                  add() does not have to open and read content chunks straight away.
                                   If it calls $self->{parent}->add_location ($name, $location,
                                   $lastmod), providing a location string which starts with the data
                                   source's protocol identifier, the content will not be loaded until
                                   it is needed, at which point get_location_contents() is called.
                                
                                  This location string should contain all the information needed to
                                   access that content chunk later, even if add() has not been
                                   called. Consider it as similar to a URL. This is required so that
                                   get_location_mod_time() (see below) can work.
                                
                                  All implementations of add() should call
                                   $fixed = $self->{parent}->fixname ($name); to modify the name of each
                                   content chunk appropriately, followed by
                                   $self->{parent}->add_file_to_list ($fixed); to add the content
                                   chunk's name to the filelist content item.
                                
                                  Data sources that support the <media> tag need to implement
                                   get_location_url, otherwise an error message will be output.
                                
                                  Data sources that support the <contents> tag, and defer
                                   reading the content until it's required, need to implement
                                   get_location_contents, which is used to provide content from a
                                   location set using $self->{parent}->add_location().
                                
                                  Data sources that support the <contents> tag need to implement
                                   get_location_mod_time. This is used to support dependency
                                   checking, and should return the modification time (in UNIX
                                   time_t format) of that location. Note that, since this is used
                                   to compare a content chunk's modification time at the previous
                                   webmake run with its current modification time, it is called
                                   before the real data source is opened.
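
                              Putting the methods and notes together, a data source might be shaped
                              roughly like the sketch below. Everything in it is illustrative: the
                              demo: protocol, the DemoSource and DemoParent packages and the
                              in-memory %STORE are hypothetical, DemoParent only stands in for the
                              real WebMake parent object so that the example runs on its own, and
                              setting src inside new() is an assumption. A real implementation
                              would follow DataSourceBase.pm and live in the
                              HTML::WebMake::DataSources package space.

```perl
use strict;
use warnings;

# Stand-in for the WebMake parent object, so the sketch runs on its own.
package DemoParent;
sub new {
    my ($class) = @_;
    return bless { list => [], locations => {} }, $class;
}
sub fixname          { my ($self, $name) = @_; return $name }
sub add_file_to_list { my ($self, $name) = @_; push @{ $self->{list} }, $name }
sub add_location {
    my ($self, $name, $location, $lastmod) = @_;
    $self->{locations}{$name} = [ $location, $lastmod ];
}

# Hypothetical "demo:" data source backed by an in-memory hash.
package DemoSource;

my %STORE = (
    intro => { text => 'Welcome',    mtime => 1_000_000_000 },
    about => { text => 'Who we are', mtime => 1_000_000_500 },
);

sub new {
    my ($class, $parent) = @_;
    return bless { parent => $parent, src => 'demo:store' }, $class;
}

sub add {
    my ($self) = @_;
    for my $name (sort keys %STORE) {
        my $fixed = $self->{parent}->fixname($name);
        $self->{parent}->add_file_to_list($fixed);
        # Deferred loading: register a location instead of the text itself.
        # Passing the fixed name to add_location is an assumption.
        $self->{parent}->add_location($fixed, "demo:$name", $STORE{$name}{mtime});
    }
}

# Strip the protocol identifier to recover the key in %STORE.
sub _key { my ($location) = @_; $location =~ s/^demo://; return $location }

# Placeholder: this sketch simply returns the location string as the URL.
sub get_location_url      { my ($self, $location) = @_; return $location }
sub get_location_contents { my ($self, $location) = @_; return $STORE{ _key($location) }{text} }
sub get_location_mod_time { my ($self, $location) = @_; return $STORE{ _key($location) }{mtime} }

package main;

my $parent = DemoParent->new;
my $source = DemoSource->new($parent);
$source->add;
print join(', ', @{ $parent->{list} }), "\n";
print $source->get_location_contents('demo:intro'), "\n";
```

                              Because each location string carries the protocol and the key,
                              get_location_contents and get_location_mod_time can answer without
                              add() having been called on that object, which is the property the
                              notes above ask for.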
                              
                             
                          
                         |