HTML::TextToHTML Sample Conversion

This sample is based hugely on the original sample.txt produced
by Seth Golub for txt2html.

I used the following options to convert this document:

-tf --mail --heading '^ *--[\w\s]+-- *$' --system_link_dict TextToHTML.dict
--append_body sample.foot --infile sample.txt --outfile sample.html

This has either been done at the command line with:

	perl -MHTML::TextToHTML -e run_txt2html -- *options*

or from a (test) perl script with:
	
	use HTML::TextToHTML;
	my $conv = new HTML::TextToHTML();
	$conv->txt2html([*options*]);

======================================================================

From bozo@clown.wustl.edu
Return-Path: <bozo@clown.wustl.edu>
Message-Id: <9405102200.AA04736@clown.wustl.edu>
Content-Length: 1070
From: bozo@clown.wustl.edu (Bozo the Clown)
To: rubykat@katspace.com (Kathryn Andersen)
Subject: Re: HTML::TextToHTML
Date: Sun, 12 May 2002 10:01:10 -0500

Bozo wrote:
BtC> Can you post an example text file with its html'ed output?
BtC> That would provide a much better first glance at what it does
BtC> without having to look through and see what the perl code does.

Good idea.  I'll write something up.

       -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The header lines were kept separate because they looked like mail
headers and I have mailmode on.  The same thing applies to Bozo's
quoted text.  Mailmode doesn't screw things up very often, but since
most people are usually converting non-mail, it's off by default.

Paragraphs are handled ok.  In fact, this one is here just to
demonstrate that.

THIS LINE IS VERY IMPORTANT!
(Ok, it wasn't *that* important)


EXAMPLE HEADER
==============

Since this is the first header noticed (all caps, underlined with an
"="), it will be a level 1 header.  It gets an anchor named
"section-1".

Another example
===============
This is the second type of header (not all caps, underlined with "=").
It gets an anchor named "section-1.1".

Yet another example
===================

This header was in the same style, so it was assigned the same header
tag.  Note the anchor names in the HTML. (You probably can't see them
in your current document view.)  Its anchor is named "section-1.2". 
Get the picture?



                    -- This is a custom header --

You can define your own custom header patterns if you know what your
documents look like.



Features of HTML::TextToHTML
============================

 * Handles different kinds of lists
   1. Bulleted
   2. Numbered
      - You can nest them as far as you want.
      - It's pretty decent about figuring out which level of list it
        is supposed to be on.
        - You don't need to change bullet markers to start a new list.
   3. Lettered
      A. Finally handles lettered lists
      B. Upper and lower case both work
         a) Here's an example
         b) I've been meaning to add this for some time.
      C. HTML without CSS can't specify how ordered lists should be
         indicated, so it will be a numbered list in most browsers.
 * Doesn't screw up mail-ish things
 * Spots preformated text sometimes

                 It just needs to have enough whitespace in the line.
        Surrounding blank lines aren't necessary.  If it sees enough
        whitespace in a line, it preformats it.  How much is enough?
        Set it yourself at command line if you want.

 * You can append a file automatically to all converted files.  This
   is handy for adding signatures to your documents.

 * Deals with paragraphs decently.

   o looks for short lines in the middle of paragraphs and keeps them
     short with the use of breaks (<BR>).  How short the lines need to
     be is configurable.
   o Unhyphenates split words that are in the middle of para-
     graphs.  Let me know if trailing punctuation isn't handled "prop-
     erly".  It should be.

 * Puts anchors at all headers and, if you're using the mail header
   features, at the beginning of each mail message.  The anchor names
   for headings are based on guessed section numbers.  

 * Groks Mosaic-style "formatted text" headers (like the one below)

 * Can hyperlink things according to a dictionary file.
   The sample dictionary handles URLs like
   http://www.aigeek.com/ and also shows how to do simpler
   things such as linking the word txt2html the first time it appeared.
 * One can also use the link-dictionary to define custom tags, for
   example using the star character to indicate *italics*.

Example of short lines
----------------------

We're the knights of the round table
We dance whene'er we're able
We do routines and chorus scenes
With footwork impeccable.
We dine well here in Camelot
We eat ham and jam and spam a lot.

Example of varied formatting
----------------------------

If I want to *emphasize* something, then I'd use stars to wrap
around the words, *even if there were more than one*, *that's*
what I'd do.  But I could also _underline_ words, so long as
the darn thing was not a_variable_name, in which case I wouldn't
want to lose the underscores in something which thought it was
underlining.  Though we might want to _underline more than one word_
in a sentence.  Especially if it is _The Title Of A Book_.
For another kind of emphasis, let's go and ##put something in bold##.

THINGS TO DO
============

There are some things which this module doesn't handle yet which
I would like to implement. Something which would be *really
exciting* would be coping with italics and similar things *spread
across multiple lines*.

A. I would like to be able to preserve lettered lists, that is:
   a) recognise that they are letters and not numbers (which it already
      does)
   b) display the correct OL properties with CSS so as to preserve
      that information.
B. Related to the above, I would like to use correct XHTML, not 3.2 HTML.
  I am not sure if I would like it to be optional.

I would also like to implement tables, like the following:

Written For: The First Sentinel Lyric Wheel
Disclaimer:  The characters and concepts of The Sentinel are owned by Pet Fly
             Productions.  I'm just borrowing them for a while.
Rating:      PG (language)
Summary:     After the fire, guilt and nightmare.  Hurt, angst, a little
             comfort.

Definition lists would also be cool.

Fanfic:
  Fiction based on the universe of some movie or TV show.
Fanzine:
  Amateur magazine produced by fans.

----------------------------------------

The footer is everything from the end of this sentence to the
</BODY> tag.

