NAME
    HTML::HTML5::Sanity - Perl extension to make HTML5 DOM trees less insane.

SYNOPSIS
      use HTML::HTML5::Parser;
      use HTML::HTML5::Sanity;
  
      my $parser    = HTML::HTML5::Parser->new;
      my $html5_dom = $parser->parse_file('http://example.com/');
      my $sane_dom  = fix_document($html5_dom);
  
      print document_to_clarkml($sane_dom);

DESCRIPTION
    The Document Object Model (DOM) generated by HTML::HTML5::Parser meets the
    requirements of the HTML5 spec, but will probably catch a lot of people by
    surprise.

    The main oddity is that elements and attributes which appear to be
    namespaced are not really. For example, the following element:

      <div xml:lang="fr">...</div>

    looks like it should be parsed as having an attribute "lang" in the XML
    namespace. It is not: it is actually parsed as having the attribute
    "xml:lang" in the null namespace.

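    The two readings of that attribute can be written side by side in Clark
    notation, "{namespace-uri}local-name". The following is a minimal sketch
    in plain Perl; "clark_name" is a hypothetical helper written for this
    illustration and is not part of HTML::HTML5::Sanity, and writing the null
    namespace as empty braces is an assumption of the sketch.

```perl
use strict;
use warnings;

# Namespace URI that the reserved "xml" prefix is bound to.
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Hypothetical helper (not part of HTML::HTML5::Sanity): format a name
# in Clark notation. A null namespace is written with empty braces.
sub clark_name {
    my ($ns, $local) = @_;
    return sprintf '{%s}%s', (defined $ns ? $ns : ''), $local;
}

# What the HTML5 parser produces: "xml:lang" is the whole attribute
# name, and the attribute is in the null namespace.
my $as_parsed = clark_name(undef, 'xml:lang');

# What an XML author would expect: "lang" in the XML namespace.
my $as_expected = clark_name($XML_NS, 'lang');

print "parsed:   $as_parsed\n";
print "expected: $as_expected\n";
```
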
    "fix_document"
              $sane_dom = fix_document($html5_dom);

            Returns a modified copy of the DOM, leaving the original DOM
            unmodified.
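
            A quick way to see the effect is to dump the tree before and
            after the call. This is only a sketch: the sample markup is
            invented for illustration, and it assumes both modules are
            installed and that the parser's "parse_string" method is used
            for in-memory input.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::Sanity;

# Sample markup, assumed for illustration only.
my $html = '<!DOCTYPE html><title>t</title><div xml:lang="fr">Bonjour</div>';

my $parser    = HTML::HTML5::Parser->new;
my $html5_dom = $parser->parse_string($html);

# Serialise the original so we can check that it is left untouched.
my $before = $html5_dom->toString;

my $sane_dom = fix_document($html5_dom);

print "original unchanged\n" if $html5_dom->toString eq $before;

# Compare the two trees in the debugging notation.
print document_to_clarkml($html5_dom), "\n";
print document_to_clarkml($sane_dom), "\n";
```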

    "document_to_clarkml", "element_to_clarkml", "attribute_to_clarkml",
              $string = document_to_clarkml($document);
              $string = element_to_clarkml($element);
              $string = attribute_to_clarkml($attribute);

            Returns a Clark-Notation-like string useful for debugging. Only
            the first function, which takes an XML::LibXML::Document, is
            exported by default; an export list of ":all" or ":debug"
            exports the others too.

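            As a sketch of the non-default exports, the following requests
            ":all" and dumps only the root element. The sample markup is
            invented for illustration, and both modules are assumed to be
            installed.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::Sanity qw(:all);   # also imports element_to_clarkml

# Sample markup, assumed for illustration only.
my $html = '<!DOCTYPE html><title>t</title><p xml:lang="fr">Bonjour</p>';

my $sane_dom = fix_document(HTML::HTML5::Parser->new->parse_string($html));

# Dump the root element rather than the whole document.
my $string = element_to_clarkml($sane_dom->documentElement);
print $string, "\n";
```
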
    "document_to_hashref", "element_to_hashref", "attribute_to_hashref",
              $data = document_to_hashref($document);
              $data = element_to_hashref($element);
              $data = attribute_to_hashref($attribute);

            Returns a hashref useful for debugging. Only the first function,
            which takes an XML::LibXML::Document, is exported by default; an
            export list of ":all" or ":debug" exports the others too.
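
            The layout of the returned hashref is not described here, so
            the simplest way to inspect it is with Data::Dumper. A minimal
            sketch, assuming both modules are installed and using invented
            sample markup:

```perl
use strict;
use warnings;
use Data::Dumper;
use HTML::HTML5::Parser;
use HTML::HTML5::Sanity;

# Sample markup, assumed for illustration only.
my $html = '<!DOCTYPE html><title>t</title><p xml:lang="fr">Bonjour</p>';

my $sane_dom = fix_document(HTML::HTML5::Parser->new->parse_string($html));
my $data     = document_to_hashref($sane_dom);

# Sorted keys make successive dumps easier to compare.
local $Data::Dumper::Sortkeys = 1;
print Dumper($data);
```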

BUGS
    Please report any bugs to <http://rt.cpan.org/>.

SEE ALSO
    HTML::HTML5::Parser, XML::LibXML.

AUTHOR
    Toby Inkster <tobyink@cpan.org>.

COPYRIGHT AND LICENSE
    Copyright (C) 2009 by Toby Inkster

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself, either Perl version 5.8 or, at your
    option, any later version of Perl 5 you may have available.

