Road Map for Future Development of PDF::Builder                 16 August 2019

In order to encourage others to contribute code and/or algorithms to the
effort, I am publishing this road map of where I would like the product to go.
Please, no copyrighted code or patented algorithms, unless the owner releases
them under an Open Source license! The content of this road map is open to
discussion, too, either on the GitHub bugs list (feature requests with the
"enhancement" label or "general discussion" label) or on the forum (Feature
Requests or General Discussions). If you have a one-off suggestion, there is
a contact link on the forum so you don't have to sign up for either the forum
or GitHub to be heard.

If there are no contributions on something, I reserve the right to write my
own modules (dependent on PDF::Builder) and sell them.

I make no promises that any of the following items will be implemented; it
depends on how much free time I can come up with, and how many people chip in
to help with code and algorithms. I'll be happy to discuss coding specific
requirements for money/donations (but the result is still free software). 

The assignment to section I or II is somewhat arbitrary, and an item could move 
from one to the other. Some of these items are already listed in bug reports, 
or as feature requests. There is no particular order to these items (i.e.,
they are not ranked by priority).

=============================================================================
I. Items to add to the core product

These are things that should be in the base PDF::Builder product, either
because everyone will need them or because it would be cleaner to have them in
the base rather than in a separate add-on module.

A. Proper TTF/OTF support (RT 113700), especially ligature replacement and
   complex script alphabets using GSUB and GPOS information. I've been looking
   at Pango and directly using HarfBuzz, but both look to be a lot of work.

B. Unification of font support: including character set and encoding support
   improvements [see CTS 16 and CTS 23] to bring more commonality to the use of
   UTF-8 and single byte encodings across all the font types (core, TrueType,
   Type 1, etc.). One problem with core fonts is that only the Latin-1 glyph
   set has widths defined, and only single byte encodings are possible (similar
   for Type 1/PS fonts). To support UTF-8 for core and PS, the font might have
   to be built on the fly for a page (like a synthetic font), with translations
   to single bytes for all glyphs. If the resulting font exceeds 256 
   characters, something would have to be done to split the page into two or
   more sections, each with its own embedded virtual font.
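
   As a rough illustration of the splitting idea in item B, the sketch below
   assigns each distinct character to a byte code in a synthetic subfont and
   opens a new subfont when the current one is full. The names
   assign_subfonts and encode_runs are hypothetical (nothing like them exists
   in PDF::Builder), and the assumption that all 256 byte values are usable
   is probably optimistic. It switches subfonts per run of text rather than
   per page section, which is only one possible reading of the item.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::Builder): map each distinct character
# of a page's text to a [subfont index, byte code] pair, opening a new
# synthetic subfont whenever the current one has used all of its slots.
sub assign_subfonts {
    my ($text, $slots) = @_;
    $slots //= 256;              # assumption: every byte value is usable
    my (%map, @fonts);
    for my $ch (split //, $text) {
        next if exists $map{$ch};
        push @fonts, [] if !@fonts || @{ $fonts[-1] } >= $slots;
        push @{ $fonts[-1] }, $ch;
        $map{$ch} = [ $#fonts, $#{ $fonts[-1] } ];
    }
    return (\%map, \@fonts);
}

# Turn text into runs of [subfont index, byte string]; each run would be
# shown with its own synthetic font.
sub encode_runs {
    my ($text, $map) = @_;
    my @runs;
    for my $ch (split //, $text) {
        my ($font, $code) = @{ $map->{$ch} };
        push @runs, [ $font, '' ] if !@runs || $runs[-1][0] != $font;
        $runs[-1][1] .= chr $code;
    }
    return @runs;
}

# Tiny demonstration with only 3 slots per subfont, so that 'd' spills over
# into a second subfont. Byte strings are printed as dotted code lists.
my ($map, $fonts) = assign_subfonts("abcabd", 3);
printf "font %d: %vd\n", @$_ for encode_runs("abcabd", $map);
```

   Widths for each slot would still have to come from the real font's
   metrics; that part is not shown.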

C. Improved documentation, possibly even a book giving detailed explanations
   and examples, as both a reference and a tutorial. Needless to say, there
   would have to be sufficient interest to warrant the time and expense of
   writing/editing and publishing (in any format) a book to be sold!

D. PDF/A (archival document management, RT 120375): this might take more than
   setting or overriding a few flags to force font embedding and forbid
   encryption/passwords. Other work may be needed to achieve recognition as a
   proper archival format (and there are apparently several archival
   formats).

E. JPEG2000 image file support (CTS 12): I don't know if this is worth it, as
   there seems to be very little use of this, but if someone is interested...

F. Fix Bar Code generation (CTS 1): there seems to be something quite wrong
   with the current bar code generation, so it's possible that no one is using
   it in real documents yet. I suspect that the use of XForms for the bar
   image is not scaling nicely, and may have to be replaced by drawn graphics
   primitives. Many other 1D and 2D bar codes (including QR) would be good,
   but perhaps the bar codes should go into a separate module, due to their
   potentially large code size. Even the existing 5 or so formats could be 
   moved out, as presumably no one is using them yet.

G. Fix Small Caps (and capitalization in general) for ligatures (CTS 13):
   some ligatures given in Unicode or single byte encodings don't get properly
   uppercased. The probable solution would be to decompose ligatures to their
   individual letters before capitalization or Small Caps (if an uppercase
   version doesn't exist in the font). As Perl doesn't seem to handle 
   capitalizing ligatures properly, a "capitals" function would need to be 
   offered, as well as improvements to the Small Caps in "synfonts".
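
   A minimal sketch of the "decompose, then capitalize" idea in item G
   follows. The function name capitals is taken from the text above, but the
   implementation is purely illustrative: it handles only the Unicode Latin
   ligatures U+FB00 to U+FB06 through a fixed table. Ligatures in single byte
   encodings would need a similar table keyed by that encoding's byte codes,
   which is an assumption and is not shown.

```perl
use strict;
use warnings;

# Hypothetical "capitals" helper (not an existing PDF::Builder function):
# expand Latin ligatures to their component letters, then uppercase.
# Only the Unicode presentation forms U+FB00..U+FB06 are handled here.
my %LIGATURE = (
    "\x{FB00}" => 'ff',  "\x{FB01}" => 'fi',  "\x{FB02}" => 'fl',
    "\x{FB03}" => 'ffi', "\x{FB04}" => 'ffl',
    "\x{FB05}" => 'st',  "\x{FB06}" => 'st',   # long-s-t and s-t ligatures
);

sub capitals {
    my ($text) = @_;
    # replace each ligature by its plain letters before uppercasing
    $text =~ s/([\x{FB00}-\x{FB06}])/$LIGATURE{$1}/g;
    return uc $text;
}

print capitals("o\x{FB03}ce"), "\n";   # prints OFFICE
```

   The same decomposition step could feed a Small Caps routine when the font
   has no uppercase form of the ligature.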

H. Fallback glyphs (CTS 5) when a desired glyph is not found in one font,
   but can be found in another. This is similar to HTML when you give a font
   family list in CSS.
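
   The fallback lookup in item H might work roughly as sketched below: walk
   an ordered font list and, for each character, take the first font that
   covers it. Everything here is hypothetical. A "font" is just a name plus a
   set of covered code points, where a real implementation would consult each
   font's character map.

```perl
use strict;
use warnings;

# Hypothetical sketch: split text into runs of [font name, string], choosing
# for each character the first font in the list that covers it.
sub split_by_font {
    my ($text, @fonts) = @_;
    my @runs;
    for my $ch (split //, $text) {
        my ($font) = grep { $_->{covers}{ ord($ch) } } @fonts;
        my $name = $font ? $font->{name} : '(missing)';  # no font has it
        if (@runs && $runs[-1][0] eq $name) {
            $runs[-1][1] .= $ch;             # extend the current run
        } else {
            push @runs, [ $name, $ch ];      # start a new run
        }
    }
    return @runs;
}

# Made-up coverage tables, for illustration only.
my @fonts = (
    { name => 'Latin', covers => { map { ($_ => 1) } 0x20 .. 0x7E } },
    { name => 'Greek', covers => { map { ($_ => 1) } 0x20, 0x391 .. 0x3C9 } },
);
my @runs = split_by_font("a\x{3B1}b", @fonts);
print join(', ', map { $_->[0] } @runs), "\n";   # prints Latin, Greek, Latin
```

   As with a CSS font family list, the order of the list decides which font
   wins when several contain the glyph.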

I. Support for tagged structure (CTS 17 and RT 120375). At least, don't corrupt
   an existing tagged PDF file when extracting pages.

J. Adding comment fields to any object (and possibly standalone comments as
   their own objects). An example would be an image object with a comment
   giving the source image file, for debugging purposes.

K. Text method to move to arbitrary points: relative or absolute movement
   horizontally and vertically (in a choice of units), with tab support
   (\t and \v embedded in text), and maybe \n while we're at it.
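
   To make item K concrete, here is a small sketch of how \t and \n embedded
   in text could be turned into positioned fragments. The function
   layout_tabs, its option names, the 36 pt tab interval, and the fixed
   6 pt-per-character width are all assumptions made for the example; \v is
   left out.

```perl
use strict;
use warnings;

# Hypothetical sketch: lay out text containing \t and \n as a list of
# [x, y, string] fragments. $width is a code ref returning the width of a
# string in points (a stand-in for a real font-metrics lookup).
sub layout_tabs {
    my ($text, %opt) = @_;
    my $tab     = $opt{tab}     // 36;    # tab stops every 36 pt (assumed)
    my $leading = $opt{leading} // 12;    # line spacing in pt (assumed)
    my $width   = $opt{width}   // sub { 6 * length($_[0]) };
    my ($x, $y) = ($opt{x} // 0, $opt{y} // 0);
    my $left = $x;                        # left margin = starting x
    my @out;
    for my $piece (split /([\t\n])/, $text) {
        if ($piece eq "\t") {
            # advance to the next tab stop to the right of the current x
            $x = $left + $tab * (int(($x - $left) / $tab) + 1);
        } elsif ($piece eq "\n") {
            ($x, $y) = ($left, $y - $leading);   # PDF y axis points up
        } elsif (length $piece) {
            push @out, [ $x, $y, $piece ];
            $x += $width->($piece);
        }
    }
    return @out;
}

printf "%6.1f %6.1f  %s\n", @$_
    for layout_tabs("ab\tc\nd", x => 72, y => 700);
```

   Each fragment could then be placed with an absolute move followed by a
   text call.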

=============================================================================
II. Items to add to a separate area

These are things that not everyone will require, and so should possibly be
split out into a separate module (dependent on PDF::Builder). Some of these
things are getting into the realm of support for markup languages and word
processing.

A. Hyphenation and paragraph shaping: including CTS 20 (Hyphenation) and CTS 24 
   (pseudo page objects). The idea is to use Knuth et al.'s line-splitting and 
   paragraph shaping algorithms to flow text into a space in a visually 
   pleasing manner, while obeying widows and orphans constraints (as well as
   not orphaning headings).
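
   For readers unfamiliar with the approach, the sketch below shows the core
   of "optimal" line breaking in a much simplified form. It is NOT the full
   Knuth-Plass algorithm: there is no hyphenation, no stretchable glue, and
   no widow/orphan handling. It only minimizes the sum of squared leftover
   space over all lines but the last, with widths measured in characters.
   The function name break_lines is hypothetical.

```perl
use strict;
use warnings;

# Simplified optimal line breaking by dynamic programming.
# Cost of a line = (unused width)^2; the last line is free.
# Widths are in characters here; a real version would use font metrics.
sub break_lines {
    my ($text, $max) = @_;
    my @w = split ' ', $text;
    my $n = @w;
    my @best = (0);      # $best[$i] = minimal cost of setting words 0..$i-1
    my @from = (0);      # $from[$i] = index of first word on the last line
    for my $i (1 .. $n) {
        my $len = -1;
        for (my $j = $i - 1; $j >= 0; $j--) {
            $len += 1 + length($w[$j]);         # words $j..$i-1 on one line
            # stop when overfull; a single overlong word is still allowed
            last if $len > $max && $j < $i - 1;
            my $slack = $len > $max ? 0 : $max - $len;
            my $cost  = $best[$j] + ($i == $n ? 0 : $slack ** 2);
            if (!defined $best[$i] || $cost < $best[$i]) {
                ($best[$i], $from[$i]) = ($cost, $j);
            }
        }
    }
    # walk the chosen breakpoints backwards to recover the lines
    my @lines;
    for (my $i = $n; $i > 0; $i = $from[$i]) {
        unshift @lines, join ' ', @w[ $from[$i] .. $i - 1 ];
    }
    return @lines;
}

print "$_\n" for break_lines("aaa bb cc ddddd", 6);
```

   With a width of 6, a greedy first-fit breaker would produce "aaa bb",
   "cc", "ddddd"; the version above produces "aaa", "bb cc", "ddddd", which
   spreads the leftover space more evenly.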

B. Virtual pages: this would be related to item (A) (paragraph shaping), where
   PDF code would not be immediately written to an output page, but would be
   buffered, and output only later. This permits easier paragraph shaping and
   other rearrangements across columns and pages, where the starting location
   of a line of text is in the buffer, and it can be updated when moving the
   line around. Even individual words might be tagged (location and hyphenation
   points) so that lines could be broken at will.

C. General text flowing capability, to fill irregularly shaped columns (such
   as with intruding inserts or margin notes) in a balanced manner. This would
   also include flowing text around images, tables, or other inserts to avoid
   leaving large empty sections of pages (e.g., have a large table that floats
   to the next page, with text after it that could easily come before it on
   the original page). Something to handle cross references would be handy 
   here, to change "see table X above" to "below", "on the previous page"/"on 
   the next page", "on page X", etc. in a prescribed and consistent manner.
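
   The cross reference wording could be chosen by something as small as the
   sketch below, once the final positions of the reference and its target are
   known. The function xref_phrase and the { page, y } position records are
   hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the wording of a cross reference once the
# final positions are known. Each position is { page => N, y => points };
# in PDF coordinates a larger y is higher on the page.
sub xref_phrase {
    my ($from, $to) = @_;
    my $d = $to->{page} - $from->{page};
    if ($d == 0) {
        return $to->{y} > $from->{y} ? 'above' : 'below';
    }
    return 'on the previous page' if $d == -1;
    return 'on the next page'     if $d == 1;
    return "on page $to->{page}";
}

my $ref   = { page => 4, y => 300 };
my $table = { page => 5, y => 650 };
print "see table 2 ", xref_phrase($ref, $table), "\n";
```

   Because the phrase can change the length of the line it sits on, the text
   may have to be flowed again after the wording is substituted.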

D. Font Families: per CTS 22, make it easier to deal with switching fonts
   and variations within a font (bold, italic, size, color, underline, small
   caps, etc.), possibly with HTML tags inlined. The idea would be to only have 
   to specify a typeface and initial size, and then switch in and out of 
   variants (bold, italic, etc.) without having to call all the font routines 
   yourself. Perhaps several formats of markup could be supported (HTML, 
   Markdown, troff), driven by a definition file?
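
   As a sketch of the inline markup idea in item D, the code below splits
   text containing <b> and <i> tags into runs that carry the style in effect.
   It is deliberately tiny: styled_runs is a hypothetical name, only those
   two tags are recognized, and no attributes or entities are handled.

```perl
use strict;
use warnings;

# Hypothetical sketch: split text with inline <b>/<i> tags into runs, each
# carrying the style flags in effect. Nesting is tracked with counters.
sub styled_runs {
    my ($markup) = @_;
    my %depth = (b => 0, i => 0);
    my @runs;
    for my $tok (split m{(</?[bi]>)}, $markup) {
        next unless length $tok;
        if ($tok =~ m{^<(/?)([bi])>$}) {
            my ($closing, $tag) = ($1, $2);
            $depth{$tag} += $closing ? -1 : 1;
            $depth{$tag} = 0 if $depth{$tag} < 0;   # ignore stray close tags
        } else {
            push @runs, {
                text   => $tok,
                bold   => $depth{b} > 0 ? 1 : 0,
                italic => $depth{i} > 0 ? 1 : 0,
            };
        }
    }
    return @runs;
}

for my $run (styled_runs("a <b>b <i>c</i></b> d")) {
    printf "[%s] bold=%d italic=%d\n", @{$run}{qw(text bold italic)};
}
```

   A font family object would then map each bold/italic combination to the
   matching face (regular, bold, italic, bold italic), so that the caller
   names only the typeface and the starting size.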

E. Continuing (D), eventually much of HTML and other common markups (headings,
   quotes, entities, tables, lists, etc.) would be supported. One goal would be
   to eventually support enough of each markup to have a separate converter
   product (HTMLtoPDF, troffToPDF, etc.), but support for full JavaScript and
   CSS (for HTML pages) will be a bear! Some level of macros (predefined
   strings) would be useful.

F. Support for SVG graphics (drawing), and for troff's eqn, pic, and tbl
   markup languages, to make it easier to do anything other than plain text.
   Also a full library of graphing functions (line, bar, scatter plots, etc.
   in 2D and 3D).

G. Prepress production markup: convenience functions to place a watermark or
   draft notice on all (or selected) pages, crop marks (based on trimbox),
   temporarily draw page bounding boxes, temporarily draw object limit boxes,
   color dots/bars for color printing alignment, instructions to the printer.
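
   The crop mark geometry in item G is simple enough to sketch. Given a trim
   box, the hypothetical helper below returns the eight line segments (two
   per corner). The 6 pt gap and 18 pt mark length are assumed defaults, not
   anything PDF::Builder defines.

```perl
use strict;
use warnings;

# Hypothetical helper: given a trim box (llx, lly, urx, ury in points),
# return the eight crop-mark line segments as [x1, y1, x2, y2]. Marks start
# $gap points outside the trim box and are $len points long (both assumed).
sub crop_marks {
    my ($llx, $lly, $urx, $ury, %opt) = @_;
    my $gap = $opt{gap} // 6;
    my $len = $opt{len} // 18;
    my @segs;
    # each corner: position plus the outward direction in x and y
    for my $c ([$llx, $lly, -1, -1], [$urx, $lly, 1, -1],
               [$llx, $ury, -1,  1], [$urx, $ury, 1,  1]) {
        my ($x, $y, $sx, $sy) = @$c;
        # horizontal mark, extending away from the corner
        push @segs, [ $x + $sx * $gap, $y, $x + $sx * ($gap + $len), $y ];
        # vertical mark, extending away from the corner
        push @segs, [ $x, $y + $sy * $gap, $x, $y + $sy * ($gap + $len) ];
    }
    return @segs;
}

# Trim box inset 36 pt on a US Letter page (612 x 792 pt).
printf "%g %g -> %g %g\n", @$_ for crop_marks(36, 36, 576, 756);
```

   Each segment could then be drawn on a page's graphics content with a
   move, a line, and a stroke. The media box must extend beyond the trim box
   by at least gap + len for the marks to be visible.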

=============================================================================
