Road Map for Future Development of PDF::Builder              16 February 2021

In order to encourage others to contribute code and/or algorithms to the
effort, I am publishing this road map of where I would like the product to go.
Please, no copyrighted code or patented algorithms, unless the owner releases
them under an Open Source license! The content of this road map is open to
discussion, too, either on the GitHub bugs list (feature requests with the
"enhancement" label or "general discussion" label) or under the forum (Feature 
Requests or General Discussions). If you have a one-off suggestion, there is
a contact link on the forum so you don't have to sign up for either the forum
or GitHub to be heard.

If there are no contributions on something, I reserve the right to write my
own modules (dependent on PDF::Builder) and sell them.

I make no promises that any of the following items will be implemented; it
depends on how much free time I can come up with, and how many people chip in
to help with code and algorithms. I'll be happy to discuss coding specific
requirements for money/donations (but the result is still free software). 

The assignment to section I or II is somewhat arbitrary, and an item could move 
from one to the other. Some of these items are already listed in bug reports, 
or as feature requests. There is no particular order to these items (i.e.,
they are not ranked by priority).

=============================================================================
I. Items to add to the core product

These are things that should be in the base PDF::Builder product, as everyone
will need them (or, it would be cleaner to have it in the base rather than as
an add-on separate module).

A. ## DONE ## release 3.018
   Proper TTF/OTF support (RT 113700), especially ligature replacement and
   complex script alphabets using GSUB and GPOS information. I've been looking
   at Pango and directly using HarfBuzz, but both look to be a lot of work.

   I know of a developer tinkering with an add-on layer using Pango for both 
   PDF::API2 and Builder, but it's not clear to me how far he intends to go 
   beyond simple markup to change fonts (a la HTML presentation tags). If it
   uses Pango, hopefully the other stuff will come along for free. We'll see.
   For Western (simple) scripts, automatic ligature support would be wonderful,
   but we need to be able to suppress selected ligatures (e.g., 'ff' in the
   English word 'shelfful'). Support for swashes and alternate glyph choices 
   would be very nice to have (embedded markup language?). For complex scripts 
   like Arabic family and southern/southeastern Asian families, proper support 
   (Pango?) is vital.

   UPDATE: See Text::Layout and HarfBuzz::Shaper packages. Layout is usable
     with Builder (but no explicit support yet). Shaper is supported by Builder
     for ligatures and complex scripts.

B. Unification of font support: including character set and encoding support
   improvements [see CTS 16 and CTS 23] to make more commonality between using 
   UTF-8 and single byte encodings, across all the font types (core, TrueType, 
   Type1/PS, etc.). One problem with core fonts is, even though most core fonts 
   are already TrueType, that only the Latin-1 glyph set has widths defined, 
   and only single byte encodings are possible (similar for Type1/PS fonts). To 
   support UTF-8 for core and PS, the font might have to be built on the fly 
   for a page (like a synthetic font), with translations to single bytes for 
   all glyphs. If the resulting font exceeds 256 characters, something would 
   have to be done to split the page internally into two or more sections, 
   each with their own embedded virtual font. Glyph widths would have to be 
   available for all characters.

C. Improved documentation, possibly even a book giving detailed explanations
   and examples, as both a reference and a tutorial. Needless to say, there
   would have to be sufficient interest to warrant the time and expense of
   writing/editing and publishing (in any format) a book to be sold!

D. PDF/A (archival document management, RT 120375): this might be more than
   throwing a few flags/overriding flags to force font embedding and no 
   encryption/ passwords. There may be other stuff that needs to be done to 
   achieve recognition as a proper archival format (and there are apparently 
   several archival formats).

E. JPEG2000 image file support (CTS 12): I don't know if this is worth it, as
   there seems to be very little use of this, but if someone is interested,
   have at it...  any other newish image formats that PDF can support?

F. Fix Bar Code generation (CTS 1): there seems to be something quite wrong
   with the current bar code generation, so it's possible that no one is using
   it in real documents yet. It's also possible that I'm not writing my test
   cases properly -- does anyone know if they work? I suspect that the use of 
   XForms (relocatable text and graphics) for the bar image is not scaling 
   nicely, and may have to be replaced by drawn graphics primitives (text and 
   graphics drawn in their final place). Many other 1D and 2D bar codes 
   (including QR) would be good, but perhaps the bar codes should go into a 
   separate module, due to their potentially large code size and use of "new"
   Perl modules. Even the existing 5 or so formats could be moved out, as 
   presumably no one is using them yet (if they are, in fact, broken). This is 
   in section I, as bar codes are already implemented in the base, but it's 
   possible that bar codes could be removed and reimplemented in section II as 
   a separate library or module.

G. Fix Small Caps (and capitalization in general) for ligatures (CTS 13):
   some ligatures given in Unicode or single byte encodings don't get properly
   uppercased. The probable solution would be to decompose ligatures to their
   individual letters before capitalization or Small/Petite Caps (if an 
   uppercase version doesn't exist in the font, or use GSUB processing to 
   recreate a ligature from the capital letters). As Perl doesn't seem to handle
   capitalizing ligatures properly, a "capitals" function would need to be 
   offered, as well as improvements to the Small Caps in "synfonts". Various
   non-Latin single characters (e.g., Greek terminal/nonterminal sigma, German
   eszett, long s) also may need proper handling for capitalization.

   UPDATE: It may be better to use individual letters (rather than ligature
     Unicode points), allowing easy capitalization and small caps. Then use
     HarfBuzz::Shaper to replace lowercase letter sequences with true ligatures
     on the fly.

H. Fallback glyphs (CTS 5) when a desired glyph is not found in one font,
   but can be found in another. This is similar to HTML when you give a font
   family list in CSS. Pango might help with this.

   UPDATE: This is being considered for Text::Layout, but nothing scheduled yet.

I. Support for tagged structure (CTS 17 and RT 120375). At least, don't corrupt
   an existing tagged PDF file when extracting pages.

J. Adding comment fields to any object (and possibly standalone comments as
   their own objects). An example would be an image object with a comment
   giving the source image file, for debugging purposes).

K. Text method to move to arbitrary points: relative or absolute movement
   horizontally and vertically (a range of units), including tab support 
   (including \t and \v embedded in text), and maybe \n while we're at it.

   Note that tabs bring up some issues. First, a tab by character count (the
   traditional way, e.g., to the next n8-th column) is useful only for 
   monospaced fonts, and no changes in font size in the line. Thus, tab stops 
   would be more useful when defined by some absolute dimension (e.g., inches 
   or mms) of column position. Second, tabbing is usually done to get text
   columns (sub columns), which involves a lot of manual setup and twiddling of 
   text. Consider using a TABLE within the column or page to get text organized 
   into the desired format (see "tbl" addition in section II).

   UPDATE: The $text->distance() call permits arbitrary delta-x and delta-y
     (in points) of text movement. It's not hard to convert non-points to 
     points, such as distance(2.4/in, -1.23/cm). For tables, take a look at
     the PDF::Table package for now.

L. Determine what it is about "CJK" fonts (.ttf and .otf) that makes them
   incompatible with synfont [RT 130040] and embedding [RT 130041], and fix if 
   possible. Are separate CJK fonts even necessary these days? Also note that
   many CJK fonts refuse to "subset" when embedded (the entire font gets
   embedded, even if you only use a handful of glyphs!).

M. Add decorative rectangular box effects around sections of text. With or
   without border (allow rounded corners) and background color, drop shadows
   (3D effect), etc. The box is drawn at given dimensions and location, and
   the text written over it in the usual manner. Content clipping might also
   be supported. See PDF::Table for drawing rules (and borders) for ideas, as
   well as block background colors (gfx_bg object before text object).

N. Extend HarfBuzz::Shaper use (see A.) to flow paragraphs and sections (fill)
   to match capability of existing text-fill calls. Architect so as to extend
   easily to full paragraph shaping and "pouring" text into arbitrary columns,
   with balancing. Justification to avoid ragged-left or -right needs to be
   handled carefully for connected glyphs (e.g., Arabic, Indic, cursive Latin).

   Treat HarfBuzz::Shaper handled-fonts just like any other font when it comes
   to various text-handling routines (including length, justifying and aligning
   text, filling lines, paragraph, section, textlabel, etc.).

O. ## DONE ## release 3.020
   Support for PNM and related graphics images. See RT 132844.

P. Look into using TeX::Hyphen or Text::Hyphen to split words properly. The
   latter supports 3 languages besides English. Both use TeX (Knuth-Liang)
   algorithms. Eventually need means to override built-in hyphenation (e.g.,
   force 're-cord' or 'rec-ord', per context), similar to ligature control. 
   One way would be to insert soft-hypen &SHY;, this might require all 
   syllable breaks to be so marked, since TH might not see it any more.
   Use any place where a string is flowed into multiple lines, and eventually 
   for complete paragraph shaping. Retain camelCase and punctuation/numeric 
   splitting as fallback for non-words. Consider own built-in version of either
   TH, using CTAN directly (would take rewrite of the Perl code?), if can't
   get owner to upgrade it.

Q. ## DONE ## release 3.020
   Look at Basic/PDF/File.pm 'Q>' unpack (ref RT 133131). Supposedly will not
   run on a 32 bit platform -- will it work on such platform to check if high
   33 bits are all 0, and convert the low 32 bits to a fullword integer?

R. TTF/OTF font embedding: consider -forceASCII and -forceLatin1 options to
   make the entire ASCII alphabet (or the entire Latin-1 alphabet) embedded,
   not just what subset of characters (and glyphs) were used in text. This 
   might be useful in fillable forms and any other situation where an end user 
   gets to type in text. There is already the ability to embed the entire 
   font, but that's often overkill.

=============================================================================
II. Items to add to a separate area (new module or sub-module)

These are things that not everyone will require, and so should be split out 
into possibly a separate module (dependent on PDF::Builder). Some of these
things are getting into the realm of support for markup languages and word
processing.

A. Hyphenation and paragraph shaping: including CTS 20 (Hyphenation) and CTS 24 
   (pseudo page objects). The idea is to use Knuth et al.'s line-splitting and 
   paragraph shaping algorithms to flow text into a space in a visually 
   pleasing manner, while obeying widows and orphans constraints (as well as
   not orphaning headings). Pango may help here with line and word splitting.
   ** Look into item P in section I **
   Text::KnuthPlass might be usable for general paragraph setup (allows non-
   rectangular paragaph), uses Text::Hyphen internally, don't know yet if it 
   does the full Knuth-Plass "rivers of white" and "too many hyphens in a row" 
   bit. Interaction with Text::Layout stuff (font, size, changes)?

   I've gotten Text::Hyphen and Text::KnuthPlass working, and tests suggest 
   that they (mostly) work great. Unfortunately, KP doesn't build on all 
   systems (a simple fix), so I may have to forcibly take it over if the owner 
   doesn't respond reasonably soon, or just take it into the product. Hyphen 
   could use some improvements for multilanguage support, also waiting for a 
   response. Some sort of "virtual printing (see 'B') would be necessary, as 
   dealing with widows and orphans would be much easier that way.

B. Virtual pages: this would be related to item (A) (paragraph shaping), where
   PDF code would not be immediately written to an output page, but would be
   buffered, and output only later. This permits easier paragraph shaping and
   other rearrangements across columns and pages, where the starting location
   of a line of text is in the buffer, and it can be updated when moving the
   line around. Even individual words might be tagged (location and hyphenation
   points) so that lines could be broken at will. Even a limited amount of
   virtuality (virtual line output) could be useful for resetting a baseline
   to accommodate a change in font size -- this might involve tagging a word or
   block of words of the same height.

   PDF::Table product (or table functionality within Builder) could make use
   of virtual printing to "print" to a mini-page within a cell (fixed width,
   min/max height). If it doesn't fit on the page, decide where to split row
   to avoid widows and orphans. Also knowing cell height and row height, can
   vertically align a cell's contents.

C. General text flowing capability, to fill irregularly shaped columns (such
   as with intruding inserts or margin notes) in a balanced manner, including
   spanning headings across all columns, where appropriate. This would also
   include flowing text around images, tables, or other inserts to avoid
   leaving large empty sections of pages (e.g., have a large table that floats
   to the next page, with text after it that could easily come before it on
   the original page). Something to handle cross references would be handy 
   here, to output "see table X above" or "below", "on the previous page"/"on 
   the next page", "on page X", etc. in a prescribed and consistent manner.
   Note that it might be good to notify the user during processing that such a 
   move has been done, so that it can be inspected.

   UPDATE: Columns could be any shape, drawn with lines, polylines, arcs,
     circles, splines, etc. Text baselines don't necessarily have to be 
     horizontal. Clip baselines to the "column" shape to get the line length
     and starting point for each line of text. There may have to be some 
     iterations to reshape a line if its height results in a shift of the
     baseline into a wider or narrower area.

D. Font Families: per CTS 22, make it easier to deal with switching fonts
   and variations within a font (bold, italic, size, color, underline, small
   caps, etc.), possibly with HTML tags inlined. The idea would be to only have 
   to specify a typeface and initial size, and then switch in and out of 
   variants (bold, italic, etc.) without having to call all the font routines 
   yourself. Perhaps several formats of markup could be supported (HTML, 
   Markdown, troff), driven by a definition file? Pango may help with this, at
   least with font-specification markup.

   UPDATE: Text::Layout package may prove very useful here, although it needs
     some enhancements (a new "back end" to return text for shaping, rather 
     than directly outputting it).

E. Continuing (D), eventually much of HTML and other common markups (headings, 
   quotes, HTML entities, tables, lists, etc.) supported. One goal would be to
   eventually support enough of each markup to have a separate converter 
   product (HTMLtoPDF, troffToPDF, etc.), but support for full Javascript and
   CSS (for HTML pages) will be a bear! Some level of macros (predefined
   strings) would be useful. Non-HTML might be converted to HTML or Pango
   format first.

F. Support for SVG graphics (drawing), support for troff's eqn, pic, and tbl
   markup languages to make it easier to do anything other than plain text.
   LaTeX equation and table handling would be good to have, too, to avoid 
   having to rewrite marked up text. Also provide a full graphing functions 
   library (stacked/unstacked line, bar, scatter plots etc. in 2D and 3D).

G. Prepress production markup: convenience functions to place a watermark or
   draft notice on all (or selected) pages, crop marks (based on trimbox),
   temporarily draw page bounding boxes, temporarily draw object limit boxes,
   color dots/bars for color printing alignment, instructions to the (human)
   printer.

   Page background color or pattern should extend to the full size of the page
   and not end when content ends part way down the page. Remind users that 
   most printers will not print all the way to the edge. See Boxes.pl example.

H. Incorporate PDF::Table into PDF::Builder::Table. Simplify it somewhat (e.g.,
   instead of separate line-width and color settings, use a list: w (width in
   points with default color and solid line), or [w, color, optional-dash-
   pattern]. Use it for borders and rules, and possibly frames. A "frame" would
   be the enclosure for the table, and would be either a line spec or a width
   and pattern (3D raised, 3D sunken, sunken table, raised table, floating
   table with shadow, etc.). A "rule" would be horizontal and vertical
   divider lines, and a "border" would be cell dividers ([w, color, margin-to-
   cell edge, optional-dash-pattern]). Other simplifications and consolidations
   of settings as justified (do not have to maintain absolute compatibility
   with existing PDF::Table). Tables continued to the next page would not get
   a full frame at the bottom/top, but a heavy dashed line (if breaking in
   middle of a cell) or heavy solid line (if breaking at a row boundary). It
   might be good not to automatically create a next page and start outputting
   the rest of the table, but hold the contents and alert the programmer that
   at least one more call is needed to finish.

   Currently, PDF::Table basically equalizes column widths as much as it can,
   but consider a starting point of relative and/or absolute column widths,
   like many other table implementations. Some columns absolute widths, and
   remaining space divided up even among *'s (with a column N*), subject to
   minimum and maximum widths.

   Within a cell, ideally it would be treated as a mini-page, with all the 
   normal PDF construction capabilities including paragraph shaping, flow into
   column(s), images, etc. This would be better than the "text_block" used in
   PDF::Table (more uniform coding and treatment), although some ideas from
   text_block might find a home. However, such complex treatment (a table 
   could embed a table) requires virtual pages to permit a lot of rearrangement.

I. Consider Optional Content Groups (Layers), per 32000-2008 section 8.11.
   This permits drawings to be shown by layer, or a watermark/copyright layer
   to show only on printing.

J. ## DONE ## release 3.020  (documentation improved)
   Per wkHTMLtoPDF issue 4846, at least some phone cameras output "portrait"
   mode photos in landscape mode (rotated), with an "orientation" tag. JPEG
   (at least) image handling may need a rotation flag in the call, and/or 
   pay attention to the Exif orientation flag. Confirmed that
   Builder JPEG support does NOT respect the orientation flag.
   https://www.google.com/search?client=firefox-b-1-d&q=jpeg+orientation+metadata
   Unfortunately, there are a number of ways to specify the orientation flag
   including XML '<exif:Orientation>Top-left</exif::Orientation>' and buried
   somewhere in the Exif or JFIF header of the file. It might be best to ask
   the image what its orientation is, and leave translation and rotation of
   the placed image to user code, rather than trying to flip the contents of
   the image file directly. See writeup in Docs.pm.

K. A means to update PDF content already written to the $pdf data structure, 
   but not yet written to file. This could involve moving a text write (one
   or more translates) to somewhere else on the page, or even moving from one
   page to another. More general editing of material already "written" out,
   such as changing a font size or a color -- would need a way to find the
   desired content to be modified. Ability to erase content or move it from
   page to page (e.g., started a table and then found it wouldn't fit on one
   page -- move it to the top of the next page to allow display on a single
   page). If virtual pages, design with all this in mind, so easy to update
   content before being "written" out. If nothing is written to the PDF file
   until the end (save or saveas), this might even supersede virtual pages.

L. (See B and H items above) Consider a $page->subpage(x, y, w, h, opts) to
   subclass a page into a restricted area (not necessarily rectangular, or
   bounded in height). Things like columns and table cells could then be
   treated as normal pages (inherit text and graphics contexts?) and anything
   could be put into them. May want to wait until have virtual write 
   capability rather than writing to file right away, for fitting table cells
   into a page, and balancing columns. Once in, build-in PDF::Table capability
   as PDF::Builder::Table, and can have arbitrary content in a table cell.

=============================================================================
