# Generated by getTODO.pm on Fri Aug 11 00:52:11 2006
# for Genezzo Version 0.63 - Alpha 20060811

Some general TODO categories:

APIs:
  See if Genezzo can be embedded in Apache.
  real DBI support (DBI::Genezzo)
  web-based management console

  Need to fix quoting to behave consistently 
  in strings, literals, and export functions. (Need test)

Missing SQL features:
  Binds, sorting/aggregation, subquery support, views, explain plan

Multiuser support issues:
  transactions, logging, recovery, 
  shared memory buffer cache
  exclusive table locks first, then read share/write exclusive,
  then row locks

  users/roles, sessions, schemas, tablespaces, authentication
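
  The lock plan above (exclusive table locks first, then read
  share/write exclusive) comes down to checking a requested lock mode
  against the modes other sessions already hold. What follows is a minimal
  sketch in Python rather than Genezzo's Perl. The "S"/"X" mode names and the
  TableLockManager class are hypothetical, and the sketch does no waiting,
  queueing, or deadlock detection:

```python
from collections import defaultdict

# Hypothetical lock modes: "S" (shared, readers) and "X" (exclusive, writers).
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


class TableLockManager:
    """Toy table-level lock table: grants or refuses, never waits."""

    def __init__(self):
        # table name -> {session id: mode currently held}
        self.held = defaultdict(dict)

    def acquire(self, session, table, mode):
        """Grant `mode` if it is compatible with every other session's lock."""
        for other, other_mode in self.held[table].items():
            if other != session and not COMPATIBLE[(other_mode, mode)]:
                return False
        # A session re-requesting a lock simply has its mode replaced.
        self.held[table][session] = mode
        return True

    def release(self, session, table):
        self.held[table].pop(session, None)
```

  Row locks could plausibly reuse the same compatibility table, keyed by
  (table, rid) instead of the table name.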

query optimization:
  Rule/Cost-based optimization
  costing by index probe

fancier functionality:
  btrees with overflow blocks for long keys
  block migration
  block-level predicate pushdown, aggregate pushdown

  yaml datatype support

  antlr parser

  file encryption, row compression
  MLS: multi level security

  parallel/distributed operation, replication, 
  scalability, fault-tolerance

  user-defined functions, indexes, datatypes

  non-blocking aggregation based upon count estimation

  unicode support

  error messages


space management:
  Lehman and Yao, "Efficient Locking for Concurrent Operations on
  B-Trees", ACM TODS v6, #4, Dec 81, pp 650-670.

  SCN/LSN block header information

  freelists, extent headers


Per-file TODO breakdown follows:

TODO lib/Genezzo/Block/RDBlkA.pm
    HSplice: offset calculation must match offset2hkey in RDBlock. Special
    handling needed if inherited by RDBlk_NN?

TODO lib/Genezzo/Block/RDBlk_NN.pm
    build simple test cases
    build complex test cases
    test thoroughly
    packdeleted: make this work. It's broken!
    integration with bt2 - need to packdelete in bsplit, do null checks in
    leaf blocks (branch blocks should be ok)
    need a validation function to ensure that block maintains invariant:
    small number of leading metadata rows starting at row zero, followed by
    data rows (deletes ok). Easier to support non-split rows initially, but
    should be able to support head rows (need mods to splice functions to
    preserve rowstats for this case).
    need to modify metadata methods so all metadata created in first n rows.
    could simply have delete really delete the rows, so no changes necessary
    for rdblock clients (i.e., no "null rows" generated).

TODO lib/Genezzo/Block/RDBlock.pm
    use row directory rowlen vs len/value for row storage
    meta row - should binary search for meta id
    unicode support

TODO lib/Genezzo/Block/Std.pm
    Support for completely variable block headers

TODO lib/Genezzo/Block/Util.pm
    Support for completely variable block headers

TODO lib/Genezzo/BufCa/BCFile.pm
    note that _fileread could just be part of GetContrib
    need to move TSExtendFile functionality here if we want to overload
    syswrite with encryption
    read_only database support
    buffer cache block zero should contain description of buffer cache
    layout
    need a way to free blocks associated with a file that is not currently
    in use

TODO lib/Genezzo/BufCa/BufCaElt.pm
    Deprecate GetInfo, convert to GetContrib.
    Switch syshook methods to use _BCE_dirtyhook
    get fileno, blockno info
    deal with multiple pins on same block sanely. We shouldn't be
    maintaining a reference count scheme here. Shouldn't pin be <= 1, and
    the destroy cb should set it to zero when last reference is garbage
    collected?
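
    The pin question above can be illustrated with weak references. The
    element hands out a single handle, the pin is a 0/1 flag, and a finalizer
    clears the pin when the last reference to the handle is collected. The
    sketch below is Python and does not use Genezzo's Perl API. The
    BufferElement and PinHandle names are hypothetical. The sketch relies on
    reference counting freeing the handle promptly, which holds for Perl and
    CPython but not for every Python implementation:

```python
import weakref


class BufferElement:
    """Hypothetical buffer cache element whose pin is a 0/1 flag, not a count."""

    def __init__(self, blockno):
        self.blockno = blockno
        self.pin = 0
        self._handle_ref = None  # weak reference to the live handle, if any

    def get_handle(self):
        """Return the single live handle for this block, pinning it if needed."""
        handle = self._handle_ref() if self._handle_ref is not None else None
        if handle is None:
            handle = PinHandle(self)
            self._handle_ref = weakref.ref(handle)
            self.pin = 1
            # Runs when the last strong reference to the handle goes away,
            # playing the role of the "destroy cb" in the TODO item.
            weakref.finalize(handle, self._unpin)
        return handle

    def _unpin(self):
        self.pin = 0


class PinHandle:
    """Hypothetical handle that clients hold while they use the block."""

    def __init__(self, element):
        self.element = element
```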

TODO lib/Genezzo/BufCa/DirtyScalar.pm
    Deprecate SetBCE: can shift responsibility and functionality to storeCB
    which will contain a hook, versus directly overloading STORE here.

TODO lib/Genezzo/Dict.pm
    pref1 - distinguish fixed/mutable parameters
    cons1 - distinguish user constraint names from system-defined names
    IDXTAB indexed tables don't give a constraint error, or primary key
    error. They don't have constraints because they are themselves indexes.
    Need to give better error message.
    Fix t/Cons1 constraint error
    DictTableAllTab: need index on allfileused for delete
    DictTableAllTab: update tsfiles for usefile
    need some combo _get_table/corecolnum/getcol - create a custom iterator
    that returns specified cols
    non-unique index support using bt2 use_keycount. Need to separate notion
    of SQL uniqueness from btree concept of unique, since a non-unique SQL
    index is a unique btree with the rid as least-significant key col (vs
    rid as value col).
    need drop table/drop index linkage, delete constraints for table, etc
    constraints: can fix check constraint in update case -- don't need to
    check insert if check columns aren't modified.
    constraints: need not null/foreign key constraints
    constraints: need to limit one primary key per table, prevent creation
    of duplicate indexes on same ordered key columns
    expose drop index, drop constraint. tie drop index/drop table?
    check usage of HCount for max tid, max fileidx, max consid. This won't
    work if there are deletions
    DictTableUseFile: update space management to use this function correctly
    DictDefineCoreTabs, tsfiles: need to save file headersize as a tsfile
    column.
    deal with dict->{headersize} attribute in some rational way. Currently
    set via tablespace->TSAddFile...
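
    The non-unique index item describes making the rid the
    least-significant key column. That way every btree entry is unique even
    when SQL key values repeat. Below is a minimal Python sketch, with a
    sorted list standing in for bt2. The NonUniqueIndex class and the rid
    strings are hypothetical:

```python
import bisect


class NonUniqueIndex:
    """Sorted list standing in for a unique btree of (key, rid) entries."""

    def __init__(self):
        self.entries = []  # kept sorted; each (key, rid) pair is unique

    def insert(self, key, rid):
        entry = (key, rid)
        pos = bisect.bisect_left(self.entries, entry)
        if pos < len(self.entries) and self.entries[pos] == entry:
            raise KeyError(f"duplicate index entry {entry!r}")
        self.entries.insert(pos, entry)

    def lookup(self, key):
        """Return every rid stored under `key`, in rid order."""
        # (key,) sorts before every (key, rid), so this finds the first match.
        pos = bisect.bisect_left(self.entries, (key,))
        rids = []
        while pos < len(self.entries) and self.entries[pos][0] == key:
            rids.append(self.entries[pos][1])
            pos += 1
        return rids
```

    The SQL key "smith" can then appear under several rids while each btree
    entry stays unique.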

TODO lib/Genezzo/GenDBI.pm
    Feeble/SQL: fix DESCribe to handle quoted identifiers.
    TABLESPACE: alter, drop, online, offline, more testing...
    This module is a bit of a catch-all, since it contains a DBI-style
    interface, an interactive loop with an interpreter and some presentation
    code, plus some expression evaluation and query planning logic. It needs
    to get split up.
    SQLselprep_Algebra: move to XEval
    SQLAlter: need And purity check
    SQLUpdate: cleanup - avoid generating new SELECT. Allow regexp update.
    SQLCreate: need to handle CREATE TABLE AS SELECT, table/column
    constraints, etc.

TODO lib/Genezzo/Havok.pm
    extension to support CPAN install via HavokUse
    use real YAML vs "fake" yaml documents
    Create dictionary initialization havok (vs post-startup havok)
    Need some type of first-time registration function. For example, if your
    extension module needs to install new dictionary tables. Probably can
    add arg to havokinit, and add a flag to havok table to track init
    status.
    Safety/Security: could load modules using Safe package to restrict their
    access (not a perfect solution). May also want to construct a dictionary
    wrapper to restrict dictionary capabilities for certain clients, e.g.
    let a package read, but not update, certain dictionary tables.
    Force Init/ReInit when new package is loaded.
    update module flags if necessary, handle cleanup
    use something like Sub::Install, Sub::Installer, or Hook::WrapSub to
    redefine the subroutines in SysHook, etc.

TODO lib/Genezzo/Havok/SysHook.pm
    should be able to dynamically create hook vars, versus using existing
    "our" vars.
    should we do something smart on dictionary shutdown, like unload hooks?
    Or have a clever way to re-init and reload a hook?

TODO lib/Genezzo/Havok/UserExtend.pm
    Need to fix the "import" mechanism so we can load specific functions
    into the Genezzo::GenDBI namespace, versus creating stub functions. Use
    "import" and "export_to_level".
    Could just load Acme::Everything and we'd be done...
    Need function "type" information so we can validate argument lists and
    determine the return type of a function.

TODO lib/Genezzo/Havok/UserFunctions.pm
    use "sqlname" and "typecheck" attributes in user_functions table
    Need to fix the "import" mechanism so we can load specific functions
    into the Genezzo::GenDBI namespace, versus creating stub functions. Use
    "import" and "export_to_level".
    Could just load Acme::Everything and we'd be done...
    Need function "type" information so we can validate argument lists and
    determine the return type of a function. If we pass named args, have
    "TypeCheck" and "Execute" modes for sql_function. Or have the typecheck
    function pass back a name/ref to the execute function, since it may
    change depending on argument types.
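
    The UserFunctions item suggests that a typecheck function could pass
    back the execute function, since the right implementation may depend on
    the argument types. Below is a minimal Python sketch of that shape. The
    GREATEST-like function, the "n"/"c" type codes, and all of the names are
    assumptions, not Genezzo's user_functions interface:

```python
def _max_numeric(a, b):
    return max(float(a), float(b))


def _max_string(a, b):
    return max(str(a), str(b))


def typecheck_greatest(arg_types):
    """Hypothetical prepare-time typecheck for a two-argument GREATEST.

    arg_types uses assumed codes: "n" = number, "c" = char.
    Returns (return_type, execute_function) so execution never re-checks.
    """
    if len(arg_types) != 2:
        raise TypeError(f"GREATEST takes 2 arguments, got {len(arg_types)}")
    if all(t == "n" for t in arg_types):
        return "n", _max_numeric
    if all(t == "c" for t in arg_types):
        return "c", _max_string
    raise TypeError(f"mixed argument types {arg_types!r} not supported")
```

    With "10" and "9", the numeric implementation returns 10.0 and the
    string implementation returns "9". The type information therefore
    decides which function runs, and it does so once at prepare time.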

TODO lib/Genezzo/Index/bt2.pm
    hkey/offset functions: should be able to convert between different
    "place" formats (Array and Hash prefixes), like the common fetch
    routine, or ASSERT that prefix matches.
    add reverse scan to search/SQLFetch
    support multicol keys, non-unique keys (via combo of key + rid as
    unique)
    support transaction unique constraints -- probably by treating key+rid
    as unique, then turning on a true unique key and scanning for
    duplicates?
    find out why pctfree=0 doesn't work
    Work on RDBlk_NN support.
    search with startkey/stopkey support, vs supplying compare/equal
    methods. Restricting the search API to straight "=","<" comparisons
    means we can try the estimation function.
    need to handle partial startkey/stopkey comparison in searchR/SQLFetch
    for multi-col keys
    semantics of nulls in multi-col keys -- sort low?
    simplify _pack_row with splice and a supplied split position, something
    like -1 for normal indexes (n-1 key cols, 1 val col, so pop the val) or
    "N=?" for index-organized tables (N key cols, M val cols, so splice N)
    reorganize along the lines of "GiST" Generalized Search Trees (Paul
    Aoki, J. Hellerstein, UCB)
    ecount support?
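
    The _pack_row item proposes a single split position: -1 for normal
    indexes (pop the value column) or N for index-organized tables (splice
    at N). Below is a minimal Python sketch of that rule; split_row is a
    hypothetical name. Python slices, like Perl's splice, treat a negative
    offset as counting from the end, so one code path covers both cases:

```python
def split_row(row, split):
    """Split an index row into (key_cols, val_cols) at one split position.

    split = -1 -> ordinary index: n-1 key columns, 1 value column
    split = N  -> index-organized table: N key columns, M value columns
    """
    # A negative slice bound counts from the end, so no special case is needed.
    key_cols = list(row[:split])
    val_cols = list(row[split:])
    return key_cols, val_cols
```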

TODO lib/Genezzo/Index/bt3.pm
    new: maybe a way to get blocksize from rstab/rsfile and pass to bt2,
    versus passing it to each layer separately
    getMainMeta from first block of tied hash, but no guarantee that space
    management is nice enough to return blocks in allocation order. Should
    store block address of leftmost leaf in index table.
    spacecheck: space cache should simply be free extents allocated to the
    index. Need to extend smfile to have multiple free extents in spacelist,
    vs just used extents. Note this is still an issue for simultaneous inserts --
    need lots of space for pathological case where each parallel insert
    splits a separate subtree. That's why transactions were invented.

TODO lib/Genezzo/Index/btHash.pm
    figure out whether this should be a pushhash, hash, or rowsource
    SQLPrepare/Execute/Fetch: clean up. Shouldn't need to manage a
    distinction between using btHash as a row source and the old bt2 api.
    bt2 is wrong - should only have one Fetch style. Should be able to use
    the index start/stop key vs filtering.
    NEXTKEY: broken in "dump tsidx" for case where create 2 tables, insert
    some rows, then drop the first table (and don't COMMIT) and call dump
    tsidx. Loops in NEXTKEY - never terminates for allfileused index.
    Add ReadOnly mode so we can view indexes, but not insert/update/delete.
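
    The btHash note about using the index start/stop key instead of
    filtering can be sketched as a range scan over sorted key tuples. So can
    the bt2 item on partial startkey/stopkey comparison for multi-column
    keys. The sketch is in Python, not Genezzo's Perl. The sorted list stands
    in for btree leaf blocks, and range_scan is a hypothetical name:

```python
import bisect


def range_scan(entries, startkey=None, stopkey=None):
    """Yield entries whose key lies between startkey and stopkey, inclusive.

    entries: list of key tuples sorted ascending (stand-in for btree leaves).
    startkey/stopkey: key tuples, possibly shorter than the full key; a
    partial key matches every full key that begins with it.
    """
    pos = 0 if startkey is None else bisect.bisect_left(entries, tuple(startkey))
    while pos < len(entries):
        entry = entries[pos]
        # Compare only as many columns as the stop key supplies.
        if stopkey is not None and entry[:len(stopkey)] > tuple(stopkey):
            break  # past the stop key: stop instead of filtering the rest
        yield entry
        pos += 1
```

    The scan positions on the start key with a binary search and stops at
    the first key past the stop key. Rows outside the range are therefore
    never visited.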

TODO lib/Genezzo/Parse/SQL.pm
    alter table (elcaro MODIFY column NOT NULL) vs (sql3 ALTER COLUMN)...
    Support for DDL, ANSI Interval, Date, Timestamp, etc.
    fix the extra array deref in join rules
    error messages everywhere
    ECOUNT reserved word issues
    TRIM, UPPER, etc in standard function list?
    use of negative lookahead in reserved_word regex?
    table constraint, storage clause
    constraint attributes - deferrable, disable
    delete cascade referential action
    maybe can collapse qualified join with qj_leftop?
    table expr optional column list
    "system" literals like USER, SYSDATE
    better separation of strings and numbers (see concatenate)
    leading NOT
    double colon in function names?

TODO lib/Genezzo/Plan.pm
    update pod

TODO lib/Genezzo/Plan/MakeAlgebra.pm
    need additional work for non-query operations/special cases

TODO lib/Genezzo/Plan/QueryRewrite.pm
    check for function existence in GenDBI and main namespaces
    update pod
    need to handle FROM clause subqueries -- some tricky column type issues.
    check bool_op - AND purity if no OR's.
    check relational operator (comp_op, relop)
    handle ddl/dml (create, insert, delete etc with embedded queries) by
    checking for query_block info -- look for hash with 'query_block' before
    attempting table/col resolution. Need special type checking for these
    functions.
    refactor to common TreeWalker
    _process_name_pieces: quoted string/case-insensitivity
    handle all pseudo cols
    most value expression stuff needs to migrate to XEval

TODO lib/Genezzo/Plan/TypeCheck.pm
    need to generate stages to perform aggregate initialization and
    intermediate aggregation
    check for aggregates in WHERE clause
    check for GROUPing/aggregates
    check for final select list columns vs all projected columns in all
    clauses
    check args for all functions
    check for function existence in GenDBI and main namespaces
    update pod
    need to handle FROM clause subqueries -- some tricky column type issues.
    check for duplicate aliases/type mismatch in _FROM_subq_star_fixup ?
    check bool_op - AND purity if no OR's.
    check relational operator (comp_op, relop)
    handle ddl/dml (create, insert, delete etc with embedded queries) by
    checking for query_block info -- look for hash with 'query_block' before
    attempting table/col resolution. Need special type checking for these
    functions.
    refactor to common TreeWalker
    handle all pseudo cols
    most value expression stuff needs to migrate to XEval

TODO lib/Genezzo/PushHash/HPHRowBlk.pm
    fix synopsis

TODO lib/Genezzo/Row/RSExpr.pm
    SQLPrepare/SQLFetch: requires ALIAS argument, which doesn't make sense
    for rowsources like RSDual (see XEval). "Alias" is only necessary to
    disambiguate named columns.

TODO lib/Genezzo/Row/RSFile.pm
    need error handlers vs "whisper"

TODO lib/Genezzo/Row/RSIdx1.pm
    HSuck:
    FirstCount/NextCount: do real estimate vs fake
    should pass leftmost blockno explicitly versus rely on RSTab FIRSTKEY
    rectify some overlap between btHash and this module
    could encode a multiple-column key into a single-column rid using a
    MIME::Base64 encoding of a packed row. Should check the dependency for
    Perl 5.6 and add it to Makefile.PL.
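
    Below is a minimal Python sketch of the Base64 idea above. It packs the
    key columns into bytes, then Base64-encodes them into one printable
    value. The length-prefixed packing is a hypothetical stand-in, not
    Genezzo's packrow format, and columns come back as strings:

```python
import base64
import struct


def pack_key(cols):
    """Hypothetical packing: 4-byte big-endian length + UTF-8 bytes per column."""
    out = bytearray()
    for col in cols:
        data = str(col).encode("utf-8")
        out += struct.pack(">I", len(data)) + data
    return bytes(out)


def unpack_key(packed):
    """Inverse of pack_key; returns the columns as strings."""
    cols, pos = [], 0
    while pos < len(packed):
        (length,) = struct.unpack_from(">I", packed, pos)
        pos += 4
        cols.append(packed[pos:pos + length].decode("utf-8"))
        pos += length
    return cols


def encode_rid(cols):
    """Turn a multi-column key into one printable column value."""
    return base64.b64encode(pack_key(cols)).decode("ascii")


def decode_rid(rid):
    return unpack_key(base64.b64decode(rid))
```

    Base64 output does not sort in the same order as the original keys. An
    encoded value like this therefore suits a rid used for lookup, not an
    ordered index key.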

TODO lib/Genezzo/Row/RSTab.pm
    $href: remove - need a dict function to return allfileused via tso
    HSuck: need a way to specify packing method
    HSuck: fix trailing zero replacement
    NextCount: fix quitloop
    localPush/Store: qualify length packstr as percentage of blocksize
    (1/3?)
    localStore: race condition on rowstat
    localFetchDelete: frag flag info, delete status. Could express this
    function as a generalized "RowSplice" (as distinct from RDBlkA::HSplice,
    which is a block splice operator). Would need to be able to splice based
    upon column number/array offset, as well as substring byte offset -- the
    inverse functionality of PackRow2/HSuck
    DBI - support Bind and projection (returning only certain specified
    columns, versus all columns)
    _init: change to use TSTableAFU support versus href->{filesused}
    need support for constraints that "mutate" supplied values, e.g.
    manipulate numeric precision or supply default values for columns. Also
    need support for foreign keys in delete.

TODO lib/Genezzo/SpaceMan/SMExtent.pm
    remove the seghdr/segnxt debugging tags
    need to coalesce adjacent free extents
    maintain multiple free lists for performance
    better indexing scheme - maybe a btree
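
    To coalesce adjacent free extents, sort them by starting block, then
    merge any extent that begins at or before the end of the previous one.
    Below is a minimal Python sketch. It assumes extents are (start_block,
    block_count) pairs; SMExtent's actual representation may differ:

```python
def coalesce(free_extents):
    """Merge free extents that touch or overlap.

    free_extents: iterable of (start_block, block_count) pairs.
    Returns a new list sorted by start_block with adjacent runs merged.
    """
    merged = []
    for start, count in sorted(free_extents):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged
```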

TODO lib/Genezzo/SpaceMan/SMFile.pm
    read_only database support
    support for non-table objects like indexes - done?
    freetable: when last object is freed, need to update _tsfiles as UNUSED
    need to coalesce adjacent free extents
    maintain multiple free lists for performance
    better indexing scheme - maybe a btree
    chain the block header if necessary -- allocate a new block to hold
    additional free list information, append extent allocation to HEADER row
    (after 0:1)
    check status everywhere we update rows
    maintain free extents list for each object, so can re-use extents
    (especially important for updates of large multi-block rows)
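
    "Multiple free lists" is commonly done as segregated free lists. Extents
    are bucketed by size class, so an allocation only examines buckets that
    can hold the request. Below is a minimal Python sketch with power-of-two
    size classes and first-fit within a bucket. The class name, the
    size-class rule, and the (start_block, block_count) representation are
    assumptions, not SMFile's design:

```python
class SegregatedFreeLists:
    """Free extents bucketed by power-of-two size class (hypothetical)."""

    def __init__(self):
        self.buckets = {}  # size class -> list of (start_block, block_count)

    @staticmethod
    def size_class(count):
        # Class k holds extents with 2**k <= count < 2**(k+1).
        return count.bit_length() - 1

    def free(self, start, count):
        self.buckets.setdefault(self.size_class(count), []).append((start, count))

    def allocate(self, count):
        """Return (start, count) carved from a free extent, or None if no fit."""
        wanted = self.size_class(count)
        for cls in sorted(c for c in self.buckets if c >= wanted):
            bucket = self.buckets[cls]
            for i, (start, avail) in enumerate(bucket):
                if avail >= count:
                    del bucket[i]
                    if avail > count:
                        # Put the unused tail back on the appropriate list.
                        self.free(start + count, avail - count)
                    return (start, count)
        return None
```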

TODO lib/Genezzo/SpaceMan/SMHook.pm
    better error handling

TODO lib/Genezzo/TSHash.pm
    SQLFetch: need to handle get_col_alias for filter?

TODO lib/Genezzo/Tablespace.pm
    filearr, used, unused: should match dict _tsfiles fileidx - done 3.21?
    notion of buffercache associated with the tablespace object -- possibly
    multiple active bcs, with different characteristics/semantics, e.g. a
    bc for temp space with a different blocksize, lacking txn recovery? Need
    to guarantee that all clients of a tso use the same bc for
    consistency/locking/txn support
    use compatibility matrix to drive automatic upgrade capability

TODO lib/Genezzo/TestSetup.pm
    stuff

TODO lib/Genezzo/Util.pm
    Should bundle all data file utility functions, such as
    FileGetHeaderInfo, SetHeaderInfo, etc, under separate Util::DataFile
    module
    FileGetHeaderInfo: need to handle case of header which exceeds a single
    block. Probably should keep increasing the buffer size until find null
    terminator (within reason).
    packrow: store metadata in col0 vs trailing col with next ptr
    packrow: check pack format for a zero len row of zero cols. Does it need
    a nullvec?
    packrow/unpackrow: in Perl 5.8 could use the nifty repeating templates
    to our advantage.
    packrow: could generate skiplists as col zero metadata tracking byte
    position and column numbers to speed lookups
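
    The FileGetHeaderInfo item suggests growing the read buffer until the
    header's null terminator appears, with a sanity limit. Below is a
    minimal Python sketch. The 512-byte starting size, the 64 KiB cap, and
    the read_header name are assumptions:

```python
import io


def read_header(fh, initial_size=512, max_size=64 * 1024):
    """Read a NUL-terminated header from the start of a binary file object.

    Doubles the read size until a NUL byte appears, giving up past max_size
    ("within reason"). Returns the header bytes without the terminator.
    """
    size = initial_size
    while True:
        fh.seek(0)
        buf = fh.read(size)
        end = buf.find(b"\0")
        if end >= 0:
            return buf[:end]
        if len(buf) < size:
            raise ValueError("end of file reached before header terminator")
        if size >= max_size:
            raise ValueError(f"no header terminator within {max_size} bytes")
        size = min(size * 2, max_size)


# Example: a 1500-byte header needs two doublings (512 -> 1024 -> 2048).
sample = io.BytesIO(b"x" * 1500 + b"\0" + b"data blocks...")
header = read_header(sample)
```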

TODO lib/Genezzo/XEval.pm
    Should become more of a dispatch routine, with major guts for each
    function stashed in separate modules under XEval.
    SQLAlter, SQLInsert: move type checking to TypeCheck module.

TODO lib/Genezzo/XEval/Prepare.pm
    sql_where: function name processing -- drive from user_function, use
    type-checking functions.
    update pod
    need to handle FROM clause subqueries -- some tricky column type issues.
    explode STARs with column names - need consistent join table position
    check bool_op - AND purity if no OR's.
    check relational operator (comp_op, relop)
    handle ddl/dml (create, insert, delete etc with embedded queries) by
    checking for query_block info -- look for hash with 'query_block' before
    attempting table/col resolution. Need special type checking for these
    functions.
    refactor to common TreeWalker

TODO lib/Genezzo/XEval/SQLAlter.pm
    drop constraint

TODO lib/Genezzo/genexp.pl
    move most methods to separate .pm file
    need to distinguish "dictionary" havok routines vs post-dictionary havok
    tables

AUTHORS
    Copyright (c) 2005, 2006 Jeffrey I Cohen. All rights reserved.

        This program is free software; you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation; either version 2 of the License, or
        any later version.

        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.

        You should have received a copy of the GNU General Public License
        along with this program; if not, write to the Free Software
        Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

    Address bug reports and comments to: jcohen@genezzo.com

    For more information, please visit the Genezzo homepage at
    <http://www.genezzo.com>

