Newsgroups: rec.arts.int-fiction
Path: gmd.de!Germany.EU.net!mcsun!uunet!ddsw1!chinet!jorn
From: jorn@chinet.chinet.com (Jorn Barger)
Subject: Intro to fractal thicket theory
Message-ID: <C9JyJ2.vw@chinet.chinet.com>
Organization: Chinet - Public Access UNIX
Date: Fri, 2 Jul 1993 19:37:50 GMT
Lines: 259

[This document attempts to offer a self-contained tutorial on the 
semantic indexing problem in AI, and my own 'fractal thicket' proposal for 
its solution.]

The indexing problem in AI

When physicists talk about a grand, unified, "theory of everything," all 
they're really asking for is a precise description of strings and spinors 
and quarks.  This level of analysis will not be immediately useful to 
average people and their ordinary problems!  So it's fair to ask if there 
might be other valid approaches to a 'theory of everything', that give 
central place instead to the sorts of problems that humans really consider 
most important and useful.  A library's reference section always includes 
plenty of books that offer provisional solutions to this problem-- 
encyclopedias, thesauri, dictionaries of quotations... even libraries' own 
classification schemes, like the Dewey Decimal and Library of Congress 
schemes.

These are all 'theories of everything' of a very different sort.  (You 
can't usefully index library books according to the flavors of their 
quarks!)  But in general, the reference books that need such theories fall 
back instead on lists of alphabetized keywords, because every other scheme 
ever proposed has been fraught with (even worse) redundancies and 
uncertainties.  (Prototypical example:  books of quotations sorted into 
themes like Angels, Animals, Baseball, Capitalism, Conformity, Democracy, 
etc...)

Attempts in AI to automate translation between natural languages have 
foundered on these same shoals-- to translate a word you have to know which 
of several possible meanings is the relevant one in the given case, and to 
disambiguate those meanings you *need* an overarching classification of all 
the things words *can* mean.  Roget proposed an elaborate scheme almost two 
centuries ago, tagging 1000 categories of synonyms, but this is far too 
ad-hoc to build AI with, and very little progress has been made since, 
towards improving it.

And I think there's an extremely odd paradox at work here-- we may know how 
to distinguish a million shades of meanings in our words, and have no 
problem recognizing when a word is being misused, but it's damnably hard to 
articulate or classify exactly *what it is we know* that allows these 
subtle discriminations.  Doug Lenat's CYC project is trying to nibble the 
problem away by exhaustively annotating every concept in the world, but 
they've barely scratched the surface of the huge domain of psychological 
nuances-- feelings and emotions and desires, and even actions.  Human 
beings are just *bad* at this sort of self-knowledge, universally, though 
it's the most important element in every sort of communication.

So we have this huge, messy indexing-problem, *of vital importance*, and 
we're so handicapped by this (somewhat neurotic) blindspot in our self-
knowledge that we can barely even think or talk about it.

Where I've come from

I've spent the last 20 years trying to find new strategies for getting 
beyond that blindspot.  My greatest successes have involved analysis of the 
range of themes expressed in *literature*, because the literary tradition 
has always been concerned with precise description of psychological states.  
These researches have been given some added 'shape' by focusing on the 
immediate goal of programming a video-game, of the interactive-fiction 
genre, that would make use of some (yet-to-be-discovered) carefully 
articulated theory of story-plots and human emotions. (Prototype: George 
Polti's "36 Dramatic Situations"  See rec.arts.int-fiction for more FAQs.)

I spent the last three years (1989-92) learning the ins and outs of AI 
hacking at a lab where programming was seen as "inherently uninteresting," 
and programmers ignored at the bottom of a status hierarchy that included 
senior faculty, faculty, senior grad students, etc, all outranking us lowly 
coders, who were however the only ones who really had a grasp of how the 
programs worked!  Consequently, I found myself grappling with the most 
difficult problems in AI, fairly alone, and when in 1991 I found what 
seemed like a promising solution, I was outlawed from implementing or even 
presenting it, and then fired when I publicly drew attention to my plight.

Sigh...


The fractal-thicket indexing topology

My proposal includes a new data-topology, a way of interpreting that 
topology *semantically*, and a preliminary, *incomplete* set of semantic 
fillers for the first, experimental implementations to start from.

AI has been leaning heavily, all along, on the ancient and venerable 
*hierarchy* or tree topology.  More general topics are put at the top of 
the hierarchy (the root of the tree) and specialized step-by-step as you 
move down the hierarchy (up the tree ;^).

As an AI programmer, though, I was frequently confronted with hierarchy 
branches like 'Plans' or 'Events' that had 10 zillion different immediate 
specializations, or more often were left entirely unspecialized, with the 
footnote "to be filled in Real Soon Now".

Other branches, though, like 'Things', were much more tractable.  So I 
conceived the idea that those *ill-behaved* hierarchy-branches were 
inherently *complex* rather than simple, in that they could be analysed 
(broken down) into *sets* of the simpler elements.

So 'emigration' can be analysed as including, at least, one person and two 
places, and it should be indexed so that if you request all person-place-
place material, emigration is included near the top of the list!

But I hated the idea of having to search thru all ten zillion complex 
items, looking for matches to some pattern.  I wanted an indexing scheme 
that allowed you to step right *to* exactly those right items.  And what 
gradually came clear to me was that the *general* topology to allow this, 
has to place a *self-similar sub-hierarchy* under each node on the simple 
hierarchy, trees within trees within trees.

So if the toplevel of the tree offers person, place, and thing, then each 
of those will offer, along with their normal specializations, an 'added-
element subtree', also beginning with person-place-thing.  And when you 
navigate down two levels to the node 'person-then-place' you'll find all 
the person-place material, and also perhaps indentically at the node place-
then-person.

And I call this a "fractal thicket", because its branching factor is so 
dense.  And the first part of my proposal is to just contemplate and 
investigate this topology, ignoring all details of the particular labels on 
the simple hierarchy.  (Take *any* hierarchy, and do the thought experiment 
of mapping its elements over itself, to see what new semantic classes 
emerge.)

There's a strong parallel here to mapping a simple 1-D array across itself 
repeatedly, creating a multi-dimensional array.  Fractal thickets work just 
the same way, mapping trees across trees.

It may seem like this is a perfect recipe for an uncontrolled combinatorial 
explosion, instantly clogging memory with an only-sparsely-meaningful 
structure.  But imagine that each branch might be a simple array of 
pointers to similar arrays, and realize you don't have to instantiate a 
branch on a subtree if it's meaningless, so the *worst-case* overhead is 
really just the size of that array times the number of steps it is deep, 
which is not bad at all.  And even better, you can assume that each of 
those steps probably stored useful data of its own, info about that 
slightly-less-specialized set of simple elements, so there's really almost 
*no* wasted overhead, and the dense combinatorial explosion will expand 
exactly as far as needed, and no further.  If it's got a few orders of 
magnitude above a million elements, so be it.  That's the best we can hope 
for, for natural language!

Fractal-thicket semantics

The immediate semantic challenge that 'falls out' of this topology is to 
consider your toplevel simple elements in pairs and trios, etc, and ask 
what sorts of *relationships* do these types of elements usually display?

So persons and things may demand:

person makes thing
person acquires thing
person uses thing
person maintains thing
person destroys thing
person disposes of thing

The simple traditional person-place-thing categories now suddenly generate 
a rich set of relationship-primitives, which bear a striking resemblance to 
the classic verb-sets used in adventure games (still the state-of-the-art, 
for interactive fiction).

It seems heuristically promising to sort each such set of primitives into a 
simple, universal chonology of its own, as in this example.  (Thought 
experiment: Again, take your favorite hierarchy, and ask:  for each pair of 
elements, how do relationships between these two classes usually begin?  
What then happens next?  How do such relationships usually end?)

A natural next step is to take these relationships in pairs, and link them 
with temporal-causal connectives like

person goesTo place
then
person acquires thing

person uses thing1
enables
person acquires thing2

A user-interface for such an indexing scheme should offer first the top 
level of the simple hierarchy, which can then be navigated to any degree of 
specialization, *and then* an identical toplevel menu for the first 
"complication", which can again be specialized, and then a *choice* of 
another complication *or* a menu of relationships for the current complex 
set, and then a way to sequence these relationships, etc.

Preliminary semantic fillers

Here's a hierarchy of simple elements.  It needs lots of work.

element
   motive
      hunger
      safety
      sex
      esteem
      family
      self-expression
   thing
      food
      tool
         container
         vehicle
         clothing
         weapon
      bodypart
      waste
   place
      inside
         home
            livingroom
            kitchen (etc)
         school
         office
      outside
         yard
         street (etc)
   person
      gender
      age
      species
      role?
   modality
      emotion
      belief
      etc

Person, place, and thing here are quite familiar, and are, as well, 
universal in adventure games.  Motive and modality are original proposals 
that I think promise great payback, but must require persistence and 
discipline to explore.  Many interesting human psychological (and *social*) 
phenomena involve an exchange or tradeoff relationship between two motives.  
Here's where the 'blindspot' kicks in with a vengeance!

What I'm calling modalities normally involve a relationship between a 
person *and another relationship*.  So "person desiresThat (person acquires 
thing)" can be parsed in the same way as "person believes (person isAt 
place)".

I've tested this system on the last 150 categories in Roget (figuring they 
were the hardest) and the results are impressive but need polish.  (Write 
me if you can't wait, I guess  ;^)


Some basic two-element relationships

person thing: makes acquires uses maintains destroys disposes
person place: bornAt movesTo occupies movesFrom diesAt
person motive: suffers gratifies indulges abstains denies
relationship relationship: causes enables precedes becomes
person relationship (ie modality): imagines wants believes fears believes-
deserves
person person (usually with a 3rd element): begets kinTo marries harms 
helps conflictsWith cooperatesWith communicatesWith hasStatusOver
motive motive: tradeoff exchange conflict substitute
thing thing: hasPart attachedTo
person skill: discovers learns practices uses forgets

Three-element relationships are most interesting when two of the elements 
are persons, like person-gives-thing-to-person.

