proxy70

This is documentation for the ASSOCIATION.4TH file. The code is intended to be a supplement to the NOVICE.4TH file. The novice should first read NOVICE.TXT and understand the code in novice.4th. You don't have to understand every detail though, as ASSOCIATION.4TH mostly only uses the FIELD word from novice.4th. For symbol tables, you should use SYMTAB.4TH that uses an algorithm of my own invention. It assumes that some nodes will be more commonly accessed than other nodes, and moves the more common ones closer to the root.

*******************************************************************************
*******************************************************************************
*******************************************************************************

This code is Forth's answer to the association data-structure provided in AWK (and Lua, although I don't know enough about Lua at this time to comment on that). The idea is that you have something similar to an array except that it is indexed by non-integer data (such as strings or floats) and it is sparse. I use a left-leaning red-black tree (LLRB) internally. I got the algorithm from these articles:
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf 
http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf

At the end of the ASSOCIATION.4TH file, there is some demonstration code involving a LANGUAGES association. That will be used as a reference throughout this documentation.

When defining an association, the user must define a data type for the nodes in the tree, with ELEMENT as its parent --- and the user must also define a data type for the handle of the tree, with ASSOCIATION as its parent. In our example code, LANGUAGE is the data type for the node in the tree, which associates a key (the language name) with some data about the language (who invented it, and what platform(s) it is primarily used on). LANGUAGES is the data type for the handle of the tree, which includes a pointer to the root of the tree, and some xt values needed for processing the nodes in the tree. Here is how LANGUAGE is defined:

element                     
    w field .inventor
    w field .platform
constant language           \ describes a programming language

: init-language (  inventor platform name node -- node )
    init-element >r
            r@ .platform !
    hstr    r@ .inventor !
    r> ;

: new-language ( inventor platform name -- node )
    language alloc
    init-language ;

This is the same manner of defining data types that was used in LIST.4TH earlier. There was a lot of documentation on data types in LIST.TXT, and some examples of derived data types, so the reader can refer to that for basic ideas regarding data types. We have an INIT-LANGUAGE constructor, and a NEW-LANGUAGE definer. These have to be separate words, because a derived data-type INIT function will need to be able to call INIT-LANGUAGE. It can't call NEW-LANGUAGE because NEW-LANGUAGE allocates memory, and the memory will have already been allocated.

In regard to our ELEMENT derivatives, such as LANGUAGE, we need to write a couple of helper functions in the case that our node contains any "attachments" (pointer to heap memory). Our LANGUAGE record does contain an attachment (the .INVENTOR field) in its payload, so these are needed in this case.

: kill-language-attachments ( node -- )
    dup .inventor @  dealloc 
    kill-key ;
    
: copy-language-attachments ( src dst  -- )    
    over .inventor @  hstr
    over .inventor !
    copy-key ;
    
We need a function that kills all the attachments. KILL-LANGUAGE-ATTACHMENTS does this. It ends with a call to KILL-KEY. The .KEY field is an attachment that every ELEMENT derivative has, so we always need KILL-KEY. If we didn't have any attachments in our payload, then we could have just used KILL-KEY. We did have an attachment in our payload though, so we had to write our own KILL-LANGUAGE-ATTACHMENTS function.

COPY-LANGUAGE-ATTACHMENTS copies all of the attachments from one node (src) to another node (dst). This function ends with a call to COPY-KEY, which is always needed. If we didn't have any attachments in our payload, then we could have just used COPY-KEY. 

Here is how our handle LANGUAGES is defined:

association                 
constant languages          \ describes a set of languages

: init-languages ( record -- record )
    >r
    ['] compare  ['] kill-language-attachments  ['] copy-language-attachments
    r> init-association ;

: new-languages ( -- record ) 
    languages alloc
    init-languages ;

Our LANGUAGES data type doesn't have any fields beyond what ASSOCIATION has. Most data types derived from ASSOCIATION won't have any fields of their own. Essentially all that we are doing here is initializing an ASSOCIATION record with xt values specific to our LANGUAGE nodes.

The KILL-LANGUAGE-ATTACHMENTS and COPY-LANGUAGE-ATTACHMENTS functions have already been discussed. We also need a function that compares .KEY fields between nodes. Our .KEY field in our LANGUAGE nodes is a counted string, so we can just use COMPARE. Sometimes you may want to use a key that is not a counted string. For example, you may want to use floats. In this case, use FLOAT-COMPARE rather than COMPARE as your comparison function. Also, use FLOAT>HSTR to make heap-strings out of floats, suitable for storage in the .KEY field.

Here is how an association is built up:

: some-languages ( -- handle )
    new-languages >r
    c" Moore, Chuck"            micro-controller                        c" Forth"       new-language  r@  insert
    c" Ichiah, Jean"            micro-controller                        c" Ada"         new-language  r@  insert
    c" Wirth, Niklaus"          MS-DOS                                  c" Pascal"      new-language  r@  insert
    c" Wirth, Niklaus"          Linux                                   c" Oberon"      new-language  r@  insert
    c" McCarthy, John"          Linux                                   c" Lisp"        new-language  r@  insert
    c" van Rossum, Guido"       Linux                                   c" Python"      new-language  r@  insert
    c" Gosling, James"          cellphone browser or                    c" Java"        new-language  r@  insert
    c" Brazil"                  Linux                                   c" Lua"         new-language  r@  insert
    c" Matsumoto, Yukihiro"     Linux                                   c" Ruby"        new-language  r@  insert
    c" Pestov, Slava"           Linux                                   c" Factor"      new-language  r@  insert
    c" Gosling, James"          cellphone browser or Linux or           c" Java"        new-language  r@  insert
    c" Ritchie, Dennis"         micro-controller MS-DOS or Linux or     c" C"           new-language  r@  insert
    c" Stroustrup, Bjarne"      MS-DOS Linux or                         c" C++"         new-language  r@  insert
    r> ;

This function creates a new association and returns its handle. It also fills it with some nodes. Data about a language is associated with each language name. This allows us to query our "database" for information about languages. For example, if you have never heard of Forth (who has?), you could look it up like this:

some-languages  constant DB  ok
c" Forth"  DB  find-key  ok
dup show-language
name:         Forth
inventor:     Moore, Chuck
platform:     micro-controller
 ok
 
I put the handle returned by SOME-LANGUAGES in the constant DB (as in database) so we could use it in our examples. FIND-KEY returns the node in the tree that matches the key that it is given, or returns FALSE if no match is found (for example if you spelled the key as "forth" rather than "Forth"). SHOW-LANGUAGE just prints out all of the data in the node.

The alert reader will notice that Java was include twice in SOME-LANGUAGES. When there are duplicates like this, the latest version replaces the previous version. If we look up Java, this is what we get:

c" Java"  DB find-key  ok
dup show-language
name:         Java
inventor:     Gosling, James
platform:     cellphone browser Linux
 ok

We can delete a node like this:
 
c" Python"  DB delete  ok
 
It is possible to access the fields in the record individually, like this:

c" Forth" DB find-key  ok
dup .inventor @ count type Moore, Chuck ok
dup .key @ count colorless type Forth ok

Notice that the key string needed the COLORLESS function applied to its count. This is because the tree has red/black color information embedded into the key count internally. COLORLESS removes this information so the string can be treated like a normal string. This is very easy to forget! If you forget your COLORLESS and try to type out the string, it may work or it may display garbage, but you won't get any error message of any kind. It is an awkward imposition on the user, but I didn't want to dedicate an entire field just for a boolean, so I put the color information in the key count. On a related note, your key string can only be 127 characters long, rather than the usual 255. If you try to create a node with a key string that is longer than 127 characters, the program will abort with an informative error message.

It is okay to modify any of the fields, except the .KEY field. The tree is sorted by the .KEY field, so modifying this one will wreck the entire structure. If you want to modify the .KEY field, you have to make a duplicate node and modify it:

c" Lisp"  DB find-key  ok
DB dup-element  ok
c" Scheme/Lisp" hstr  over .key !  ok
DB insert  ok

The DUP-ELEMENT function makes a new node with the same payload as an old node. It can be inserted into the tree, in which case it will replace the old node because it still has the same .KEY field as the old node. In this example however, we modify the .KEY field before inserting it. We need to use HSTR because the .KEY string has to be in the heap. We then use INSERT to insert the new node into the association. The old node (with a key of "Lisp") is still in the association. If we wanted to get rid of it, we could have used DELETE at any time after DUP-ELEMENT had been used to make a copy of it.

It is possible to merge one association into another (a logical-or of the associations):

: all ( src-handle dst-handle -- dst-handle )       \ merges everything in the src association into the dst association

If there are any keys common to both the src and dst associations, the src node will trump the dst node.

It is also possible to create a new association that is a copy of an old association:

: copy-association ( handle -- new-handle )    
    dup similar-association  all ;

The SIMILAR-ASSOCIATION function that is used in COPY-ASSOCIATION makes new association that is like the old association (it has the same xt values), but it is empty of nodes.

The ALL function copies all of the nodes in the src association. It is also possible to just copy those nodes that are within a particular range. For example, lets copy all of the nodes from C to Go:

c" C"  c" Go"  DB filter within  ok
dup .root @ show-tree
            B:Forth
                        r:Factor
B:C++
            B:C
 ok

The FILTER function takes two strings and a handle and returns data suitable for WITHIN to use. WITHIN creates a new association that contains only the nodes whose keys are within (inclusive) the given range. In this case, we wanted all the languages from C to Go. We don't have Go in our database of languages however, so we only got C, C++, Factor and Forth. The SHOW-TREE function was written when I was debugging the software, but I used it here to show all of the nodes in the tree.

We can also make a new association that contains all of the nodes exclusive to the range:

c" C"  c" Go"  DB filter without  ok
dup .root @ show-tree
                        B:Scheme/Lisp
            B:Ruby
                                    B:Python
                        r:Pascal
                                    B:Oberon
B:Lua
                        B:Lisp
            B:Java
                        B:Ada
 ok

It is possible to use WITHIN and WITHOUT to merge the selected nodes from an association into an already existing association. FILTER is written like this:

: filter ( lower-str upper-str handle -- lower upper handle new-handle ) 
    >r
    swap r@ find-lower  
    swap r@ find-upper
    r@  r> similar-association ;

You will notice that we used SIMILAR-ASSOCIATION for our new-handle. It is possible to write code like this yourself, but to use the handle for an already existing association. FIND-LOWER finds the lowest node in the association whose key is yet >= to the target string. FIND-UPPER finds the highest node in the association whose key is yet <= to the target string. FIND-UPPER was give the string "Go" and it found "Forth" --- the highest node alphabetically that was yet <= to "Go." FIND-LOWER and FIND-UPPER may return NIL if there is no such node. For example:

c" Xylophone"  DB  find-lower . 0  ok
c" Abba"  DB  find-upper . 0  ok

Normally, WITHIN and WITHOUT should be given a lower node that is <= to the upper node. It is possible to use a lower node that is > than the upper node, but the result doesn't really make any sense:

c" Forth"  c" C"  DB  filter within  ok
dup .root @ show-tree
                        B:Scheme/Lisp
            B:Ruby
                                    B:Python
                        r:Pascal
                                    B:Oberon
B:Lua
                        B:Lisp
            B:Java
                        B:Forth
 ok
 
c" Forth"  c" C"  DB  filter without  ok
dup .root @ show-tree
            B:Factor
                        r:C++
B:C
            B:Ada
 ok

It is possible to traverse through the entire tree and perform an operation of some kind on each node:

: show-languages ( handle )
    .root @  ['] show-language  walk> ;
        
This can be used like this:

DB show-languages

The WALK> function takes a tree root and the xt of a function that will perform an operation on a node, and applies that function to every node in the tree. WALK> traverses the tree in order. We also have <WALK that traverses the tree in reverse order.

It is possible to use <WALK or WALK> to make a custom filter. For example:

: <filter-inventor> { inventor handle new-handle node -- inventor handle new-handle }
    inventor count  node .inventor @ count  compare  A=B = if
        node handle dup-element  new-handle insert  then
    inventor handle new-handle ;
    
: filter-inventor ( inventor handle -- new-handle )    
    dup similar-association                             \ -- inventor handle new-handle 
    over .root @  ['] <filter-inventor>  walk>          \ -- inventor handle new-handle
    nip nip ;

Our FILTER-INVENTOR function traverses the entire tree and filters out all of the nodes whose .INVENTOR string matches a certain string, and puts copies of those nodes in a new association. The function that touches each node must consume the node address from the top of the stack. It can access data on the stack under the node address, but it must leave that data in place.

Here is an example of making a new association containing only Niklaus Wirth's work:
    
c" Wirth, Niklaus"  DB filter-inventor  ok
dup show-languages
name:         Oberon
inventor:     Wirth, Niklaus
platform:     Linux

name:         Pascal
inventor:     Wirth, Niklaus
platform:     MS-DOS
 ok
 
When you are through with an association, you should kill it so you don't have a memory leak:

DB kill-association  ok

Don't try to use this association after you kill it, or your program will crash.