proxy70

This is documentation for the LIST.4TH file. The code is intended to be an easy supplement to the NOVICE.4TH file. The novice should first read NOVICE.TXT and understand the code in novice.4th. You don't have to understand every detail though, as list.4th mostly only uses the FIELD word from novice.4th.

The list.4th program implements singly-linked nil-terminated linked lists. This was originally inspired by Factor, and was intended to be Forth's answer to Factor's sequences.

The list.4th file contains a couple of data structures that illustrate how the lists are intended to be used. These data structures are also useful in their own right; they are SEQ, which are lists of strings, and DECOMMA, which are lists of strings split along delimiters (usually commas).

Most of this has both ANS-Forth and assembly-language (SwiftForth) versions. Use TO to set SwiftForth? to TRUE to enable
assembly-language, or FALSE to remain ANS-Forth compatible.


*******************************************************************************
*******************************************************************************
*******************************************************************************

A list is defined like this:

list
    w field .line
constant seq

: init-seq ( str node -- node )
    init-list >r
    hstr    r@ .line !
    r> ;

: new-seq ( str -- node )
    seq alloc
    init-seq ;

Here, SEQ is a child data-type of LIST. There should always be an INIT-xxx and NEW-xxx function for the list. You should not do the initialization of the record in the NEW-xxx function. You need to have an INIT-xxx function because there might be a child of this data type, and its INIT-xxx function will need to call this INIT-xxx function. For example:

seq
    w field .head       \ a SEQ of the .LINE string split on comma deliminators
constant decomma

: init-decomma ( str node -- node )
    init-seq >r
    r@ .line @ count split  r@ .head !
    r> ;

: new-decomma ( str -- node )
    decomma alloc
    init-decomma ;

Here we have a child data-type called DECOMMA whose parent is SEQ. This data type will have a .LINE field that it inherits from SEQ, and will also have a .HEAD field of its own. In this case, .HEAD is a SEQ list. The SPLIT function splits a string on comma deliminators to form a SEQ list. 

Note that in an earlier version of LIST.4TH, it was necessary to call INIT-LIST after calling INIT-SEQ. This is no longer necessary

The function TEST reads a test file off of the disk into a seq list. TEST is defined like this:

: test ( -- head )        
    c" test.seq" read-seq ;
    
The test file contains comma-deliminated line-record data. The first field is the name of a programming language, the second field is a description of what the language is notable for, and the third field is the name of the principle inventer of the language. If more than one person was involved, their names will be in the fourth and fifth etc. fields. This data can be used for testing the list code. The function SEQ>DECOMMA will convert the SEQ list into a DECOMMA list, and the function DECOMMA>SEQ will convert the DECOMMA list back into a SEQ list giving it bar delimiters rather than comma delimiters. 


*******************************************************************************
*******************************************************************************
*******************************************************************************

You use NEW-xxx to create a new node of your data type. You then use LINK to append or prepend that node on an existing list:

: link ( 1stHead 2ndHead -- NewHead )                           \ makes one list

If 2ndHead is the new node, then you are appending; if 1stHead is the new node, then you are prepending. Most of the time, you will be appending or prepending a single node (usually freshly created by NEW-xxx) onto a list. It is possible however, to append or prepend an entire list onto another list.

Here is how READ-SEQ is defined:

: read-seq ( name -- head )
    count r/o <open-file> >r                                    \ r: -- file-id
    <cstr  nil begin                        \ -- head
        cstr 1+  cstr-size 1-  r@  read-line  abort" *** READ-SEQ failed to read line ***"
        while                               \ -- head cnt
            cstr c!                         \ -- head
            cstr  new-seq                   \ -- head node
            link  repeat drop               \ -- head
    r> <close-file>  
    cstr> drop ;

We have a NIL in front of the BEGIN. That is the head of the list that we are creating. Later on, when LINK is called, that will be the 1stHead. The 2ndHead is provided by the NEW-SEQ just prior to the LINK. After LINK has done its work, it returns a NewHead result, which will be the 1stHead the next time we go through the loop.

In addition to appending and prepending a node onto a list, it is also possible to insert a node (or an entire list) into the middle of another list:

: insert ( 1stHead 2ndMiddle -- )               \ insert 1st list into 2nd list after 2ndMiddle node

The 1stHead list is the one being inserted, and it gets inserted into the second list after the 2ndMiddle node.

It is also possible to break a list into two pieces:

: delink ( 1stHead 2ndHead -- 1stHead 2ndHead )                 \ makes two lists

: delink-one    ( head -- head rest )   dup 1st     <delink> ;
: delink-two    ( head -- head rest )   dup 2nd     <delink> ;
: delink-three  ( head -- head rest )   dup 3rd     <delink> ;
: delink-four   ( head -- head rest )   dup 4th     <delink> ;
: delink-five   ( head -- head rest )   dup 5th     <delink> ;
: delink-six    ( head -- head rest )   dup 6th     <delink> ;
: delink-seven  ( head -- head rest )   dup 7th     <delink> ;
: delink-eight  ( head -- head rest )   dup 8th     <delink> ;
: delink-nine   ( head -- head rest )   dup 9th     <delink> ;

The 2ndHead node should be a node in the 1stHead list. DELINK will split the list into two lists at this spot. Oftentimes, you will have found the 2ndHead node using FIND-NODE. You can also find a node a certain distance into the list using 1ST, 2ND, etc.. We have DELINK-ONE, DELINK-TWO, etc., that are used for delinking a certain number of nodes from the front of a list.

There is an item called SAFE? that is set to TRUE as a default. When this is TRUE, some run-time error checking is done. LINK assures that the 2ndHead node is not in the 1stHead list. DELINK assures that the 2ndHead node is in the 1stHead list. After your program is debugged, you will want to set SAFE? to FALSE, as this error checking slows down the program. In many programs, LINK is used quite a lot; if you profiled the program, you would find a significant amount of time being spent in LINK.

It is possible to create a sorted list using INSERT-ORDERED:

\ comparer: i*x new-node node -- j*x new-node ?             \ insert new-node prior to node?

: insert-ordered ( head 'comparer node -- new-head 'comparer )

: sort-list ( head 'comparer -- new-head )
    nil swap  rot each[  insert-ordered  ]each drop ;

This is how we create sorted lists. If you are creating a list, and you want it to be sorted, then you use INSERT-ORDERED rather than LINK to insert each new node into the list. If you have an unsorted list and you want it to be sorted, then you use SORT-LIST to make a new list. In either case, you will need a comparer function. Here is an example:

: <sort-seq> ( new-node node -- new-node ? )
    .line @ count  rover .line @ count  compare  1 = ;
    
: sort-seq ( head -- new-head )
    ['] <sort-seq> sort-list ;

In order to get an ascending sort, the comparer function must return a TRUE flag if the node is greater than the new-node. The SORT-SEQ function sorts the list by the .LINE data. SORT-SEQ works on both SEQ and DECOMMA lists.


*******************************************************************************
*******************************************************************************
*******************************************************************************

When lists are large, the speed problems of NTH and SORT-SEQ can become a problem. This may be solved by converting the list into an array. 

: list>array ( head -- array elements )     \ the array is of pointers to the nodes \ it is allocated on the heap
    
: array>list ( array elements -- head )     \ array is assumed to be on the heap \ it is deallocated
            
macro: array-nth ( array n -- node ) 

The LIST>ARRAY function creates an array of pointers. The pointers are to the nodes in the list. The programmer can use ARRAY-NTH to access the nodes, and it is much faster than NTH. It is possible to rearrange the elements (pointers) in the array. For example, SORT can be used to sort the array. Afterward, ARRAY>LIST can be used to convert the array (possibly rearranged) back into a list. The programmer should note that the list has been in memory the whole time. The pointers are to the nodes, and they are right where they have always been. The ARRAY>LIST function modifies the .FORE fields of the nodes so they reflect the new arrangement (if there is a new arrangement).

Here is an example:

: <big-sort-seq> ( adrX adrY -- X>Y? )   
    >r  @ .line @ count
    r>  @ .line @ count  compare  A>B = ;
    
: big-sort-seq ( head -- new-head )         \ more efficient for large arrays
    list>array                              \ -- array elements
    2dup  w  ['] <big-sort-seq>  sort
    array>list ;
        
The programmer needs to keep in mind that the array contains pointers to the nodes, rather than the nodes themselves. You need a @ to access the nodes. This can be seen in <BIG-SORT-SEQ>. The adrX and adrY parameters are addresses of pointers to nodes, not addresses of nodes, so we have a @ to obtain the node's address, and then a .LINE @ to obtain the string address that is the node's data payload. Forgetting about this extra level of indirection is (for me) the most common bug that occurs. Often you will write your code to use the lists, and to use NTH and SORT and so forth. Later on, when you discover that this is slow, you upgrade to use arrays as an intermediate data structure. Porting the code from lists to arrays is not as straight-forward as you might hope, because you have this extra level of indirection to deal with. Also, the toucher that is used by SORT has a different stack picture than the toucher that is used by SORT-LIST, and this is something else that you have to be alert to.


*******************************************************************************
*******************************************************************************
*******************************************************************************

There are two ways to traverse through a list. The preferred method is EACH:

: each ( i*x head 'toucher -- j*x )             \ toucher: i*x node -- j*x

The HEAD parameter is a pointer to the head of the list and the 'TOUCHER parameter is a pointer to a function that will "touch" each node in the list. This function takes one parameter, which is the pointer to the node that it is processing. Here is a simple example of how to use EACH:

: <kill-seq> ( node -- )
    dup .line @  dealloc
    dealloc ;

: kill-seq ( head -- )
    each[  <kill-seq>  ]each ;

We also have EACH[ and ]EACH, which are defined like this:

macro: each[                                    \ toucher: i*x node -- j*x
    begin  ?dup while  
        dup .fore @ >r ;
    
macro: ]each
        r> repeat ;

Our KILL-SEQ defined above could have been written like this;

: kill-seq ( head -- )
    each[  
        dup .line @  dealloc
        dealloc 
        ]each ;

It is best to have the toucher be factored out into a function, such as <KILL-SEQ> above, because it may be needed by a child data-type toucher. For example:

: <kill-decomma> ( node -- )    
    dup .head @  kill-seq
    <kill-seq> ;
    
: kill-decomma ( head -- )    
    each[  <kill-decomma>  ]each ;

Whether you use EACH or EACH[...]EACH is up to you. On the plus side, EACH[...]EACH is faster executing than EACH. On the negative side however, EACH[...]EACH is fragile code in the sense that it is not compatible with the use of local variables or >R...R>. If the code between EACH[ and ]EACH should access a local variable, or should use R@ to access data on the return stack, chaos will result. This fragility is more a problem with the ANS-Forth standard than a problem with my software, but so long as we continue to use ANS-Forth there is not much that can be done about it.

If the toucher needs to access data other than the node, the parameter stack is used. Here is an example:

: <count-C> ( count node -- new-count )                     \ increment count if the first field contains a capital C
    .head @ .line @ count  s" C"  search nip nip  if  1+  then ;
    
: count-C ( head -- count )
    0  
    swap  ['] <count-C> each ;
    
Here the toucher <COUNT-C> has a count parameter that it modifies. Notice that <COUNT-C> still consumes the node just like any EACH toucher. The stack underneath is only modified, but <COUNT-C> doesn't push new data on the stack or pop any off. 

COUNT-C could have also been written like this:

: count-C ( head -- count )
    0  
    swap each[  
        .head @ .line @ count  s" C"  search nip nip  if  1+  then   
        ]each ;
        
Or like this:        
    
: count-C ( head -- count )
    0  
    swap each[  <count-C>  ]each ;
    
A good point that can be made, is that languages such as C are not capable of doing this because their type-checking system would get in the way. In C, the pointer to the toucher would need a typedef, and this typedef would need to describe exactly what parameters the toucher consumes and returns --- that is to say, that it takes a pointer to a node and returns nothing. You wouldn't be able to have some touchers accessing data underneath the node on the parameter stack, and all of these touchers being called by EACH. The way this problem would be circumvented would be to pass the toucher a void pointer to a struct, and to have the parameter(s) in the struct. This struct would contain the pointer to the node and possibly some other parameters. This is complicated and ugly code; by comparison, the Forth code is simple and elegant.

It is possible for the toucher to remove nodes from the list. You can write a filter that passes all of the nodes through, but filters out certain nodes:

: <purge-C> ( head node -- new-head )                       \ remove any node in which the first field contains a capital C
    dup .head @ .line @ count  s" C"  search nip nip  if        remove  <kill-decomma>  
    else                                                        drop                        then ;
    
: purge-C ( head -- new-head )
    dup  ['] <purge-C> each ;

In this case, <PURGE-C> modifies the head parameter underneath the node parameter that it consumes. REMOVE is defined like this:

: remove ( head node -- new-head node )         \ returns new list without node and makes node a sovereign list

This definition is changed from what was in the previous list.4th package. REMOVE is now able to remove any node from the list, including the head node. In <PURGE-C>, the removed node is given to <KILL-DECOMMA> to free up its memory. It is also possible for the removed nodes to be collected rather than destroyed:

: <collect-C> ( nil nodes... head node -- nil nodes... new-head )       \ remove and collect the matching node
    dup .head @ .line @ count  s" C"  search nip nip  if        remove  swap
    else                                                        drop                        then ;
    
: collect-C ( head -- nil nodes... new-head )               
    nil swap
    dup  ['] <collect-C> each ;
    
In this case, what we have on the stack is a variable number of nodes, with a NIL marker underneath them. Earlier I said that the toucher just modifies what is on the stack under the node, but doesn't put any new data on or take any data off. That is essentially true even with <COLLECT-C>, it is just that what is on the stack under the node is a variable amount of data. Having the NIL marker underneath makes it effectively the same data throughout.

EACH (and EACH[...]EACH) is used for touching every node in the list. We can also search for a node that matches some criteria and return that node or the node prior to it:

: find-node ( i*x head 'toucher -- j*x node|false )             \ toucher: i*x node -- j*x flag

: find-prior ( i*x head 'toucher -- j*x -1|node|false )         \ toucher: i*x node -- j*x flag

In this case, the toucher not only consumes the node, but also returns a flag indicating if it is the node we are searching for or not. If the toucher returns a TRUE, then the list traversal stops and no further nodes are touched. FIND-NODE returns the node that matched the criteria. FIND-PRIOR returns the node prior to the node that matched the criteria, or -1 if the node that matched the criteria was the head of the list. If FIND-NODE or FIND-PRIOR never find a node that matches the criteria, then they return FALSE.

FIND-NODE and FIND-PRIOR are similar to EACH in that the toucher can access data on the parameter stack under the node. They can also use REMOVE to remove nodes if necesary. Here is an example:

: <first-C> ( node -- ? )    
    .head @ .line @ count  s" C"  search nip nip ;
    
: first-C ( head -- node|false )    
    ['] <first-C> find-node ;    
    
In this case, we stop our search on the first node that matches our criteria. 

FIND-PRIOR is similar except that it returns the prior node, or -1 if the found node was the head of the list. FIND-PRIOR is not used by programmers very much; I've never used it in an application program. It is primarily provided for internal use in INSERT-ORDERED.


*******************************************************************************
*******************************************************************************
*******************************************************************************

It is possible to make clones of a node or a whole list:

: clone-node ( node -- new-node )               \ returns a list with only one node in it

: clone-list ( head -- new-head )
    nil  swap each[  clone-node link  ]each ;    

If you use EACH to modify some or all of the nodes in a list, but you also want to keep a copy of the original version of the list, you can use CLONE-LIST first and them make your modifications to the clone. Note that CLONE-NODE only does a simple copy of the node. If you have fields which are pointers to records in the heap, then these have to get explicitely cloned; you don't want your clone node to be using the same records as your original node. In this case, you have to write your own CLONE-xxx function:

: <clone-seq> ( node -- new-node )    
    clone-node
    dup .line @  hstr  over .line ! ;
    
: clone-seq ( head -- new-head )    
    nil
    swap  each[  <clone-seq> link  ]each ;

Notice that <CLONE-SEQ> makes a clone of the node, but doesn't link it to the head. The LINK is provided after <CLONE-SEQ> in the EACH[...]EACH loop of CLONE-SEQ. The reason why this is done, is so <CLONE-SEQ> can be used in a child data-type's cloning function:

: <clone-decomma> ( head node -- new-head )    
    <clone-seq>
    dup .head @  clone-seq  over .head ! ;
    
: clone-decomma ( head -- new-head )    
    nil
    swap  each[  <clone-decomma> link  ]each ;
    
This is one case in which EACH[...]EACH really is preferrable to EACH. The <CLONE-xxx> function can't be given to EACH because it still needs a LINK. With EACH[...]EACH, both the <CLONE-xxx> function and LINK can be put in there.

In addition to CLONE-NODE and CLONE-LIST, we also have CONCRETE-NODE and CONCRETE-LIST. These are similar except that, instead of creating a new node or list in the heap, they create the new node or list in the dictionary. If you kill a concrete list, FREE will recognize that it is in the dictionary rather than the heap, and will do nothing. The use of concrete lists is recommended for moving work from run-time to compile-time. This was done in the slide-rule program for the MARK lists (D-scale etc.).


*******************************************************************************
*******************************************************************************
*******************************************************************************

Finally we have some miscelaneous functions:

: reverse ( head -- new-head )      \ reverses the order of all the nodes
    
macro: lone? ( node -- ? )          \ is this node the only node in the list?

: in-list? ( head node -- ? )       \ is the node in the list?

: length ( head -- count )
                
: tail ( head -- node )

macro: nth ( head n -- node )       \ N is an index as for an array \ returns nil if N is >= to the length

These are self explanatory. The NTH index is described "as for an array," which means that 0 is the first node, 1 is the second node, etc..
    
    
*******************************************************************************
*******************************************************************************
*******************************************************************************

It is possible to use lists as queues and stacks:

macro: enqueue ( node head -- new-head ) 

macro: dequeue ( head -- node new-head )    

macro: push ( node head -- new-head )

macro: pull ( head -- node new-head )    

A queue is a first-in-first-out (FIFO) structure, and a stack is a last-in-first-out (LIFO) structure.