\ list.4th --- written by Hugh Aguilar --- copyright (c) 2009, BSD license \ Compiles under any ANS-Forth system (16-bit, 32-bit or 64-bit). \ This is an implementation of singly-linked nil-terminated lists. \ See the file LIST.TXT for further documentation. \ Author contact: hughaguilar96@yahoo.com \ We need NOVICE.4TH already compiled. \ If only a few functions can be rewritten in assembly language, the best candidates would be EACH and TAIL. marker list.4th false value safe? \ should we compile in error-checking code? \ not really necessary most of the time, but can be turned on for debugging \ ****** \ ****** The following are basic list words. \ ****** 0 w field .fore \ .FORE will be fast because it is in the front constant list SwiftForth? [if] icode init-list ( node -- node ) zero(eax) eax 0 [ebx] mov ret [else] macro: init-list ( node -- node ) nil over .fore ! ; [then] macro: lone? ( node -- ? ) \ is this node the only node in the list? .fore @ 0= ; SwiftForth? [if] code in-list? ( head node -- ? ) \ is the node in the list? ebx eax mov [drop] begin zero? if ret then ebx eax cmp 0= if true # ebx mov ret then [@] again end-code [else] : in-list? ( head node -- ? ) \ is the node in the list? swap begin \ -- targ node dup 0= if 2drop false exit then 2dup = if 2drop true exit then .fore @ again ; [then] \ ****** \ ****** The following are for searching through lists. \ ****** SwiftForth? [if] icode next>r ( node -- node ) \ r: -- next-node 0 [ebx] push ret end-code [else] macro: next>r ( node -- node ) \ r: -- next-node dup .fore @ >r ; [then] macro: each[ \ toucher: i*x node -- j*x begin dup while next>r ; macro: ]each r> repeat drop ; \ EACH[...]EACH generates faster-executing code than EACH, and can be more readable, but the code is more bloated. \ EACH[...]EACH uses the return stack. This can create havoc in functions that use locals or use >R and R>. \ EACH[...]EACH is mostly only used when speed is important, which largely precludes locals anyway. \ FIND-NODE and FIND-PRIOR can't be written like this because they both involve an early-out. \ EACH allows the toucher (the code that touches the node) to be tested on its own, and also to be used on its own. \ EACH also allows the toucher to use local variables. \ Both of these advantages can be achieved with EACH[...]EACH by factoring the toucher into a function and calling it. \ EACH can easily be rewritten in assembly language, which might not be true of EACH[...]EACH. \ Because of these reasons, EACH should generally be used rather than EACH[...]EACH. SwiftForth? [if] icode each ( i*x head 'toucher -- j*x ) \ toucher: i*x node -- j*x ebx push [drop] \ -- node \r: -- 'toucher begin non-zero? while 0 [ebx] push \ -- node \r: -- 'toucher next w [esp] eax mov edi eax add eax call \ -- [dup] ebx pop \ -- next \r: -- 'toucher repeat [drop] [rdrop] ret end-code [else] : each ( i*x head 'toucher -- j*x ) \ toucher: i*x node -- j*x >r begin dup while \ -- node \r: -- 'toucher r@ over .fore @ >r \ -- node 'toucher \r: -- 'toucher next execute r> repeat drop rdrop ; [then] \ It is possible for the toucher to access data underneath the node, \ as EACH has no internal data on the stack (that is why we put everything on the return stack during the EXECUTE). \ It is ok for the toucher to REMOVE the node from this list, \ as we already have our next node on the r-stack. \ When writing a filter, you REMOVE all of the nodes that you don't want to keep. \ The toucher shouldn't remove multiple nodes however, \ as it might clobber the next-node that EACH has waiting on the r-stack. SwiftForth? [if] code find-node ( i*x head 'toucher -- j*x node|false ) \ toucher: i*x node -- j*x flag ebx push [drop] \ -- node \r: -- 'toucher begin non-zero? while 0 [ebx] push ebx push \ -- node \r: -- 'toucher next node w 2* [esp] eax mov edi eax add eax call \ -- non-zero? if ebx pop [2rdrop] ret then [rdrop] ebx pop repeat [rdrop] ret end-code [else] : find-node ( i*x head 'toucher -- j*x node|false ) \ toucher: i*x node -- j*x flag >r begin dup while \ -- node \r: -- toucher r@ over .fore @ >r over >r \ -- node toucher \r: -- toucher next node execute if r> 2rdrop exit then \ return node rdrop r> repeat \ -- next \r: -- toucher rdrop ; \ return the node, which is NIL (FALSE) by now [then] \ FIND-NODE is like EACH except that it stops when the toucher returns true, \ and it either returns the found node or false SwiftForth? [if] code find-prior ( i*x head 'toucher -- j*x -1|node|false ) \ toucher: i*x node -- j*x flag ebx push [drop] \ -- node \ r: -- 'toucher -1 # push \ r: -- 'toucher prior \ prior is -1 begin non-zero? while 0 [ebx] push ebx push \ -- node \ r: -- 'toucher prior next node w 3 * [esp] eax mov edi eax add eax call \ -- flag non-zero? if [2rdrop] ebx pop [rdrop] ret then eax pop ebx pop [rdrop] eax push repeat [2rdrop] ret end-code [else] : find-prior ( i*x head 'toucher -- j*x -1|node|false ) \ toucher: i*x node -- j*x flag >r -1 >r \ prior is -1, meaning found node was the head begin dup while \ -- node \r: -- 'toucher prior rr@ over .fore @ >r over >r \ -- node toucher \r: -- 'toucher prior next node execute if 2rdrop r> rdrop exit then \ return prior 2r> rdrop >r repeat \ -- next \r: -- 'toucher node 2rdrop ; \ return the node, which is NIL (FALSE) by now [then] \ FIND-PRIOR is like FIND-NODE except that it returns the node prior to the found node, \ or -1 if the found node was the head --- or false if no node was found. char & comment \ these are EACH, FIND-NODE and FIND-PRIOR with local variables; more readable, but slower : each ( i*x head 'toucher -- j*x ) \ toucher: i*x node -- j*x { toucher | next -- } begin ?dup while \ -- node dup .fore @ to next toucher execute \ -- next repeat ; : find-node ( i*x head 'toucher -- j*x node|false ) \ toucher: i*x node -- j*x flag { node toucher | next -- node|false } begin node while node .fore @ to next node toucher execute if node exit then next to node repeat false ; : find-prior ( i*x head 'toucher -- j*x -1|node|false ) \ toucher: i*x node -- j*x flag -1 { node toucher prior | next -- prior|false } \ prior is -1, meaning found node was the head begin node while node .fore @ to next node toucher execute if prior exit then node to prior next to node repeat false ; & : <find-prev> ( targ node -- targ ? ) .fore @ over = ; : find-prev ( head node -- prev ) \ finds the node prior to node; nil if node = head dup 0= if nip exit then [ safe? ] [if] 2dup in-list? 0= abort" *** find-prev nodes in separate lists ***" [then] swap ['] <find-prev> find-node nip ; SwiftForth? [if] icode length ( head -- count ) zero(eax) begin non-zero? while [@] eax inc repeat eax ebx mov ret end-code [else] : length ( head -- count ) 0 0 do dup 0= if I nip unloop exit then .fore @ loop ; [then] char & comment : tail ( head -- node ) begin dup .fore @ dup while \ -- node next nip repeat drop ; \ This version of TAIL doesn't optimize very well because some code is in front of the WHILE and some after the WHILE. & SwiftForth? [if] icode tail ( head -- node ) zero(eax) \ eax is the previous node (initialized to NIL) begin non-zero? while ebx eax mov [@] repeat eax ebx mov ret end-code [else] : tail ( head -- node ) nil swap begin dup while \ -- prev node nip dup .fore @ repeat drop ; \ This version of TAIL should optimize fairly well because the NIP and the DUP are adjacent. \ SwiftForth doesn't do any peephole optimization, but a more professionally written Forth should be able to combine these. [then] : big-nth ( head n -- node ) \ N is an index as for an array \ returns nil if N is >= to the length 0 ?do .fore @ dup 0= if unloop exit then loop ; : small-nth ( head n -- node ) \ N is an index as for an array \ returns nil if N is >= to the length begin dup while over 0= if drop exit then swap .fore @ swap 1- repeat drop ; \ BIG-NTH is faster with large values of N; there is more up-front overhead but less work per node. \ SMALL-NTH is faster with small values of N; there is less up-front overhead but more work per node. SwiftForth? [if] icode nth ( head n -- node ) ebx eax mov [drop] begin eax eax or 0<> while zero? if ret then [@] eax dec repeat ret end-code [else] macro: nth ( head n -- node ) \ N is an index as for an array \ returns nil if N is >= to the length big-nth ; [then] macro: next ( node -- next ) \ this can do an EXIT, so be careful with it .fore @ dup 0= if exit then ; : 1st ( head -- node ) ; immediate \ this one is included for consistency : 2nd ( head -- node ) [ s" next " 0 ] times .fore @ ; : 3rd ( head -- node ) [ s" next " 1 ] times .fore @ ; : 4th ( head -- node ) [ s" next " 2 ] times .fore @ ; : 5th ( head -- node ) [ s" next " 3 ] times .fore @ ; : 6th ( head -- node ) [ s" next " 4 ] times .fore @ ; : 7th ( head -- node ) [ s" next " 5 ] times .fore @ ; : 8th ( head -- node ) [ s" next " 6 ] times .fore @ ; : 9th ( head -- node ) [ s" next " 7 ] times .fore @ ; \ 1ST, 2ND, etc. do not remove the node. Use REMOVE for that. \ It is also possible to use CLONE-NODE to get a copy of this node without removing the original. \ It is a good idea to use 1ST even though it is just a no-op. \ The algorithm may be changed at some time in the future, in which case 1ST will become necessary. \ For example, rather than have a pointer to the head of the list, the pointer can be to the tail node. \ The tail node's .FORE link would be to the head node. This allows TAIL to be fast, as it doesn't have to traverse the list. \ The tail node's .FORE link would be changed to NIL during any traversal of the list, and then changed back again afterward. \ ****** \ ****** The following are for linking and delinking lists. \ ****** : link ( 1stHead 2ndHead -- NewHead ) \ makes one list over 0= if nip exit then dup 0= if drop exit then [ safe? ] [if] 2dup in-list? abort" *** LINK nodes in the same list ***" [then] over tail .fore ! ; : delink ( 1stHead 2ndHead -- 1stHead 2ndHead ) \ makes two lists dup 0= if exit then 2dup = abort" *** DELINK can't delink the 1st node ***" [ safe? ] [if] 2dup in-list? not abort" *** DELINK nodes not in the same list ***" [then] 2dup find-prev \ -- 1stHead 2ndHead 2ndPrev nil swap .fore ! ; \ If link is given nodes that are in the same list... \ Or FIND-PREV or DELINK are given nodes that are in different lists... Chaos will result! : <delink> ( last -- rest ) \ last will become the tail of the list that its in dup 0= if exit then dup .fore @ \ -- last rest nil rot .fore ! ; : delink-one ( head -- head rest ) dup 1st <delink> ; : delink-two ( head -- head rest ) dup 2nd <delink> ; : delink-three ( head -- head rest ) dup 3rd <delink> ; : delink-four ( head -- head rest ) dup 4th <delink> ; : delink-five ( head -- head rest ) dup 5th <delink> ; : delink-six ( head -- head rest ) dup 6th <delink> ; : delink-seven ( head -- head rest ) dup 7th <delink> ; : delink-eight ( head -- head rest ) dup 8th <delink> ; : delink-nine ( head -- head rest ) dup 9th <delink> ; \ <DELINK> is slightly faster than DELINK because it doesn't need to do a FIND-PREV. \ This is why DELINK-ONE uses <DELINK> rather than: DUP 2ND DELINK \ The same is true of DELINK-TWO etc.. \ ****** \ ****** The following are for inserting or cloning nodes. \ ****** : insert ( 1stHead 2ndMiddle -- ) \ insert 1st list into 2nd list after 2ndMiddle node dup 0= if 2drop exit then over 0= if 2drop exit then [ safe? ] [if] 2dup in-list? abort" *** insert nodes in the same list ***" [then] over tail over .fore @ swap .fore ! \ link node after 2ndMiddle to 1stHead's tail .fore ! ; \ link 1stHead to 2ndMiddle : clone-node ( node -- new-node ) \ returns a list with only one node in it \ puts it in the heap dup allocation >r r@ alloc tuck \ -- new-node node new-node \r: -- size r> cmove> init-list ; : clone-list ( head -- new-head ) nil swap each[ clone-node link ]each ; : concrete-node ( node -- new-node ) \ returns a list with only one node in it \ puts it in the dictionary dup allocation >r r@ concrete-alloc tuck \ -- new-node node new-node r> cmove> init-list ; : concrete-list ( head -- new-head ) nil swap each[ concrete-node link ]each ; \ ****** \ ****** The following are for removing nodes. \ ****** macro: wrappers ( head node -- head node prev next ) \ finds previous and next nodes of given node 2dup find-prev over .fore @ ; \ either may be NIL if node was head or tail of list macro: link-wrappers ( prev next -- ) \ link the next node to the prev node \ assumes PREV is not NIL swap .fore ! ; : remove ( head node -- new-head node ) \ returns new list without node and makes node a sovereign list over 0= abort" *** REMOVE can't remove a node from an empty list ***" dup 0= abort" *** REMOVE can't remove a NIL from a list ***" wrappers \ -- head node prev next over 0= if \ node was the head nip rot drop \ -- node next swap init-list exit then link-wrappers \ -- head node init-list ; \ ****** \ ****** The following are for sorting lists. \ ****** : reverse ( head -- new-head ) \ reverses the order of all the nodes nil swap each[ init-list swap link ]each ; \ comparer: i*x new-node node -- j*x new-node ? \ insert new-node prior to node? : insert-ordered ( head 'comparer node -- new-head 'comparer ) init-list rover rover \ -- head 'comparer node head 'comparer find-prior \ -- head 'comparer node -1|prior|false dup 0= if drop rot swap link swap exit then \ append dup -1 = if drop rot link swap exit then \ prepend insert ; : sort-list ( head 'comparer -- new-head ) nil swap rot each[ insert-ordered ]each drop ; \ If the comparer provides a TRUE on node > new-node, then the list will be sorted in ascending order. \ The following is a typical comparer function: \ : int> ( new-node node -- new-node ) \ assumes that we are sorting on a W field called .N \ .n @ over .n @ u> ; \ ****** \ ****** This a conversion between lists and arrays. \ ****** : list>array ( head -- array elements ) \ the array is of pointers to the nodes \ it is allocated on the heap dup length >r r@ cells alloc tuck >r \ -- array head \r: -- length array each[ over ! cell+ ]each drop r> r> ; : array>list ( array elements -- head ) \ array is assumed to be on the heap \ it is deallocated 1- cells over + \ -- array last nil over @ .fore ! \ set last node's .FORE to NIL over ?do \ -- array \ set the other nodes' .FORE to the next node I cell+ @ I @ .fore ! w +loop dup @ \ -- array head swap dealloc ; macro: array-nth ( array n -- node ) cells + @ ; \ Our NTH function is slow. If this needs to be done a lot, then it may be faster to convert the list into an array \ and use ARRAY-NTH instead. With an array, we no longer have the ability to remove or add nodes. \ Our SORT-LIST function is slow because it is an insertion sort. With a large array, sorting may be faster if the list \ is converted into an array and SORT is used. \ ****** \ ****** These are lists of strings. \ ****** list w field .line constant seq : init-seq ( str node -- node ) init-list >r hstr r@ .line ! r> ; : new-seq ( str -- node ) seq alloc init-seq ; : <kill-seq> ( node -- ) dup .line @ dealloc dealloc ; : kill-seq ( head -- ) each[ <kill-seq> ]each ; : <clone-seq> ( node -- new-node ) clone-node dup .line @ hstr over .line ! ; : clone-seq ( head -- new-head ) nil swap each[ <clone-seq> link ]each ; : concrete-str ( str -- new-str ) here swap count ,str ; : <concrete-seq> ( node -- new-node ) concrete-node dup .line @ concrete-str over .line ! ; : concrete-seq ( head -- new-head ) nil swap each[ <concrete-seq> link ]each ; : <read-seq> ( adr cnt -- head ) r/o <open-file> >r \ r: -- file-id <cstr nil begin \ -- head cstr 1+ cstr-size 1- r@ read-line abort" *** READ-SEQ failed to read line ***" while \ -- head cnt cstr c! \ -- head cstr new-seq \ -- head node link repeat drop \ -- head r> <close-file> cstr> drop ; : read-seq ( name -- head ) count <read-seq> ; : <write-seq> ( head adr cnt -- ) w/o <create-file> swap \ -- file-id head each[ .line @ count rover write-line abort" *** <WRITE-SEQ> failed to write ***" ]each <close-file> ; : write-seq ( head name -- ) count <write-seq> ; : <sort-seq> ( new-node node -- new-node ? ) .line @ count rover .line @ count compare A>B = ; : sort-seq ( head -- new-head ) ['] <sort-seq> sort-list ; : <big-sort-seq> ( adrX adrY -- X>Y? ) >r @ .line @ count r> @ .line @ count compare A>B = ; : big-sort-seq ( head -- new-head ) \ more efficient for large arrays list>array \ -- array elements 2dup w ['] <big-sort-seq> sort array>list ; : <show-seq> ( node -- ) .line @ count type cr ; : show-seq ( head -- ) cr ['] <show-seq> each ; \ SHOW-SEQ uses EACH rather than EACH[...]EACH because <SHOW-SEQ> is useful in its own right. : <split> { adr cnt delimiter left right | lit-str? -- head } nil <cstr adr cnt + adr ?do \ -- head lit-str? if \ if inside of a literal string I c@ right = if false to lit-str? \ trip flag when we pass the right bracket else I c@ char+cstr then \ pass all chars through except the right bracket else I c@ delimiter = if \ if we hit the delimiter outside of a literal string cstr> new-seq link \ -- new-head \ conclude substring and link it <cstr \ start a new substring else I c@ left = if true to lit-str? \ trip flag when we pass the left bracket else I c@ char+cstr then \ pass all chars through except the left bracket then then loop cstr> dup c@ if new-seq link \ if last substring has any chars in it, link it in else drop then ; \ <SPLIT> splits the string on the delimiter into substrings dumped into a SEQ. The delimiter(s) are removed. \ Delimiters inside of left/right brackets are passed through. The brackets are not passed through. \ Use 0 for LEFT and RIGHT if you don't want to have literal strings. vector split ( adr cnt -- head ) : comma-split ( adr cnt -- head ) [char] , [char] " [char] " <split> ; ' comma-split is-split \ The most common format that you get from database dumps is comma delimited with quote-mark brackets. \ If you have something else though, you can use IS-SPLIT to set what you need. : combine ( head delimiter -- str ) \ combines seq strings into a single delimited string swap <cstr each[ .line @ +cstr dup char+cstr ]each drop 1 -cstr \ get rid of the delimiter appended on the last substring cstr> ; \ Be careful in regard to the fact that <SPLIT> removes the brackets from literal strings. \ If you use COMBINE to put the string back together, those brackets will no longer be there. \ If you <SPLIT> the string again, it will split on any delimiters that were inside of literal strings. \ As a rule, I use a different delimiter with COMBINE than I had in split. For example, | rather than the comma. \ The comma is a bad choice considering how common the comma is in data, but the | is rare in data. : remove-padding ( head -- ) \ removes leading and trailing blank characters from strings in seq each[ >r r@ .line @ \ -- str dup count -leading -trailing \ -- str adr cnt <hstr> r@ .line ! \ -- str dealloc \ -- rdrop ]each ; \ ****** \ ****** These are lists of strings that have been split into lists of strings. \ ****** seq w field .head \ a SEQ of the .LINE string split on comma deliminators constant decomma : init-decomma ( str node -- node ) init-seq >r r@ .line @ count split r@ .head ! r> ; : new-decomma ( str -- node ) decomma alloc init-decomma ; : <kill-decomma> ( node -- ) dup .head @ kill-seq <kill-seq> ; : kill-decomma ( head -- ) each[ <kill-decomma> ]each ; : <clone-decomma> ( head node -- new-head ) <clone-seq> dup .head @ clone-seq over .head ! ; : clone-decomma ( head -- new-head ) nil swap each[ <clone-decomma> link ]each ; : <show-decomma> ( node -- ) dup <show-seq> .head @ show-seq cr ; : show-decomma ( head -- ) cr ['] <show-decomma> each ; : seq>decomma ( head -- new-head ) nil swap each[ >r r@ .line @ new-decomma link r@ <kill-seq> rdrop ]each ; : decomma>seq ( head -- new-head ) nil swap each[ >r r@ .head @ [char] | combine \ -- new-head str \ use the | delimiter rather than the comma new-seq link \ -- new-head r@ <kill-decomma> rdrop ]each ; \ ****** \ ****** These are some demonstration functions for DECOMMA lists. \ ****** : <count-C> ( count node -- new-count ) \ increment count if the first field contains a capital C .head @ .line @ count s" C" search nip nip if 1+ then ; : count-C ( head -- count ) 0 swap ['] <count-C> each ; : <purge-C> ( head node -- new-head ) \ remove any node in which the first field contains a capital C dup .head @ .line @ count s" C" search nip nip if remove <kill-decomma> else drop then ; : purge-C ( head -- new-head ) dup ['] <purge-C> each ; : <collect-C> ( nil nodes... head node -- nil nodes... new-head ) \ remove and collect the matching node dup .head @ .line @ count s" C" search nip nip if remove swap else drop then ; : collect-C ( head -- nil nodes... new-head ) nil swap dup ['] <collect-C> each ; : <first-C> ( node -- ? ) .head @ .line @ count s" C" search nip nip ; : first-C ( head -- node|false ) ['] <first-C> find-node ; : test ( -- head ) c" test.seq" read-seq ; \ ****** \ ****** This are queues (FIFO) and stacks (LIFO). \ ****** macro: enqueue ( node head -- new-head ) swap link ; macro: dequeue ( head -- node new-head ) delink-one ; macro: push ( node head -- new-head ) link ; macro: pull ( head -- node new-head ) delink-one ; \ These are arranged so that DEQUEUE and PULL both use DELINK-ONE, which is faster than delinking the tail node.