     1 sfeed
     2 -----
     3 
     4 RSS and Atom parser (and some format programs).
     5 
     6 It converts RSS or Atom feeds from XML to a TAB-separated file. There are
     7 formatting programs included to convert this TAB-separated format to various
     8 other formats. There are also some programs and scripts included to import and
     9 export OPML and to fetch, filter, merge and order feed items.
    10 
    11 
    12 Build and install
    13 -----------------
    14 
    15 $ make
    16 # make install
    17 
    18 
    19 To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:
    20 
    21 $ make SFEED_CURSES=""
    22 # make SFEED_CURSES="" install
    23 
    24 
    25 To change the theme for sfeed_curses you can set SFEED_THEME.  See the themes/
    26 directory for the theme names.
    27 
    28 $ make SFEED_THEME="templeos"
    29 # make SFEED_THEME="templeos" install
    30 
    31 
    32 Usage
    33 -----
    34 
    35 Initial setup:
    36 
    37         mkdir -p "$HOME/.sfeed/feeds"
    38         cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
    39 
    40 Edit the sfeedrc(5) configuration file and add or change RSS/Atom feeds. This
    41 file is included and evaluated as a shellscript for sfeed_update, so its
    42 functions and behaviour can be overridden:
    43 
    44         $EDITOR "$HOME/.sfeed/sfeedrc"
    45 
    46 or you can import existing OPML subscriptions using sfeed_opml_import(1):
    47 
    48         sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"
    49 
    50 An example to export from another RSS/Atom reader called newsboat and import
    51 it for sfeed_update:
    52 
    53         newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
    54 
    55 An example to export from another RSS/Atom reader called rss2email (3.x+) and
    56 import it for sfeed_update:
    57 
    58         r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
    59 
    60 Update feeds; this script merges the new items with the existing ones. See
    61 sfeed_update(1) for more information on what it can do:
    62 
    63         sfeed_update
    64 
    65 Format feeds:
    66 
    67 Plain-text list:
    68 
    69         sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
    70 
    71 HTML view (no frames), copy style.css for a default style:
    72 
    73         cp style.css "$HOME/.sfeed/style.css"
    74         sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
    75 
    76 HTML view with the menu as frames, copy style.css for a default style:
    77 
    78         mkdir -p "$HOME/.sfeed/frames"
    79         cp style.css "$HOME/.sfeed/frames/style.css"
    80         cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*
    81 
    82 To update your feeds periodically and format them in the way you like, you can
    83 write a wrapper script and add it as a cronjob.
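
       A minimal sketch of such a wrapper script (the script name, paths and schedule
       are examples to adjust to your own setup):

               #!/bin/sh
               # update feeds and regenerate the plain-text and HTML output.
               sfeed_update
               sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
               sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

       A crontab(5) entry to run it every hour could then look like:

               0 * * * *       $HOME/bin/sfeed_update_wrapper.sh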
    84 
    85 Most protocols are supported because curl(1) is used by default. Proxy
    86 settings from the environment (such as the $http_proxy environment variable)
    87 are used as well.
    88 
    89 The sfeed(1) program itself is just a parser that parses XML data from stdin
    90 and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
    91 Gopher, SSH, etc.
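
       For example, to fetch and parse a single feed directly, without sfeed_update
       (the feed URL is just an example):

               curl -s 'https://codemadness.org/atom.xml' | sfeed | sfeed_plain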
    92 
    93 See the section "Usage and examples" below and the man pages for more
    94 information on how to use sfeed(1) and the additional tools.
    95 
    96 
    97 Dependencies
    98 ------------
    99 
   100 - C compiler (C99).
   101 - libc (recommended: C99 and POSIX >= 200809).
   102 
   103 
   104 Optional dependencies
   105 ---------------------
   106 
   107 - POSIX make(1) for the Makefile.
   108 - POSIX sh(1),
   109   used by sfeed_update(1) and sfeed_opml_export(1).
   110 - POSIX utilities such as awk(1) and sort(1),
   111   used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
   112   sfeed_update(1).
   113 - curl(1) binary: https://curl.haxx.se/ ,
   114   used by sfeed_update(1), but can be replaced with any tool like wget(1),
   115   OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
   116 - iconv(1) command-line utilities,
   117   used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
   118   encoded then you don't need this. For a minimal iconv implementation:
   119   https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
   120 - xargs with support for the -P and -0 options,
   121   used by sfeed_update(1).
   122 - mandoc for documentation: https://mdocml.bsd.lv/
   123 - curses (typically ncurses), otherwise see minicurses.h,
   124   used by sfeed_curses(1).
   125 - a terminal (emulator) supporting UTF-8 and the capabilities used,
   126   used by sfeed_curses(1).
   127 
   128 
   129 Optional run-time dependencies for sfeed_curses
   130 -----------------------------------------------
   131 
   132 - xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
   133 - xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
   134 - awk, used by the sfeed_content and sfeed_markread scripts.
   135   See the ENVIRONMENT VARIABLES section in the man page to change it.
   136 - lynx, used by the sfeed_content script to convert HTML content.
   137   See the ENVIRONMENT VARIABLES section in the man page to change it.
   138 
   139 
   140 Formats supported
   141 -----------------
   142 
   143 sfeed supports a subset of XML 1.0 and a subset of:
   144 
   145 - Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
   146 - Atom 0.3 (draft, historic).
   147 - RSS 0.90+.
   148 - RDF (when used with RSS).
   149 - MediaRSS extensions (media:).
   150 - Dublin Core extensions (dc:).
   151 
   152 Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
   153 supported by converting them to RSS/Atom or to the sfeed(5) format directly.
   154 
   155 
   156 OS tested
   157 ---------
   158 
   159 - Linux,
   160   compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
   161   libc: glibc, musl.
   162 - OpenBSD (clang, gcc).
   163 - NetBSD (with NetBSD curses).
   164 - FreeBSD.
   165 - DragonFlyBSD.
   166 - GNU/Hurd.
   167 - Illumos (OpenIndiana).
   168 - Windows (cygwin gcc + mintty, mingw).
   169 - HaikuOS.
   170 - SerenityOS.
   171 - FreeDOS (djgpp, Open Watcom).
   172 - FUZIX (sdcc -mz80, with the sfeed parser program).
   173 
   174 
   175 Architectures tested
   176 --------------------
   177 
   178 amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.
   179 
   180 
   181 Files
   182 -----
   183 
   184 sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
   185                     in TAB-separated format to stdout.
   186 sfeed_atom        - Format feed data (TSV) to an Atom feed.
   187 sfeed_content     - View item content, for use with sfeed_curses.
   188 sfeed_curses      - Format feed data (TSV) to a curses interface.
   189 sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
   190 sfeed_gopher      - Format feed data (TSV) to Gopher files.
   191 sfeed_html        - Format feed data (TSV) to HTML.
   192 sfeed_json        - Format feed data (TSV) to JSON Feed.
   193 sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
   194 sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
   195 sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
   196 sfeed_mbox        - Format feed data (TSV) to mbox.
   197 sfeed_plain       - Format feed data (TSV) to a plain-text list.
   198 sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
   199 sfeed_update      - Update feeds and merge items.
   200 sfeed_web         - Find URLs to RSS/Atom feeds in a webpage.
   201 sfeed_xmlenc      - Detect character-set encoding from an XML stream.
   202 sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
   203 style.css         - Example stylesheet to use with sfeed_html(1) and
   204                     sfeed_frames(1).
   205 
   206 
   207 Files read at runtime by sfeed_update(1)
   208 ----------------------------------------
   209 
   210 sfeedrc - Config file. This file is evaluated as a shellscript in
   211           sfeed_update(1).
   212 
   213 At least the following functions can be overridden per feed:
   214 
   215 - fetch: to use wget(1), OpenBSD ftp(1) or another download program.
   216 - filter: to filter on fields.
   217 - merge: to change the merge logic.
   218 - order: to change the sort order.
   219 
   220 See also the sfeedrc(5) man page documentation for more details.
   221 
   222 The feeds() function is called to process the feeds. The default feed()
   223 function is executed concurrently as a background job in your sfeedrc(5) config
   224 file to make updating faster. The variable maxjobs can be changed to limit or
   225 increase the number of concurrent jobs (8 by default).
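
       For reference, a minimal sfeedrc could look like the sketch below; see the
       included sfeedrc.example for a complete example. The feed names and URLs are
       placeholders:

               # maximum amount of concurrent fetch jobs (optional).
               maxjobs=8

               # list of feeds to fetch:
               feeds() {
                       # feed <name> <feedurl> [basesiteurl] [encoding]
                       feed "codemadness" "https://codemadness.org/atom_content.xml"
                       feed "someblog" "https://example.org/feed.xml"
               }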
   226 
   227 
   228 Files written at runtime by sfeed_update(1)
   229 -------------------------------------------
   230 
   231 feedname     - TAB-separated format containing all items per feed. The
   232                sfeed_update(1) script merges new items with this file.
   233                The format is documented in sfeed(5).
   234 
   235 
   236 File format
   237 -----------
   238 
   239 man 5 sfeed
   240 man 5 sfeedrc
   241 man 1 sfeed
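
       As a quick reference: each line in a feed file is one item and contains the
       TAB-separated fields timestamp, title, link, content, content-type, id,
       author, enclosure and category, in that order; sfeed(5) is the authoritative
       description. For example, to print the title and link of every item:

               awk -F '\t' '{ print $2 ": " $3 }' "$HOME/.sfeed/feeds/"*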
   242 
   243 
   244 Usage and examples
   245 ------------------
   246 
   247 Find RSS/Atom feed URLs from a webpage:
   248 
   249         url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"
   250 
   251 output example:
   252 
   253         https://codemadness.org/atom.xml        application/atom+xml
   254         https://codemadness.org/atom_content.xml        application/atom+xml
   255 
   256 - - -
   257 
   258 Make sure your sfeedrc config file exists; see the sfeedrc.example file. To
   259 update your feeds (configfile argument is optional):
   260 
   261         sfeed_update "configfile"
   262 
   263 Format the feeds files:
   264 
   265         # Plain-text list.
   266         sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
   267         # HTML view (no frames), copy style.css for a default style.
   268         sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
   269         # HTML view with the menu as frames, copy style.css for a default style.
   270         mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*
   271 
   272 View formatted output in your browser:
   273 
   274         $BROWSER "$HOME/.sfeed/feeds.html"
   275 
   276 View formatted output in your editor:
   277 
   278         $EDITOR "$HOME/.sfeed/feeds.txt"
   279 
   280 - - -
   281 
   282 View formatted output in a curses interface.  The interface has a look inspired
   283 by the mutt mail client.  It has a sidebar panel for the feeds, a panel with a
   284 listing of the items and a small statusbar for the selected item/URL. Some
   285 functions like searching and scrolling are integrated in the interface itself.
   286 
   287 Just like the other format programs included in sfeed you can run it like this:
   288 
   289         sfeed_curses ~/.sfeed/feeds/*
   290 
   291 ... or by reading from stdin:
   292 
   293         sfeed_curses < ~/.sfeed/feeds/xkcd
   294 
   295 By default sfeed_curses marks the items of the last day as new/bold. This limit
   296 can be overridden by setting the environment variable $SFEED_NEW_AGE to the
   297 desired maximum age in seconds. To manage read/unread items in a different way,
   298 a plain-text file with a list of the read URLs can be used. To enable this
   299 behaviour, set the environment variable $SFEED_URL_FILE to the path of this URL
   300 file:
   301 
   302         export SFEED_URL_FILE="$HOME/.sfeed/urls"
   303         [ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
   304         sfeed_curses ~/.sfeed/feeds/*
   305 
   306 It then uses the shellscript "sfeed_markread" to process the read and unread
   307 items.
   308 
   309 - - -
   310 
   311 Example script to view feed items in a vertical list/menu in dmenu(1). It opens
   312 the selected URL in the browser set in $BROWSER:
   313 
   314         #!/bin/sh
   315         url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
   316                 sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
   317         test -n "${url}" && $BROWSER "${url}"
   318 
   319 dmenu can be found at: https://git.suckless.org/dmenu/
   320 
   321 - - -
   322 
   323 Generate a sfeedrc config file from your exported list of feeds in OPML
   324 format:
   325 
   326         sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc
   327 
   328 - - -
   329 
   330 Export an OPML file of your feeds from a sfeedrc config file (configfile
   331 argument is optional):
   332 
   333         sfeed_opml_export configfile > myfeeds.opml
   334 
   335 - - -
   336 
   337 The filter function can be overridden in your sfeedrc file. This allows
   338 filtering items per feed. It can be used to shorten URLs, filter away
   339 advertisements, strip tracking parameters and more.
   340 
   341         # filter fields.
   342         # filter(name, url)
   343         filter() {
   344                 case "$1" in
   345                 "tweakers")
   346                         awk -F '\t' 'BEGIN { OFS = "\t"; }
   347                         # skip ads.
   348                         $2 ~ /^ADV:/ {
   349                                 next;
   350                         }
   351                         # shorten link.
   352                         {
   353                                 if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
   354                                         $3 = substr($3, RSTART, RLENGTH);
   355                                 }
   356                                 print $0;
   357                         }';;
   358                 "yt BSDNow")
   359                         # filter only BSD Now from channel.
   360                         awk -F '\t' '$2 ~ / \| BSD Now/';;
   361                 *)
   362                         cat;;
   363                 esac | \
   364                         # replace youtube links with embed links.
   365                         sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \
   366 
   367                         awk -F '\t' 'BEGIN { OFS = "\t"; }
   368                         function filterlink(s) {
   369                                 # protocol must start with http, https or gopher.
   370                                 if (match(s, /^(http|https|gopher):\/\//) == 0) {
   371                                         return "";
   372                                 }
   373 
   374                                 # shorten feedburner links.
   375                                 if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
   376                                         s = substr($3, RSTART, RLENGTH);
   377                                 }
   378 
   379                                 # strip tracking parameters
   380                                 # urchin, facebook, piwik, webtrekk and generic.
   381                                 gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
   382                                 gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);
   383 
   384                                 gsub(/\?&/, "?", s);
   385                                 gsub(/[\?&]+$/, "", s);
   386 
   387                                 return s
   388                         }
   389                         {
   390                                 $3 = filterlink($3); # link
   391                                 $8 = filterlink($8); # enclosure
   392 
   393                                 # try to remove tracking pixels: <img/> tags with 1px width or height.
   394                                 gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);
   395 
   396                                 print $0;
   397                         }'
   398         }
   399 
   400 - - -
   401 
   402 Aggregate feeds. This filters new entries (at most one day old), sorts them
   403 newest first, prefixes the feed name to the title and converts the TSV output
   404 back to an Atom XML feed:
   405 
   406         #!/bin/sh
   407         cd ~/.sfeed/feeds/ || exit 1
   408 
   409         awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
   410         BEGIN {        OFS = "\t"; }
   411         int($1) >= old {
   412                 $2 = "[" FILENAME "] " $2;
   413                 print $0;
   414         }' * | \
   415         sort -k1,1rn | \
   416         sfeed_atom
   417 
   418 - - -
   419 
   420 To have a "tail(1) -f"-like FIFO stream that filters new unique feed items and
   421 shows them as plain-text lines similar to sfeed_plain(1):
   422 
   423 Create a FIFO:
   424 
   425         fifo="/tmp/sfeed_fifo"
   426         mkfifo "$fifo"
   427 
   428 On the reading side:
   429 
   430         # This keeps track of unique lines so might consume much memory.
   431         # It tries to reopen the $fifo after 1 second if it fails.
   432         while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'
   433 
   434 On the writing side:
   435 
   436         feedsdir="$HOME/.sfeed/feeds/"
   437         cd "$feedsdir" || exit 1
   438         test -p "$fifo" || exit 1
   439 
   440         # 1 day is old news, don't write older items.
   441         awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
   442         BEGIN { OFS = "\t"; }
   443         int($1) >= old {
   444                 $2 = "[" FILENAME "] " $2;
   445                 print $0;
   446         }' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"
   447 
   448 cut -b is used to trim the "N " prefix of sfeed_plain(1).
   449 
   450 - - -
   451 
   452 For a podcast feed the following code can be used to filter the latest
   453 enclosure URL (probably some audio file):
   454 
   455         awk -F '\t' 'BEGIN { latest = 0; }
   456         length($8) {
   457                 ts = int($1);
   458                 if (ts > latest) {
   459                         url = $8;
   460                         latest = ts;
   461                 }
   462         }
   463         END { if (length(url)) { print url; } }'
   464 
   465 ... or on a file already sorted from newest to oldest:
   466 
   467         awk -F '\t' '$8 { print $8; exit }'
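
       Combined with curl(1) this can be used to download the newest enclosure of a
       feed. A sketch, assuming the feed file is sorted from newest to oldest (the
       default order) and using a placeholder feed name:

               url=$(awk -F '\t' '$8 { print $8; exit }' "$HOME/.sfeed/feeds/somepodcast")
               test -n "$url" && curl -L -O "$url"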
   468 
   469 - - -
   470 
   471 Over time your feeds file might become quite big. You can trim a feed down to
   472 the items of (roughly) the last week by doing for example:
   473 
   474         awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
   475         mv feed feed.bak
   476         mv feed.new feed
   477 
   478 This could also be run weekly in a crontab to archive the feeds, like throwing
   479 away old newspapers. It keeps the feeds list tidy and the formatted output
   480 small.
   481 
   482 - - -
   483 
   484 Convert mbox to separate maildirs per feed and filter duplicate messages using the
   485 fdm program.
   486 fdm is available at: https://github.com/nicm/fdm
   487 
   488 fdm config file (~/.sfeed/fdm.conf):
   489 
   490         set unmatched-mail keep
   491 
   492         account "sfeed" mbox "%[home]/.sfeed/mbox"
   493                 $cachepath = "%[home]/.sfeed/fdm.cache"
   494                 cache "${cachepath}"
   495                 $maildir = "%[home]/feeds/"
   496 
   497                 # Check if message is in the cache by Message-ID.
   498                 match case "^Message-ID: (.*)" in headers
   499                         action {
   500                                 tag "msgid" value "%1"
   501                         }
   502                         continue
   503 
   504                 # If it is in the cache, stop.
   505                 match matched and in-cache "${cachepath}" key "%[msgid]"
   506                         action {
   507                                 keep
   508                         }
   509 
   510                 # Not in the cache, process it and add to cache.
   511                 match case "^X-Feedname: (.*)" in headers
   512                         action {
   513                                 # Store to local maildir.
   514                                 maildir "${maildir}%1"
   515 
   516                                 add-to-cache "${cachepath}" key "%[msgid]"
   517                                 keep
   518                         }
   519 
   520 Now run:
   521 
   522         $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
   523         $ fdm -f ~/.sfeed/fdm.conf fetch
   524 
   525 Now you can view feeds in mutt(1) for example.
   526 
   527 - - -
   528 
   529 Read from the mbox and filter duplicate messages using the fdm program and
   530 deliver them to an SMTP server. This works similarly to the rss2email program.
   531 fdm is available at: https://github.com/nicm/fdm
   532 
   533 fdm config file (~/.sfeed/fdm.conf):
   534 
   535         set unmatched-mail keep
   536 
   537         account "sfeed" mbox "%[home]/.sfeed/mbox"
   538                 $cachepath = "%[home]/.sfeed/fdm.cache"
   539                 cache "${cachepath}"
   540 
   541                 # Check if message is in the cache by Message-ID.
   542                 match case "^Message-ID: (.*)" in headers
   543                         action {
   544                                 tag "msgid" value "%1"
   545                         }
   546                         continue
   547 
   548                 # If it is in the cache, stop.
   549                 match matched and in-cache "${cachepath}" key "%[msgid]"
   550                         action {
   551                                 keep
   552                         }
   553 
   554                 # Not in the cache, process it and add to cache.
   555                 match case "^X-Feedname: (.*)" in headers
   556                         action {
   557                                 # Connect to a SMTP server and attempt to deliver the
   558                                 # mail to it.
   559                                 # Of course change the server and e-mail below.
   560                                 smtp server "codemadness.org" to "hiltjo@codemadness.org"
   561 
   562                                 add-to-cache "${cachepath}" key "%[msgid]"
   563                                 keep
   564                         }
   565 
   566 Now run:
   567 
   568         $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
   569         $ fdm -f ~/.sfeed/fdm.conf fetch
   570 
   571 Now you can view feeds in mutt(1) for example.
   572 
   573 - - -
   574 
   575 Convert mbox to separate maildirs per feed and filter duplicate messages using
   576 procmail(1).
   577 
   578 procmail_maildirs.sh file:
   579 
   580         maildir="$HOME/feeds"
   581         feedsdir="$HOME/.sfeed/feeds"
   582         procmailconfig="$HOME/.sfeed/procmailrc"
   583 
   584         # message-id cache to prevent duplicates.
   585         mkdir -p "${maildir}/.cache"
   586 
   587         if ! test -r "${procmailconfig}"; then
   588                 printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
   589                 echo "See procmailrc.example for an example." >&2
   590                 exit 1
   591         fi
   592 
   593         find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
   594                 name=$(basename "${d}")
   595                 mkdir -p "${maildir}/${name}/cur"
   596                 mkdir -p "${maildir}/${name}/new"
   597                 mkdir -p "${maildir}/${name}/tmp"
   598                 printf 'Mailbox %s\n' "${name}"
   599                 sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
   600         done
   601 
   602 Procmailrc(5) file:
   603 
   604         # Example for use with sfeed_mbox(1).
   605         # The header X-Feedname is used to split into separate maildirs. It is
   606         # assumed this name is sane.
   607 
   608         MAILDIR="$HOME/feeds/"
   609 
   610         :0
   611         * ^X-Feedname: \/.*
   612         {
   613                 FEED="$MATCH"
   614 
   615                 :0 Wh: "msgid_$FEED.lock"
   616                 | formail -D 1024000 ".cache/msgid_$FEED.cache"
   617 
   618                 :0
   619                 "$FEED"/
   620         }
   621 
   622 Now run:
   623 
   624         $ procmail_maildirs.sh
   625 
   626 Now you can view feeds in mutt(1) for example.
   627 
   628 - - -
   629 
   630 The fetch function can be overridden in your sfeedrc file. This allows
   631 replacing the default curl(1) for sfeed_update with any other client to fetch
   632 the RSS/Atom data, or changing the default curl options:
   633 
   634         # fetch a feed via HTTP/HTTPS etc.
   635         # fetch(name, url, feedfile)
   636         fetch() {
   637                 hurl -m 1048576 -t 15 "$2" 2>/dev/null
   638         }
   639 
   640 - - -
   641 
   642 Caching, incremental data updates and bandwidth saving
   643 
   644 For servers that support it, incremental updates and bandwidth saving can be
   645 done by using the "ETag" HTTP header.
   646 
   647 Create a directory for storing the ETags per feed:
   648 
   649         mkdir -p ~/.sfeed/etags/
   650 
   651 The curl ETag options (--etag-save and --etag-compare) can be used to store and
   652 send the previous ETag header value. curl version 7.73+ is recommended for it
   653 to work properly.
   654 
   655 The curl -z option can be used to send the modification date of a local file as
   656 an HTTP "If-Modified-Since" request header. The server can then respond whether
   657 the data is modified or not, or respond with only the incremental data.
   658 
   659 The curl --compressed option can be used to indicate the client supports
   660 decompression. Because RSS/Atom feeds are textual XML content this generally
   661 compresses very well.
   662 
   663 These options can be set by overriding the fetch() function in the sfeedrc
   664 file:
   665 
   666         # fetch(name, url, feedfile)
   667         fetch() {
   668                 etag="$HOME/.sfeed/etags/$(basename "$3")"
   669                 curl \
   670                         -L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
   671                         --compressed \
   672                         --etag-save "${etag}" --etag-compare "${etag}" \
   673                         -z "${etag}" \
   674                         "$2" 2>/dev/null
   675         }
   676 
   677 These options can come at the cost of some privacy, because they expose
   678 additional metadata from the previous request.
   679 
   680 - - -
   681 
   682 CDNs blocking requests due to a missing HTTP User-Agent request header
   683 
   684 sfeed_update will not send the "User-Agent" header by default for privacy
   685 reasons.  Some CDNs like Cloudflare or websites like Reddit.com don't like this
   686 and will block such HTTP requests.
   687 
   688 A custom User-Agent can be set by using the curl -H option, like so:
   689 
   690         curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
   691 
   692 The above example string pretends to be a Windows 10 (x86-64) machine running
   693 Firefox 78.
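
       For sfeed_update this is typically done by overriding the fetch() function in
       the sfeedrc file. A sketch, reusing the default curl options shown elsewhere
       in this README:

               # fetch(name, url, feedfile)
               fetch() {
                       curl -L --max-redirs 0 -f -s -m 15 \
                               -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
                               "$2" 2>/dev/null
               }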
   694 
   695 - - -
   696 
   697 Page redirects
   698 
   699 For security and efficiency reasons redirects are not allowed by default and
   700 are treated as an error.
   701 
   702 This prevents, for example, hijacking of an unencrypted http:// to https://
   703 redirect, and avoids the delay of an unnecessary page redirect on each request.
   704 It is encouraged to use the final redirected URL in the sfeedrc config file.
   705 
   706 If you want to ignore this advice you can override the fetch() function in the
   707 sfeedrc file and change the curl options "-L --max-redirs 0".
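
       For example, a fetch() override that follows up to 3 redirects could look like
       this sketch (the other options mirror the defaults):

               # fetch(name, url, feedfile)
               fetch() {
                       curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
                               "$2" 2>/dev/null
               }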
   708 
   709 - - -
   710 
   711 Shellscript to handle URLs and enclosures in parallel using xargs -P.
   712 
   713 This can be used to download and process URLs, for example to download
   714 podcasts and webcomics, download and convert webpages, or mirror videos. It
   715 uses a plain-text cache file for remembering processed URLs. The match patterns
   716 are defined in the shellscript fetch() function and in the awk script and can
   717 be modified to handle items differently depending on their context.
   718 
   719 The arguments for the script are files in the sfeed(5) format. If no file
   720 arguments are specified then the data is read from stdin.
   721 
   722         #!/bin/sh
   723         # sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
   724         # Dependencies: awk, curl, flock, xargs (-P), yt-dlp.
   725         
   726         cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
   727         jobs="${SFEED_JOBS:-4}"
   728         lockfile="${HOME}/.sfeed/sfeed_download.lock"
   729         
   730         # log(feedname, s, status)
   731         log() {
   732                 if [ "$1" != "-" ]; then
   733                         s="[$1] $2"
   734                 else
   735                         s="$2"
   736                 fi
   737                 printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
   738         }
   739         
   740         # fetch(url, feedname)
   741         fetch() {
   742                 case "$1" in
   743                 *youtube.com*)
   744                         yt-dlp "$1";;
   745                 *.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
   746                         # allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
   747                         curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
   748                 esac
   749         }
   750         
   751         # downloader(url, title, feedname)
   752         downloader() {
   753                 url="$1"
   754                 title="$2"
   755                 feedname="${3##*/}"
   756         
   757                 msg="${title}: ${url}"
   758         
   759                 # download directory.
   760                 if [ "${feedname}" != "-" ]; then
   761                         mkdir -p "${feedname}"
   762                         if ! cd "${feedname}"; then
   763                                 log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
   764                                 return 1
   765                         fi
   766                 fi
   767         
   768                 log "${feedname}" "${msg}" "START"
   769                 if fetch "${url}" "${feedname}"; then
   770                         log "${feedname}" "${msg}" "OK"
   771         
   772                         # append it safely in parallel to the cachefile on a
   773                         # successful download.
   774                         (flock 9 || exit 1
   775                         printf '%s\n' "${url}" >> "${cachefile}"
   776                         ) 9>"${lockfile}"
   777                 else
   778                         log "${feedname}" "${msg}" "FAIL" >&2
   779                         return 1
   780                 fi
   781                 return 0
   782         }
   783         
   784         if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
   785                 # Downloader helper for parallel downloading.
   786                 # Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
   787                 # It should write the URI to the cachefile if it is successful.
   788                 downloader "$1" "$2" "$3"
   789                 exit $?
   790         fi
   791         
   792         # ...else parent mode:
   793         
   794         tmp="$(mktemp)" || exit 1
   795         trap "rm -f ${tmp}" EXIT
   796         
   797         [ -f "${cachefile}" ] || touch "${cachefile}"
   798         cat "${cachefile}" > "${tmp}"
   799         echo >> "${tmp}" # force it to have one line for awk.
   800         
   801         LC_ALL=C awk -F '\t' '
   802         # fast prefilter what to download or not.
   803         function filter(url, field, feedname) {
   804                 u = tolower(url);
   805                 return (match(u, "youtube\\.com") ||
   806                         match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
   807         }
   808         function download(url, field, title, filename) {
   809                 if (!length(url) || urls[url] || !filter(url, field, filename))
   810                         return;
   811                 # NUL-separated for xargs -0.
   812                 printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
   813                 urls[url] = 1; # print once
   814         }
   815         {
   816                 FILENR += (FNR == 1);
   817         }
   818         # lookup table from cachefile which contains downloaded URLs.
   819         FILENR == 1 {
   820                 urls[$0] = 1;
   821         }
   822         # feed file(s).
   823         FILENR != 1 {
   824                 download($3, 3, $2, FILENAME); # link
   825                 download($8, 8, $2, FILENAME); # enclosure
   826         }
   827         ' "${tmp}" "${@:--}" | \
   828         SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
   829 
   830 - - -
   831 
   832 Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
   833 TSV format.
   834 
   835         #!/bin/sh
   836         # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
   837         # The data is split per file per feed with the name of the newsboat title/url.
   838         # It writes the URLs of the read items line by line to a "urls" file.
   839         #
   840         # Dependencies: sqlite3, awk.
   841         #
   842         # Usage: create some directory to store the feeds then run this script.
   843         
   844         # newsboat cache.db file.
   845         cachefile="$HOME/.newsboat/cache.db"
   846         test -n "$1" && cachefile="$1"
   847         
   848         # dump data.
   849         # .mode ascii: Columns/rows delimited by 0x1F and 0x1E
   850         # get the first fields in the order of the sfeed(5) format.
   851         sqlite3 "$cachefile" < "/dev/stderr";
   902                 }
   903         
   904                 contenttype = field($5);
   905                 if (contenttype == "")
   906                         contenttype = "html";
   907                 else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
   908                         contenttype = "html";
   909                 else
   910                         contenttype = "plain";
   911         
   912                 print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
   913                         contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
   914                         > fname;
   915         
   916                 # write URLs of the read items to a file line by line.
   917                 if ($11 == "0") {
   918                         print $3 > "urls";
   919                 }
   920         }'
   921 
   922 - - -
   923 
   924 Progress indicator
   925 ------------------
   926 
   927 The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
   928 config.  It then calls sfeed_update and pipes the output lines to a function
   929 that tracks the current progress and writes it to stderr.
   930 Alternative: pv -l -s totallines (see the sketch below the script).
   931 
   932         #!/bin/sh
   933         # Progress indicator script.
   934         
   935         # Pass lines as input to stdin and write progress status to stderr.
   936         # progress(totallines)
   937         progress() {
   938                 total="$(($1 + 0))" # must be a number, no divide by zero.
   939                 test "${total}" -le 0 -o "$1" != "${total}" && return
   940         LC_ALL=C awk -v "total=${total}" '
   941         {
   942                 counter++;
   943                 percent = (counter * 100) / total;
   944                 printf("\033[K") > "/dev/stderr"; # clear EOL
   945                 print $0;
   946                 printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
   947                 fflush(); # flush all buffers per line.
   948         }
   949         END {
   950                 printf("\033[K") > "/dev/stderr";
   951         }'
   952         }
   953         
   954         # Counts the feeds from the sfeedrc config.
   955         countfeeds() {
   956                 count=0
   957         . "$1"
   958         feed() {
   959                 count=$((count + 1))
   960         }
   961                 feeds
   962                 echo "${count}"
   963         }
   964         
   965         config="${1:-$HOME/.sfeed/sfeedrc}"
   966         total=$(countfeeds "${config}")
   967         sfeed_update "${config}" 2>&1 | progress "${total}"
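
       Using pv instead of the progress() function could look like this sketch, which
       reuses the countfeeds() helper above and discards the per-feed output lines:

               config="${1:-$HOME/.sfeed/sfeedrc}"
               total=$(countfeeds "${config}")
               sfeed_update "${config}" 2>&1 | pv -l -s "${total}" > /dev/null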
   968 
   969 - - -
   970 
   971 Counting unread and total items
   972 -------------------------------
   973 
   974 It can be useful to show the counts of unread and total items, for example in a
   975 window manager or status bar.
   976 
   977 The below example script counts the items of the last day in the same way the
   978 formatting tools do:
   979 
   980         #!/bin/sh
   981         # Count the new items of the last day.
   982         LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
   983         {
   984                 total++;
   985         }
   986         int($1) >= old {
   987                 totalnew++;
   988         }
   989         END {
   990                 print "New:   " totalnew;
   991                 print "Total: " total;
   992         }' ~/.sfeed/feeds/*
   993 
   994 The below example script counts the unread items using the sfeed_curses URL
   995 file:
   996 
   997         #!/bin/sh
   998         # Count the unread and total items from feeds using the URL file.
   999         LC_ALL=C awk -F '\t' '
  1000         # URL file: amount of fields is 1.
  1001         NF == 1 {
  1002                 u[$0] = 1; # lookup table of URLs.
  1003                 next;
  1004         }
  1005         # feed file: check by URL or id.
  1006         {
  1007                 total++;
  1008                 if (length($3)) {
  1009                         if (u[$3])
  1010                                 read++;
  1011                 } else if (length($6)) {
  1012                         if (u[$6])
  1013                                 read++;
  1014                 }
  1015         }
  1016         END {
  1017                 print "Unread: " (total - read);
  1018                 print "Total:  " total;
  1019         }' ~/.sfeed/urls ~/.sfeed/feeds/*
  1020 
  1021 - - -
  1022 
  1023 sfeed.c: adding new XML tags or sfeed(5) fields to the parser
  1024 -------------------------------------------------------------
  1025 
  1026 sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
  1027 fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
  1028 number.  This TagId is then mapped to the output field index.
  1029 
  1030 Steps to modify the code:
  1031 
  1032 * Add a new TagId enum for the tag.
  1033 
  1034 * (optional) Add a new FeedField* enum for the new output field or you can map
  1035   it to an existing field.
  1036 
  1037 * Add the new XML tag name to the array variable of parsed RSS or Atom
  1038   tags: rsstags[] or atomtags[].
  1039 
  1040   These must be defined in alphabetical order, because a binary search using
  1041   the strcasecmp() function is used to look them up.
  1042 
  1043 * Add the parsed TagId to the output field in the array variable fieldmap[].
  1044 
  1045   When another tag is also mapped to the same output field then the tag with
  1046   the highest TagId number value overrides the mapped field: the order is from
  1047   least important to most important.
  1048 
  1049 * If this defined tag just uses the inner data of the XML tag, then this
  1050   definition is enough. If it for example has to parse a certain attribute, you
  1051   have to add a check for the TagId to the xmlattr() callback function.
  1052 
  1053 * (optional) Print the new field in the printfields() function.
  1054 
  1055 Below is a patch example to add the MRSS "media:content" tag as a new field:
  1056 
  1057 diff --git a/sfeed.c b/sfeed.c
  1058 --- a/sfeed.c
  1059 +++ b/sfeed.c
  1060 @@ -50,7 +50,7 @@ enum TagId {
  1061          RSSTagGuidPermalinkTrue,
  1062          /* must be defined after GUID, because it can be a link (isPermaLink) */
  1063          RSSTagLink,
  1064 -        RSSTagEnclosure,
  1065 +        RSSTagMediaContent, RSSTagEnclosure,
  1066          RSSTagAuthor, RSSTagDccreator,
  1067          RSSTagCategory,
  1068          /* Atom */
  1069 @@ -81,7 +81,7 @@ typedef struct field {
  1070  enum {
  1071          FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
  1072          FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
  1073 -        FeedFieldLast
  1074 +        FeedFieldMediaContent, FeedFieldLast
  1075  };
  1076  
  1077  typedef struct feedcontext {
  1078 @@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
  1079          { STRP("enclosure"),         RSSTagEnclosure         },
  1080          { STRP("guid"),              RSSTagGuid              },
  1081          { STRP("link"),              RSSTagLink              },
  1082 +        { STRP("media:content"),     RSSTagMediaContent      },
  1083          { STRP("media:description"), RSSTagMediaDescription  },
  1084          { STRP("pubdate"),           RSSTagPubdate           },
  1085          { STRP("title"),             RSSTagTitle             }
  1086 @@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
  1087          [RSSTagGuidPermalinkFalse] = FeedFieldId,
  1088          [RSSTagGuidPermalinkTrue]  = FeedFieldId, /* special-case: both a link and an id */
  1089          [RSSTagLink]               = FeedFieldLink,
  1090 +        [RSSTagMediaContent]       = FeedFieldMediaContent,
  1091          [RSSTagEnclosure]          = FeedFieldEnclosure,
  1092          [RSSTagAuthor]             = FeedFieldAuthor,
  1093          [RSSTagDccreator]          = FeedFieldAuthor,
  1094 @@ -677,6 +679,8 @@ printfields(void)
  1095          string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
  1096          putchar(FieldSeparator);
  1097          string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
  1098 +        putchar(FieldSeparator);
  1099 +        string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
  1100          putchar('\n');
  1101  
  1102          if (ferror(stdout)) /* check for errors but do not flush */
  1103 @@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
  1104          }
  1105  
  1106          if (ctx.feedtype == FeedTypeRSS) {
  1107 -                if (ctx.tag.id == RSSTagEnclosure &&
  1108 +                if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
  1109                      isattr(n, nl, STRP("url"))) {
  1110                          string_append(&tmpstr, v, vl);
  1111                  } else if (ctx.tag.id == RSSTagGuid &&
  1112 
  1113 - - -
  1114 
  1115 Running custom commands inside the sfeed_curses program
  1116 -------------------------------------------------------
  1117 
  1118 Running commands inside the sfeed_curses program can be useful, for example to
  1119 sync items or mark all items across all feeds as read. It can be convenient to
  1120 have a keybind for this inside the program to perform a scripted action and
  1121 then reload the feeds by sending the signal SIGHUP.
  1122 
  1123 In the input handling code you can then add a case:
  1124 
  1125         case 'M':
  1126                 forkexec((char *[]) { "markallread.sh", NULL }, 0);
  1127                 break;
  1128 
  1129 or
  1130 
  1131         case 'S':
  1132                 forkexec((char *[]) { "syncnews.sh", NULL }, 1);
  1133                 break;
  1134 
  1135 The specified script should be in $PATH or be an absolute path.
  1136 
  1137 Example of a `markallread.sh` shellscript to mark all URLs as read:
  1138 
  1139         #!/bin/sh
  1140         # mark all items/URLs as read.
  1141         tmp="$(mktemp)" || exit 1
  1142         (cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
  1143         awk '!x[$0]++' > "$tmp" &&
  1144         mv "$tmp" ~/.sfeed/urls &&
  1145         pkill -SIGHUP sfeed_curses # reload feeds.
  1146 
  1147 Example of a `syncnews.sh` shellscript to update the feeds and reload them:
  1148 
  1149         #!/bin/sh
  1150         sfeed_update
  1151         pkill -SIGHUP sfeed_curses
  1152 
  1153 
  1154 Running programs in a new session
  1155 ---------------------------------
  1156 
  1157 By default processes are spawned in the same session and process group as
  1158 sfeed_curses.  When sfeed_curses is closed this can also close the spawned
  1159 process in some cases.
  1160 
  1161 When the setsid command-line program is available, the following wrapper
  1162 command can be used as the plumb program to run it in a new session:
  1163 
  1164         setsid -f xdg-open "$@"
  1165 
  1166 Alternatively the code can be changed to call setsid() before execvp().
  1167 
  1168 
  1169 Open a URL directly in the same terminal
  1170 ----------------------------------------
  1171 
  1172 To open a URL directly in the same terminal using the text-mode lynx browser:
  1173 
  1174         SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*
  1175 
  1176 
  1177 Yank to tmux buffer
  1178 -------------------
  1179 
  1180 This changes the yank command to set the tmux buffer, instead of X11 xclip:
  1181 
  1182         SFEED_YANKER="tmux set-buffer \`cat\`"
  1183 
  1184 
  1185 Known terminal issues
  1186 ---------------------
  1187 
  1188 Below are some bugs or missing features in terminals that were found while
  1189 testing sfeed_curses.  Some of them might already be fixed upstream:
  1190 
  1191 - cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  1192   scrolling.
  1193 - HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  1194   middle-button, right-button is incorrect / reversed.
  1195 - putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  1196   window title.
  1197 - Mouse button encoding for extended buttons (like side-buttons) is unsupported
  1198   in some terminals or maps to the same button: for example side-buttons 7 and 8
  1199   map to the scroll buttons 4 and 5 in urxvt.
  1200 
  1201 
  1202 License
  1203 -------
  1204 
  1205 ISC, see LICENSE file.
  1206 
  1207 
  1208 Author
  1209 ------
  1210 
  1211 Hiltjo Posthuma