Offline computing: HTTP browsing

After a stretch of daytime work, I'm back on night shifts. For the first of the three to come, let's talk a little about offline HTTP browsing.

It frequently happens that, while I browse my content offline in my mail client (plain emails, newsletters, or RSS feeds), a URL catches my curiosity. Like everyone else in that situation, I set it aside so that I can visit it when I have access to the Internet. Unfortunately, when that time comes, I do not necessarily have the time to read articles that can be long. So I needed a way to save these URLs for offline viewing.

For quite a long time, and for lack of anything better, I used the "-dump" option of lynx(1) to save the pages I wanted in plain text. To be exact, I was using this command:

$ lynx -force_html -dump -width=72 -verbose -with_backspaces <URL>

I even made a small script to save the pages according to a DOMAIN-TITLE.txt scheme, with a header including the original URL, the title, and the timestamp of the dump. But obviously, the web being what it is, the results were often messy and I had to use my Emacs kung-fu to clean them up. I wasn't very happy with this solution; I wanted another one.

***

In particular, I had heard about a method that was very popular in the 1990s. At the time, many people did not have direct access to the web; but those who had an email address could retrieve pages into their mailbox through a gateway. It seems that some old-timers (RMS, to name one) still actively use this approach, and I thought it would suit me well, given the time I already spend in my mail client and the modularity it gives me.

And then, a few months ago, I had the pleasant surprise of learning that Anirudh 'icyphox' Oppiliappan had just opened a service working on this principle: forlater.email. It is very simple: you send one or more URLs in the body of an email to save@forlater.email, and the server returns the articles to you (from saved@forlater.email) in their simplest form, that is to say just the textual content, without the frills (menus and so on), somewhat in the spirit of Firefox's reader mode.

I would love to host this service at home, on my own server (the sources are open). But I lack the skills to know instinctively what exactly I should do (most of all regarding the self-hosted mail server it requires). Above all, I lack the time to look into it seriously. So for now I use forlater.email with pleasure, because I find it very convenient and it makes my life much easier. When I am offline, I just keep the links I want to fetch in my msmtp queue so that they are sent the next time I connect (as I discussed in a previous post).

***

But what if tomorrow this service no longer exists? Well, I will fall back on a third solution that I like and still use sometimes: a python script written by David Larlet. David Larlet is a French developer who currently lives in Montreal and whose blog I particularly like. Aware that the articles he frequently cites may well disappear one day, he wrote a script that keeps a cache of the cited articles, a cache that he hosts himself and that can be browsed (for instance, this year's cache: https://larlet.fr/david/cache/2022/). If you visit the link, you'll see that the cached articles contain only the essential, namely the body of the text (thanks to the readability python module).
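To give an idea of what that module does for you, here is a minimal sketch of a readability-based extraction (using the readability-lxml flavour of the package; this is only my illustration, not David's actual code, which does quite a bit more around it):

# Minimal sketch of a readability-based extraction; not David's actual code.
# Assumes the readability-lxml package is installed (pip install readability-lxml).
from urllib.request import urlopen
from readability import Document

url = "https://example.org/some-article"   # placeholder URL
html = urlopen(url).read().decode("utf-8", errors="replace")   # naively assume UTF-8

doc = Document(html)
title = doc.title()     # the page title
body = doc.summary()    # the article body as cleaned-up HTML, menus and frills stripped

with open("article.html", "w", encoding="utf-8") as out:
    out.write(f"<h1>{title}</h1>\n{body}")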
And one day it just hit me: I could use his script to host my own offline cache of URLs. He had the good idea of sharing the sources of his script at https://git.larlet.fr/davidbgk/larlet-fr-david-cache. Many thanks to him. I had to modify it a little to suit my needs, but nothing major: change the paths, remove one or two functions I have no use for, and clean up the provided HTML templates, which are specific to his website. Once the requirements are installed, using it is as easy as:

$ python /path/to/cache.py new <URL>

The script generates a clean archive of the URL's content and updates the main index page listing all the saved links. And when I want to read them offline, I just point lynx at the index.html to browse all the cached articles I have saved.

Erratum: to generate the main index, one must run "cache.py gen".

I have to say that this third solution is by far my favorite. It is written by someone I like and respect; even if I don't read and write python fluently, I understand most of the code; and I don't rely on any external service. So, yes, I use forlater.email because I like the idea of having my own offline HTTP cache in my mail client. But that being said, and now that I think about it, I could make a small script that would automatically email me the result produced by David Larlet's script... Hey, I think I just found a small project that will keep me busy a little during this night shift! (A very first sketch is at the end of this post.)

***

To conclude, I would like to quickly mention another solution that may be interesting: Offpunk (https://notabug.org/ploum/offpunk). It is a browser designed by Lionel 'Ploum' Dricot to be offline-first. As the README in the source repository explains, "the goal of Offpunk is to be able to synchronise your content once (a day, a week, a month) and then browse/organise it while staying disconnected". It is based on the Gemini CLI browser AV-98 and handles Gemini, Gopher, Spartan, and the Web. I have only tested it a little, but it looks like an interesting way for me to keep an offline cache for my gopher browsing. Maybe more on that later if I happen to use it for real in that scenario.

In the meantime, I wish a "bon courage" to those who work, and good rest to those who have a day off. In any case, take care of yourselves and your loved ones.
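P.S. For the curious, here is the kind of very first, untested sketch I have in mind. Every path and address below is a placeholder, and the one-folder-per-article layout of the cache is an assumption on my part:

#!/usr/bin/env python3
# Untested sketch of the "mail myself the cached article" idea.
# CACHE_PY, CACHE_DIR and ME are placeholders to adapt.
import subprocess
import sys
from email.message import EmailMessage
from pathlib import Path

CACHE_PY = "/path/to/cache.py"       # David Larlet's script, adapted
CACHE_DIR = Path("/path/to/cache")   # wherever the adapted script writes its pages
ME = "me@example.org"                # my own address

def main(url):
    # 1. Archive the page and regenerate the main index.
    subprocess.run([sys.executable, CACHE_PY, "new", url], check=True)
    subprocess.run([sys.executable, CACHE_PY, "gen"], check=True)

    # 2. Grab the page that was just written (assumed layout: one folder per article).
    page = max(CACHE_DIR.glob("*/index.html"), key=lambda p: p.stat().st_mtime)

    # 3. Render the cached page as plain text with lynx.
    text = subprocess.run(
        ["lynx", "-force_html", "-dump", "-width=72", str(page)],
        check=True, capture_output=True, text=True,
    ).stdout

    # 4. Build the mail and hand it over; in practice I would pipe it to my
    #    msmtp queue wrapper so that it waits until I am back online.
    msg = EmailMessage()
    msg["From"] = ME
    msg["To"] = ME
    msg["Subject"] = f"[cache] {url}"
    msg.set_content(text)
    subprocess.run(["msmtp", "-t"], input=msg.as_bytes(), check=True)

if __name__ == "__main__":
    main(sys.argv[1])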