===========================
 SFTP, VoIP, serialization
===========================

This month's personal news digest is largely a continuation of last
month's.


SFTP, curl, libssh2, Haskell
============================

The quest for working file download over SFTP goes on: the custom
globbing worked, and I added caching for directory listings, since it
seemed simpler than a dedicated thread (especially given that multiple
and dynamic remote servers may be used in the future).

But then I noticed memory leakage. Spent some time poking GHC RTS
options, simplifying cache invalidation (making it simply time-based,
rather than renewing on the second hit for the same task), employing
the deepseq and strict-concurrency packages (with the latter leading
to high CPU load and delays; the former doing that as well, if it is
used where it is not really needed), switching to the threaded
runtime, calling cleanup of the curl handles manually, and profiling
(only found that the memory seems to be mostly in ByteString chunks,
though if the leakage were in the bindings, in curl itself, or in the
libraries it uses, it would not have shown up in GHC's profiling);
nothing helped.

Then rewrote it to use libssh2 bindings instead, at least to make it
clear whether the issue is in curl (or its bindings) or somewhere
else, and it kept leaking. Adjusted it to use remote globbing over SSH
instead of plain directory listing and caching; it still leaked. Wrote
a small standalone test program, to remove all the rest, and
eventually even skipped file downloads and directory listing; it kept
leaking.

Today discovered that there is a memory leak in the libssh2 bindings
(particularly in the (de)initialization functions), fixed it, and
submitted a PR; hopefully it will be merged soon. So apparently it is
simply a coincidence that both the curl bindings (or the library
itself) and the libssh2 bindings leak memory.
Speaking of PRs and coincidences, I have also submitted a typo fix for
the curl bindings, but apparently, just like SSHFS, that project is
not actively maintained.

Quite a mess, but there is hope that it will not leak anymore with
today's fix. Apparently there is still some time to write a futuristic
sci-fi novel about a world in which file transfer is a solved problem:
maybe not just with rather involved setups like this, but even between
casual computer users.


Serialization
=============

The memory leakage made me think of implementing it in C instead,
especially when it was not clear whether the issue was in the GC, the
laziness, the bindings, or something else: all the moving parts do not
help with debugging, while with C, leaks are generally easier to
search for with Valgrind, and the libraries are more polished than the
bindings are.

I use a basic language-agnostic IPC for those programs, Unix domain
sockets and JSON, but the JSON structures are simply derived
automatically from Haskell types: I planned to specify them, but that
is additional work (and code), and I am not entirely happy with JSON
anyway. I sometimes think of switching to XML, though that has its own
awkwardness. S-expressions usually look nice to me, but there is no
standardized version disconnected from Lisps.

So I decided to take a stab at specifying a serialization format, in
addition to ranting about those regularly: it probably will not lead
to anything useful, but I am at least going to try. So far I have
considered what would be a nice and simple structure (at
<https://thunix.net/~defanor/notes/serialisation-formats.xhtml>), put
together the grammar, and implemented a few parsers for it (C with
flex and Bison, plain C, Haskell with attoparsec, Python with
pyparsing) at <https://codeberg.org/defanor/word-tree>. Basically it
is like S-expressions without quoted strings (so more like XML in that
respect), representing a tree of strings.
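To make the shape of such a tree of strings concrete, here is a
minimal sketch of a hand-written parser in Python. It is not the code
from the word-tree repository, only an illustration under my reading
of the grammar given right after it: parentheses nest, spaces and
newlines only separate values and are dropped, and a backslash may
only precede a restricted character. The names ``parse``,
``_parse_forest`` and ``_parse_word`` are made up for this example.

```python
DELIMITERS = {" ", "\n"}
RESTRICTED = {"(", ")", "\\"} | DELIMITERS


def parse(text):
    """Parse text into a forest: a list of strings and nested lists."""
    forest, pos = _parse_forest(text, 0)
    if pos != len(text):
        # _parse_forest only stops early at a ")" without a matching "(".
        raise ValueError(f"unexpected ')' at offset {pos}")
    return forest


def _parse_forest(text, pos):
    items = []
    while pos < len(text):
        ch = text[pos]
        if ch == ")":
            break  # the caller decides whether this ")" is expected
        if ch in DELIMITERS:
            pos += 1  # assumption: delimiter runs carry no data
        elif ch == "(":
            subtree, pos = _parse_forest(text, pos + 1)
            if pos >= len(text):
                raise ValueError("missing ')' at end of input")
            items.append(subtree)
            pos += 1  # skip the closing ")"
        else:
            word, pos = _parse_word(text, pos)
            items.append(word)
    return items, pos


def _parse_word(text, pos):
    chars = []
    # A word runs until an unescaped parenthesis or delimiter.
    while pos < len(text) and text[pos] not in RESTRICTED - {"\\"}:
        if text[pos] == "\\":
            # Assumption: only restricted characters may be escaped.
            if pos + 1 >= len(text) or text[pos + 1] not in RESTRICTED:
                raise ValueError(f"bad escape at offset {pos}")
            chars.append(text[pos + 1])
            pos += 2
        else:
            chars.append(text[pos])
            pos += 1
    return "".join(chars), pos
```

With these assumptions, ``parse("(name (John Doe)) (note a\\ b)")``
returns ``[["name", ["John", "Doe"]], ["note", "a b"]]``: every leaf is
a plain string, and any typing is left to the consumer.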
The format is parenthesized to shape the tree, whitespace-delimited,
and has no primitive types::

  delimiter = " " | "\n"
  restricted-char = "(" | ")" | delimiter | "\"
  tree-or-val = "(" forest ")" | delimiter + | ("\" restricted-char | any-char - restricted-char) +
  forest = tree-or-val *

Now thinking about a schema for it. Initially thought of focusing on
lexical analysis, since the format has no types anyway. Could use
regexps for literals, or maybe (A)BNF to handle context-free grammars,
which would be nice. But if it is going to be (A)BNF, the overall
structure (the shape of the tree) can be described with it as well, so
one could just use (A)BNF instead of a schema.


Voice conferences with Mumble
=============================

Finally tried voice conferencing software that seems to work and is
easy to set up: Mumble (and Mumla on Android). It uses a custom
container format with encryption (where Ogg or (S)RTP could have been
used, I think, although its custom format seems to fit it better) and
TLS, which is required, but without PKIX or TLSA verification, and
without PSK either. Generally it seems quite hacky, pragmatic, and
relatively simple.

I had heard of it before, but did not try it, since it is not likely
to be usable in the settings where voice conferences are needed, such
as work: people are too reluctant to install special client software,
and probably somebody will have MS Windows or an Apple system, where
software cannot be installed or does not work properly. So chances are
I will not even be able to try it for real conferencing with others,
but it is nice to know that it is available. Not a bad example of
applying the "worse is better" approach.


Voice control
=============

Some time ago I wondered about trying voice control to execute a few
predefined commands. Tried out CMU Sphinx back then, which seemed to
barely work, and DeepSpeech, which worked much better, though it was
awkward to set up, and as it was a new project, its future was less
clear.
Now looked into it again: DeepSpeech appears to be abandoned (as is
Mycroft, by the way), but now there is Whisper, which also works well,
though now it is the new one, and relatively awkward to set up.

Tried CMU Sphinx with "adaptation" this time, and with a defined
grammar restricted to about 40 words: thought that maybe if it cannot
handle arbitrary words, I can live with dictating them using a
phonetic alphabet and digit names instead. But that barely worked:
Sphinx recognizes 6 to 9 words out of about 40, while Whisper
recognizes almost all of them, even without any "adaptation" or
grammar restriction. Though for a single speaker and such a restricted
list of words, I wonder whether some simpler software may work well. I
think rather old software was capable of that decades ago.


Miscellany
==========

- Still going through the physics textbook, still slowly and not
  skipping exercises, only reached chapter 5 (while the first 14
  chapters are on mechanics). But rather happy that I have not dropped
  it by now. Far from active usage of fun SymPy and LaTeX bits, as I
  did with the electrostatics book, but doing some diagram drawing
  with Inkscape, plotting with matplotlib, spreadsheets with
  org-mode's tables; tools I rarely use otherwise, but they are nice.

- Tried a slightly different cheesecake recipe. I think it is hard to
  go particularly wrong with them: combinations of ricotta,
  mascarpone, possibly yogurt or sour cream, eggs, flour, sugar, and
  flavorings, with cookies and butter for the crust, tend to produce
  something nice. As various vegetables, meat, mushrooms, and spices
  do in stews, casseroles, soups. Had a store-bought serving of
  steamed vegetables though, and those were not as nice: tough,
  plastic-flavored, no oil or salt. But as long as they are prepared
  sensibly, nice ingredients are not easily messed up. Speaking of
  stews, tried making a beef stew (out of a chuck roll) in a ceramic
  pot (which may qualify as a Dutch oven), employing an oven in
  addition to a stovetop.
  Uncertain if this is better than a regular pan and a stovetop, but
  it seems like an okay method.

- Frozen broccoli with eggs makes a nice dish, and I plan to try using
  more bags of frozen vegetables, to bother less with cutting. Cutting
  vegetables takes too much time, and it is nice to have those in a
  freezer, rather than to shop specifically for a planned dish. Not as
  good as fresh vegetables, perhaps, but beats those store-bought
  prepared ones, or things like bread with cheese, which I have
  sometimes as well.

- Going to finally try pour-over coffee brewing: ordered a V60 dripper
  and filters. I probably have too many coffee brewing devices already
  though.

- Started skipping the evening stretching (balancing) routine and
  meditation (not sure if it does anything, but sitting calmly for a
  few minutes should not harm, and I will probably try to resume it),
  though still doing all the other physical exercises daily. Sometimes
  it feels like there is not enough time for everything, but I also
  notice that the time it takes to do the same parts of the routine
  can vary considerably, depending on whether I hurry and do things as
  quickly as manageable, or procrastinate in front of the computer
  after every small bit.

- Learned that turkey thighs work well for a soup, even though I
  thought I disliked darker chicken or turkey meat: rather flavorful,
  tender, and unlike chicken thighs, not much connective tissue and
  other unpleasant bits around them.

----

:Date: 2024-05-22