[HN Gopher] Pipe Viewer
___________________________________________________________________
 
Pipe Viewer
 
Author : 0x45696e6172
Score  : 141 points
Date   : 2022-10-18 09:15 UTC (1 day ago)
 
web link (www.ivarch.com)
w3m dump (www.ivarch.com)
 
| londons_explore wrote:
| It would be nice to indicate if the upstream or the downstream is
| the 'limiting' factor in speed.
| 
| I.e., within pv, is it reading the input stream or writing the
| output stream that is blocking most of the time?
 
  | ketralnis wrote:
  | It's open source, be the change you want to see in the world
 
    | kotlin2 wrote:
    | Having maintained an open source library, it's actually
    | really helpful to see features people want. Not everyone
    | needs to contribute directly to the code base. User feedback
    | is valuable, too.
 
  | bingaling wrote:
  | it's instantaneous, but the -T (transfer buffer % full display)
  | is sometimes useful for that. (0% full -> source limited, 100%
  | full -> sink limited)
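  | 
  | For example (a sketch; filenames are placeholders), with a 1M
  | transfer buffer a reading near 100% full points at gzip as
  | the bottleneck:
  | 
  |     pv -T -B 1M big.iso | gzip > big.iso.gz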
 
    | Twirrim wrote:
    | Oh wow, I'd completely missed that -T flag. That's some
    | useful data. Thanks for mentioning it!
 
  | senjin wrote:
  | This would be a genius addition
 
| heinrich5991 wrote:
| `progress` is also a nice tool to see progress of programs
| operating linearly on a single file. A lot of tools do that!
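| 
| For example (a sketch; assumes a long cp is already running
| and a build of progress that supports -m/-c):
| 
|     progress -m -c cp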
 
| sigmonsays wrote:
| i've consistently lost and found this tool over and over again
| for over 20 years
 
  | pbhjpbhj wrote:
  | Same, `apropos $keyword` helps, but strangely in this case
  | doesn't find `progress` from `apropos progress`.
 
| sneak wrote:
| part of my default install.
 
| torgard wrote:
| There are countless times where I would have found this
| incredibly helpful. Just 10 minutes ago, I wanted this exact
| tool.
| 
| Thanks!
 
| derefr wrote:
| As a person who runs a lot of ETL-like commands at work, I never
| find myself using pv(1). I love the idea of it, but for the
| commands I most want to measure progress of, they always seem to
| be either:
| 
| 1. things where I'd be paranoid about pv(1) itself becoming the
| bottleneck in the pipeline -- e.g. dd(1) of large disks where
| I've explicitly set a large blocksize and set
| iflag=direct/oflag=direct, to optimize throughput.
| 
| 2. things where the program has some useful cleverness I rely on
| that requires being fed by a named file argument, but behaves a
| lot less intelligently when being fed from stdin -- e.g. feeding
| SQL files into psql(1).
| 
| 3. things where the program, even while writing to stdout, also
| produces useful "sampled progress" informational messages on
| stderr, which I'd like to see; where pv(1) and this output
| logging would fight each other if both were running.
| 
| 4. things where there's no clean place to insert pv(1) anyway --
| mostly, this comes up for any command that manages jobs itself in
| order to do things in parallel, e.g. any object-storage-client
| mass-copy, or any parallel-rsync script. (You'd think these
| programs would also report global progress, but they usually
| don't!)
| 
| I could see pv(1) being fixed to address case 3 (by e.g. drawing
| progress while streaming stderr-logged output below it, using a
| TUI); but the other cases seem to be fundamental limitations.
| 
| Personally, when I want to observe progress on some sort of
| operation that's creating files (rsync, tar/untar, etc), here's
| what I do instead: I run the command-line, and then, in a
| separate terminal connected to the machine the files are being
| written/unpacked onto, I run this:
| 
|     # for files
|     watch -n 2 -- ls -lh $filepath
| 
|     # for directories
|     watch -n 4 -- du -h -d 0 $dirpath
| 
| If I'm in a tmux(1) session, I usually run the file-copying
| command in one pane, and then create a little three-vertical-line
| pane below it to run the observation command.
| 
| Doing things this way doesn't give you a percentage progress, but
| I find that with most operations I already know what the target's
| goal size is going to be, so all I really need to know is the
| size-so-far. (And pv(1) can't tell you the target size in many
| cases anyway.)
 
  | MayeulC wrote:
  | I usually fix 3. by redirecting the intermediate program to
  | stderr before piping to pv.
  | 
  | My main use-case is netcat (nc).
  | 
  | As an aside, I prefer the BSD version, which I find is superior
  | (IPv6 support, SOCKS, etc). "GNU Netcat" isn't even part of the
  | GNU project, AFAIK. I also discovered Ncat while writing this,
  | from the Nmap project; I'll give it a try.
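  | 
  | For example, sending a directory with a live progress bar (a
  | sketch; host and port are placeholders, and -N is the OpenBSD
  | netcat's close-on-EOF flag):
  | 
  |     tar -cf - dir | pv | nc -N host 1234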
 
    | derefr wrote:
    | I don't quite understand what you mean -- by default, most
    | Unix-pipeline-y tools that produce on stdout, if they log at
    | all, already write their logs to stderr (that being why
    | stderr exists); and pv(1) already _also_ writes to stderr (as
    | if it wrote its progress to stdout, you wouldn't be able to
    | use it in a pipe!)
    | 
    | But pv(1) is just blindly attempting to emit "\r[progress bar
    | ASCII-art]\n" (plus a few regular lines) to stderr every
    | second; and interleaving that into your PTY buffer along with
    | actual lines of stderr output from your producer command,
    | will just result in mush -- a barrage of new progress bars on
    | new lines, overwriting any lines emitted directly before
    | them.
    | 
    | Having two things both writing to stderr, where one's trying
    | to do something TUI-ish, and the other is attempting to write
    | regular text lines, is the _problem statement_ of 3, not the
    | solution to it.
    | 
    | A _solution_, AFAICT, would look more like: enabling pv(1)
    | to (somehow) capture the stderr of the entire command-line,
    | and manage it, along with drawing the progress bar. Probably
    | by splitting pv(1) into two programs -- one that goes inside
    | the command-line, watches progress, and emits progress logs
    | as specially-tagged little messages (think: the UUID-like
    | heredoc tags used in MIME-email binary-embeds) without any
    | ANSI escape codes; and another, which _wraps_ your whole
    | command line, parsing out the messages emitted by the inner
    | pv(1) to render a progress bar on the top/bottom of the PTY
    | buffer, while streaming the regular lines across the rest of
    | the PTY buffer. (Probably all on the PTY secondary buffer,
    | like less(1) or a text editor.)
    | 
    | Another, probably simpler, solution would be to have a flag
    | that tells pv(1) to log progress "events" (as JSON or
    | whatever) to a named-FIFO filepath it would create (and then
    | delete when the pipeline is over) -- or to a loopback-
    | interface TCP port it would listen on -- and otherwise be
    | silent; and then to have another command you can run
    | asynchronously to your command-line, to open that named
    | FIFO/connect to that port, and consume the events from it,
    | rendering them as a progress bar; which would also quit when
    | the FIFO gets deleted / when the socket is closed by the
    | remote. Then you could run _that_ command, instead of
      | watch(1), in another tmux(1) pane, or wherever you like.
 
      | gpderetta wrote:
      | You could redirect each pipeline stage's stderr to a fifo
      | and tail it from another terminal. A bit annoying to do it
      | by hand, though.
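      | 
      | A minimal sketch (names are placeholders; start the reader
      | first, since opening a fifo for writing blocks until a
      | reader appears):
      | 
      |     mkfifo /tmp/stage1.err
      |     # in another terminal:  cat /tmp/stage1.err
      |     producer 2>/tmp/stage1.err | pv | consumer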
 
  | gpderetta wrote:
  | IIRC pv uses splice internally and simply tells the kernel to
  | move pipe buffers from one pipe to the other, so it is very
  | unlikely to be a bottleneck.
 
    | derefr wrote:
    | In the dd(1) case, we're talking about "having any pipe
    | involved at all" vs "no pipe, just copying internal to the
    | command." The Linux kernel pipe buffer size is only 64KB,
    | while my hand-optimized `bs` usually lands at ~2MB. There's a
    | _big_ performance gap introduced by serially copying tiny
    | (non-IO-queue-saturating) chunks at a time -- it can
    | literally be a difference of minutes vs. hours to complete a
    | copy. Especially when there's high IO _latency_ on one end,
    | e.g. on IaaS network disks.
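    | 
    | For reference, the pipeless version looks something like
    | this (device names are placeholders):
    | 
    |     dd if=/dev/sdX of=/dev/sdY bs=2M \
    |         iflag=direct oflag=direct status=progress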
 
  | invalidator wrote:
  | Try using "pv -d ". It will monitor open files on the
  | process and report progress on them.
  | 
  | 1) this gets it out of the pipeline. 2) the program gets to
  | have the named arguments. 3) pv's output is on a separate
  | terminal. 4) your job never needs to know.
  | 
  | Downside: it only sees the currently open files, so it doesn't
  | work well for batch jobs. Still, it's handy to see which file
  | it's on, and how fast the progress is.
  | 
  | Also, for rsync: "--info=progress2 --no-i-r" will show you the
  | progress for a whole job.
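  | 
  | For example (assumes exactly one tar process is running):
  | 
  |     pv -d "$(pidof tar)"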
 
  | prmoustache wrote:
  | Sometimes you prefer predictability and information over sheer
  | speed. If I do a very large transfer that could take hours, I'd
  | rather trade a bit of speed to know the progress and make sure
  | nothing is stuck than launch blind and then repeat slow and
  | expensive du commands to figure out where I am in the transfer,
  | or have to strace the process.
 
    | derefr wrote:
    | > slow and expensive du commands
    | 
    | You'd be surprised how cheap these du(1) runs can be when you're
    | running the _same_ du(1) command over and over. Think of it
    | like running the same SQL query over and over -- the first
    | time you do it, the DBMS takes its time doing IO to pull the
    | relevant disk pages into the disk cache; but the Nth>=2 time,
    | the query is entirely over "hot" data. Hot filesystem
    | metadata pages, in this case. (Plus, for the file(s) that
    | were just written by your command, the query is hot because
    | those pages are still in memory from being recently dirty.)
    | 
    | I regularly unpack tarballs containing 10 million+ files; and
    | periodic du(1) over these takes only a few milliseconds of
    | wall-clock time to complete.
    | 
    | (The other bottleneck with du(1), for deep file hierarchies,
    | is printing all the subdirectory sizes. Which is why the `-d
    | 0` -- to only print the total.)
    | 
    | You might be worried about something else thrashing the disk
    | cache, but in my experience I've never needed to run an ETL-
    | like job on a system that's _also_ running some other
    | completely orthogonal IO-heavy prod workload. Usually such
    | jobs are for restoring data onto new systems, migrating data
    | between systems, etc.; where if there _is_ any prod workload
    | running on the box, it's one that's touching all the _same_
    | data you're touching, and so keeping the disk cache coherent.
 
  | leni536 wrote:
  | For rsync, to get reliable global progress there is --no-i-r
  | --info=progress2. --no-i-r adds a bit of upfront work, but
  | it's well worth it IMO.
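  | 
  | For example (untested; paths are placeholders):
  | 
  |     rsync -a --no-i-r --info=progress2 src/ host:/dest/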
 
    | derefr wrote:
    | Thanks for that! (I felt like I _had_ to be missing
    | something, with how useless rsync progress usually was.)
 
| TT-392 wrote:
| But... pipe-viewer was already a command-line YouTube browser
 
  | dima55 wrote:
  | pv predates YouTube itself
 
| trabant00 wrote:
| I've used it mostly to measure events per second with something
| like:
| 
|     tail -f /some/log | grep something | pv -lr > /dev/null
| 
| or:
| 
|     tcpdump expression | pv -lr > /dev/null
 
| JayGuerette wrote:
| pv is a great tool. One of its lesser-known features is
| throttling; transfer a file without dominating your bandwidth:
| 
| pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso'
| 
| Complete with a progress bar, speed, and ETA.
 
  | dspillett wrote:
  | Similarly, though useful less often these days, using
  | -B/--buffer-size to increase the amount that it can buffer. If
  | reading data from traditional hard drives, piping that data
  | through some process, and writing the result back to the same
  | drives, this option can increase throughput significantly by
  | reducing head movements. It can help on other storage systems
  | too, but usually not so much so.
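  | 
  | A sketch of that case (sizes and paths are placeholders, both
  | files on the same spinning disk):
  | 
  |     pv -B 256M big.log | gzip > big.log.gz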
 
  | smcl wrote:
  | Oh damn, that's neat. I never thought to use `ssh` directly
  | when transferring a file; I always used `scp bigfile.iso
  | name@server.org:path/in/destination`
 
    | tyingq wrote:
    | A similar trick that's nice is piping tar through ssh. Handy
    | if you don't have rsync or something better around. Even
    | handy for one file, since it preserves permissions, etc.
    | 
    | tar -cf - some/dir | ssh remote 'cd /place/to/go && tar -xvf -'
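    | 
    | And pv slots straight in for a progress bar (untested):
    | 
    |     tar -cf - some/dir | pv | \
    |         ssh remote 'cd /place/to/go && tar -xf -'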
 
      | Twirrim wrote:
      | I love this trick. I was dealing with some old Solaris
      | boxes something like 15 years ago when I learned you could
      | do this. I couldn't rsync, and had started off SCP'ing
      | hundreds of thousands of files across but it was going to
      | take an insane length of time. Asked one of the other
      | sysadmins if they knew a better way and they pointed out
      | you can pipe stuff in to ssh for the other side too. Every
      | now and then this technique proves useful in unexpected
      | ways :)
 
    | fbergen wrote:
    | Also see:
    | 
    |     scp -l 200 bigfile.iso name@server.org:path/in/destination
    | 
    | From the man page:
    | 
    |     -l limit
    |         Limits the used bandwidth, specified in Kbit/s.
 
      | MayeulC wrote:
      | Also you probably shouldn't use scp. rsync and sftp have
      | mostly the same semantics.
      | 
      |     rsync --bwlimit=200K bigfile.iso name@server.org:path/in/destination
      |     sftp -l 200 bigfile.iso name@server.org:path/in/destination
      | 
      | Although it seems that scp is becoming a wrapper around
      | sftp these days:
      | 
      | https://www.redhat.com/en/blog/openssh-scp-deprecation-
      | rhel-...
      | 
      | https://news.ycombinator.com/item?id=25005567
 
| michaelmior wrote:
| Probably my favorite non-POSIX tool that I insert into my
| pipelines whenever anything takes more than a few seconds. I find
| it super helpful to avoid premature optimization. If I can
| quickly see that my hacked together pipeline will run in a few
| minutes and I only ever need to do that once, I'll probably just
| let it finish. If it's going to take a few hours, I might decide
| it's worth optimizing.
| 
| It also helps me optimize my time. If something is going to
| finish in a few minutes, I probably won't context switch to
| another major task. However, if something is going to take a few
| hours then I'll probably switch to work on something different
| knowing approximately when I can go back and check on results.
 
  | systems_glitch wrote:
  | Same, one of the first utilities I install on a new system.
 
| dang wrote:
| Related:
| 
|  _PV (Pipe Viewer) - add a progress bar to most command-line
| programs_ - https://news.ycombinator.com/item?id=23826845 - July
| 2020 (2 comments)
| 
|  _A Unix Utility You Should Know About: Pipe Viewer_ -
| https://news.ycombinator.com/item?id=8761094 - Dec 2014 (1
| comment)
| 
|  _Pipe Viewer_ - https://news.ycombinator.com/item?id=5942115 -
| June 2013 (1 comment)
| 
|  _Pipe Viewer_ - https://news.ycombinator.com/item?id=4020026 -
| May 2012 (26 comments)
| 
|  _A Unix Utility You Should Know About: Pipe Viewer_ -
| https://news.ycombinator.com/item?id=462244 - Feb 2009 (63
| comments)
 
| est wrote:
| pv was the tool with which I discovered that some VPSes have
| only ~10Gbps memory-copy speed.
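| 
| A classic way to measure that with pv is reading zeroes and
| throwing them away; the reported rate approximates memory-copy
| throughput:
| 
|     pv /dev/zero > /dev/null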
 
| cwillu wrote:
| pv -d $(pidof xz):1 is great for when you realize too late that
| something is slow enough that you want a progress indication, and
| definitely do not want to restart from scratch.
 
  | dspillett wrote:
  | Another good option for that, which works in a number of other
  | useful circumstances too, is progress:
  | https://github.com/Xfennec/progress
 
  | xuhu wrote:
    | How does `pv -d` work? Does it use perf probes or attach to
    | the target PID?
 
    | remram wrote:
    | It finds the file using /proc/<pid>/fd/ and watches its
    | size grow. It doesn't work with pipes, devices, a file being
    | overwritten (not appended to), or anything whose size doesn't
    | grow.
 
    | cwillu wrote:
    | It appears to monitor the contents of /proc/<pid>/fdinfo/
 
___________________________________________________________________
(page generated 2022-10-19 23:00 UTC)