[HN Gopher] Pipe Viewer
___________________________________________________________________
 
Pipe Viewer
 
Author : 0x45696e6172
Score  : 141 points
Date   : 2022-10-18 09:15 UTC (1 day ago)
 
web link (www.ivarch.com)
w3m dump (www.ivarch.com)
 
| londons_explore wrote:
| It would be nice to indicate if the upstream or the downstream is
| the 'limiting' factor in speed.
| 
| I.e., within pv, is it reading the input stream or writing the
| output stream that is blocking most of the time?
 
  | ketralnis wrote:
  | It's open source, be the change you want to see in the world
 
    | kotlin2 wrote:
    | Having maintained an open source library, it's actually
    | really helpful to see features people want. Not everyone
    | needs to contribute directly to the code base. User feedback
    | is valuable, too.
 
  | bingaling wrote:
  | it's instantaneous, but the -T (transfer buffer % full display)
  | is sometimes useful for that. (0% full -> source limited, 100%
  | full -> sink limited)
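  | 
  | For example (a sketch; filenames are placeholders), with a 1M
  | transfer buffer a reading near 100% full points at gzip as
  | the bottleneck:
  | 
  |     pv -T -B 1M big.iso | gzip > big.iso.gz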
 
    | Twirrim wrote:
    | Oh wow, I'd completely missed that -T flag. That's some
    | useful data. Thanks for mentioning it!
 
  | senjin wrote:
  | This would be a genius addition
 
| heinrich5991 wrote:
| `progress` is also a nice tool to see progress of programs
| operating linearly on a single file. A lot of tools do that!
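| 
| For example (a sketch; assumes a long cp is already running
| and a build of progress that supports -m/-c):
| 
|     progress -m -c cp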
 
| sigmonsays wrote:
| i've consistently lost and found this tool over and over again
| for over 20 years
 
  | pbhjpbhj wrote:
  | Same, `apropos $keyword` helps, but strangely in this case
  | doesn't find `progress` from `apropos progress`.
 
| sneak wrote:
| part of my default install.
 
| torgard wrote:
| There are countless times where I would have found this
| incredibly helpful. Just 10 minutes ago, I wanted this exact
| tool.
| 
| Thanks!
 
| derefr wrote:
| As a person who runs a lot of ETL-like commands at work, I never
| find myself using pv(1). I love the idea of it, but for the
| commands I most want to measure progress of, they always seem to
| be either:
| 
| 1. things where I'd be paranoid about pv(1) itself becoming the
| bottleneck in the pipeline -- e.g. dd(1) of large disks where
| I've explicitly set a large blocksize and set
| iflag=direct/oflag=direct, to optimize throughput.
| 
| 2. things where the program has some useful cleverness I rely on
| that requires being fed by a named file argument, but behaves a
| lot less intelligently when being fed from stdin -- e.g. feeding
| SQL files into psql(1).
| 
| 3. things where the program, even while writing to stdout, also
| produces useful "sampled progress" informational messages on
| stderr, which I'd like to see; where pv(1) and this output
| logging would fight each other if both were running.
| 
| 4. things where there's no clean place to insert pv(1) anyway --
| mostly, this comes up for any command that manages jobs itself in
| order to do things in parallel, e.g. any object-storage-client
| mass-copy, or any parallel-rsync script. (You'd think these
| programs would also report global progress, but they usually
| don't!)
| 
| I could see pv(1) being fixed to address case 3 (by e.g. drawing
| progress while streaming stderr-logged output below it, using a
| TUI); but the other cases seem to be fundamental limitations.
| 
| Personally, when I want to observe progress on some sort of
| operation that's creating files (rsync, tar/untar, etc), here's
| what I do instead: I run the command-line, and then, in a
| separate terminal connected to the machine the files are being
| written/unpacked onto, I run this:
| 
|     # for files
|     watch -n 2 -- ls -lh $filepath
| 
|     # for directories
|     watch -n 4 -- du -h -d 0 $dirpath
| 
| If I'm in a tmux(1) session, I usually run the file-copying
| command in one pane, and then create a little three-vertical-line
| pane below it to run the observation command.
| 
| Doing things this way doesn't give you a percentage progress, but
| I find that with most operations I already know what the target's
| goal size is going to be, so all I really need to know is the
| size-so-far. (And pv(1) can't tell you the target size in many
| cases anyway.)
 
  | MayeulC wrote:
  | I usually fix 3. by redirecting the intermediate program to
  | stderr before piping to pv.
  | 
  | My main use-case is netcat (nc).
  | 
  | As an aside, I prefer the BSD version, which I find is superior
  | (IPv6 support, SOCKS, etc). "GNU Netcat" isn't even part of the
  | GNU project, AFAIK. I also discovered Ncat while writing this,
  | from the Nmap project; I'll give it a try.
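  | 
  | For example, sending a directory with a live progress bar (a
  | sketch; host and port are placeholders, and -N is the OpenBSD
  | netcat's close-on-EOF flag):
  | 
  |     tar -cf - dir | pv | nc -N host 1234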
 
    | derefr wrote:
    | I don't quite understand what you mean -- by default, most
    | Unix-pipeline-y tools that produce on stdout, if they log at
    | all, already write their logs to stderr (that being why
    | stderr exists); and pv(1) already _also_ writes to stderr (as
    | if it wrote its progress to stdout, you wouldn't be able to
    | use it in a pipe!)
    | 
    | But pv(1) is just blindly attempting to emit "\r[progress bar
    | ASCII-art]\n" (plus a few regular lines) to stderr every
    | second; and interleaving that into your PTY buffer along with
    | actual lines of stderr output from your producer command,
    | will just result in mush -- a barrage of new progress bars on
    | new lines, overwriting any lines emitted directly before
    | them.
    | 
    | Having two things both writing to stderr, where one's trying
    | to do something TUI-ish, and the other is attempting to write
    | regular text lines, is the _problem statement_ of 3, not the
    | solution to it.
    | 
    | A _solution_, AFAICT, would look more like: enabling pv(1)
    | to (somehow) capture the stderr of the entire command-line,
    | and manage it, along with drawing the progress bar. Probably
    | by splitting pv(1) into two programs -- one that goes inside
    | the command-line, watches progress, and emits progress logs
    | as specially-tagged little messages (think: the UUID-like
    | heredoc tags used in MIME-email binary-embeds) without any
    | ANSI escape codes; and another, which _wraps_ your whole
    | command line, parsing out the messages emitted by the inner
    | pv(1) to render a progress bar on the top/bottom of the PTY
    | buffer, while streaming the regular lines across the rest of
    | the PTY buffer. (Probably all on the PTY secondary buffer,
    | like less(1) or a text editor.)
    | 
    | Another, probably simpler, solution would be to have a flag
    | that tells pv(1) to log progress "events" (as JSON or
    | whatever) to a named-FIFO filepath it would create (and then
    | delete when the pipeline is over) -- or to a loopback-
    | interface TCP port it would listen on -- and otherwise be
    | silent; and then to have another command you can run
    | asynchronously to your command-line, to open that named
    | FIFO/connect to that port, and consume the events from it,
    | rendering them as a progress bar; which would also quit when
    | the FIFO gets deleted / when the socket is closed by the
    | remote. Then you could run _that_ command, instead of
      | watch(1), in another tmux(1) pane, or wherever you like.
 
      | gpderetta wrote:
      | You could redirect each pipeline stage's stderr to a fifo
      | and tail it from another terminal. A bit annoying to do it
      | by hand, though.
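      | 
      | A minimal sketch (names are placeholders; start the reader
      | first, since opening a fifo for writing blocks until a
      | reader appears):
      | 
      |     mkfifo /tmp/stage1.err
      |     # in another terminal:  cat /tmp/stage1.err
      |     producer 2>/tmp/stage1.err | pv | consumer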
 
  | gpderetta wrote:
  | IIRC pv uses splice internally and simply tells the kernel to
  | move pipe buffers from one pipe to the other, so it is very
  | unlikely to be a bottleneck.
 
    | derefr wrote:
    | In the dd(1) case, we're talking about "having any pipe
    | involved at all" vs "no pipe, just copying internal to the
    | command." The Linux kernel pipe buffer size is only 64KB,
    | while my hand-optimized `bs` usually lands at ~2MB. There's a
    | _big_ performance gap introduced by serially copying tiny
    | (non-IO-queue-saturating) chunks at a time -- it can
    | literally be a difference of minutes vs. hours to complete a
    | copy. Especially when there's high IO _latency_ on one end,
    | e.g. on IaaS network disks.
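    | 
    | For reference, the pipeless version looks something like
    | this (device names are placeholders):
    | 
    |     dd if=/dev/sdX of=/dev/sdY bs=2M \
    |         iflag=direct oflag=direct status=progress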
 
  | invalidator wrote:
  | Try using "pv -d ". It will monitor open files on the
  | process and report progress on them.
  | 
  | 1) this gets it out of the pipeline. 2) the program gets to
  | have the named arguments. 3) pv's output is on a separate
  | terminal. 4) your job never needs to know.
  | 
  | Downside: it only sees the currently open files, so it doesn't
  | work well for batch jobs. Still, it's handy to see which file
  | it's on, and how fast the progress is.
  | 
  | Also, for rsync: "--info=progress2 --no-i-r" will show you the
  | progress for a whole job.
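  | 
  | For example (assumes exactly one tar process is running):
  | 
  |     pv -d "$(pidof tar)"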
 
  | prmoustache wrote:
  | Sometimes you prefer predictability and information over sheer
  | speed. If I do a very large transfer that could take hours, I'd
  | rather trade a bit of speed to know the progress and make sure
  | nothing is stuck than launch blind and then repeat slow and
  | expensive du commands to figure out where I am in the transfer,
  | or have to strace the process.
 
    | derefr wrote:
    | > slow and expensive du commands
    | 
    | You'd be surprised how cheap these du(1) runs can be when you're
    | running the _same_ du(1) command over and over. Think of it
    | like running the same SQL query over and over -- the first
    | time you do it, the DBMS takes its time doing IO to pull the
    | relevant disk pages into the disk cache; but the Nth>=2 time,
    | the query is entirely over "hot" data. Hot filesystem
    | metadata pages, in this case. (Plus, for the file(s) that
    | were just written by your command, the query is hot because
    | those pages are still in memory from being recently dirty.)
    | 
    | I regularly unpack tarballs containing 10 million+ files; and
    | periodic du(1) over these takes only a few milliseconds of
    | wall-clock time to complete.
    | 
    | (The other bottleneck with du(1), for deep file hierarchies,
    | is printing all the subdirectory sizes. Which is why the `-d
    | 0` -- to only print the total.)
    | 
    | You might be worried about something else thrashing the disk
    | cache, but in my experience I've never needed to run an ETL-
    | like job on a system that's _also_ running some other
    | completely orthogonal IO-heavy prod workload. Usually such
    | jobs are for restoring data onto new systems, migrating data
    | between systems, etc.; where if there _is_ any prod workload
    | running on the box, it's one that's touching all the _same_
    | data you're touching, and so keeping the disk cache coherent.
 
  | leni536 wrote:
  | For rsync, to get reliable global progress there is --no-i-r
  | --info=progress2. --no-i-r adds a bit of upfront work, but
  | it's well worth it IMO.
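  | 
  | For example (untested; paths are placeholders):
  | 
  |     rsync -a --no-i-r --info=progress2 src/ host:/dest/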
 
    | derefr wrote:
    | Thanks for that! (I felt like I _had_ to be missing
    | something, with how useless rsync progress usually was.)
 
| TT-392 wrote:
| But... pipe-viewer was already a command-line YouTube browser
 
  | dima55 wrote:
  | pv predates YouTube itself
 
| trabant00 wrote:
| I've used it mostly to measure events per second with something
| like:
| 
|     tail -f /some/log | grep something | pv -lr > /dev/null
| 
| or:
| 
|     tcpdump expression | pv -lr > /dev/null
 
| JayGuerette wrote:
| pv is a great tool. One of its lesser-known features is
| throttling; transfer a file without dominating your bandwidth:
| 
| pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso'
| 
| Complete with a progress bar, speed, and ETA.
 
  | dspillett wrote:
  | Similarly, though useful less often these days, using
  | -B/--buffer-size to increase the amount that it can buffer. If
  | reading data from traditional hard drives, piping that data
  | through some process, and writing the result back to the same
  | drives, this option can increase throughput significantly by
  | reducing head movements. It can help on other storage systems
  | too, but usually not so much so.
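  | 
  | A sketch of that case (sizes and paths are placeholders, both
  | files on the same spinning disk):
  | 
  |     pv -B 256M big.log | gzip > big.log.gz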
 
  | smcl wrote:
  | Oh damn, that's neat. I never thought to use `ssh` directly
  | when transferring a file; I always used `scp bigfile.iso
  | name@server.org:path/in/destination`
 
    | tyingq wrote:
    | A similar trick that's nice is piping tar through ssh. Handy
    | if you don't have rsync or something better around. Even
    | handy for one file, since it preserves permissions, etc.
    | 
    | tar -cf - some/dir | ssh remote 'cd /place/to/go && tar -xvf -'
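    | 
    | And pv slots straight in for a progress bar (untested):
    | 
    |     tar -cf - some/dir | pv | \
    |         ssh remote 'cd /place/to/go && tar -xf -'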
 
      | Twirrim wrote:
      | I love this trick. I was dealing with some old Solaris
      | boxes something like 15 years ago when I learned you could
      | do this. I couldn't rsync, and had started off SCP'ing
      | hundreds of thousands of files across but it was going to
      | take an insane length of time. Asked one of the other
      | sysadmins if they knew a better way and they pointed out
      | you can pipe stuff in to ssh for the other side too. Every
      | now and then this technique proves useful in unexpected
      | ways :)
 
    | fbergen wrote:
    | Also see:
    | 
    |     scp -l 200 bigfile.iso name@server.org:path/in/destination
    | 
    | From the man page:
    | 
    |     -l limit
    |         Limits the used bandwidth, specified in Kbit/s.
 
      | MayeulC wrote:
      | Also you probably shouldn't use scp. rsync and sftp have
      | mostly the same semantics.
      | 
      |     rsync --bwlimit=200K bigfile.iso name@server.org:path/in/destination
      |     sftp -l 200 bigfile.iso name@server.org:path/in/destination
      | 
      | Although it seems that scp is becoming a wrapper around
      | sftp these days:
      | 
      | https://www.redhat.com/en/blog/openssh-scp-deprecation-
      | rhel-...
      | 
      | https://news.ycombinator.com/item?id=25005567
 
| michaelmior wrote:
| Probably my favorite non-POSIX tool that I insert into my
| pipelines whenever anything takes more than a few seconds. I find
| it super helpful to avoid premature optimization. If I can
| quickly see that my hacked together pipeline will run in a few
| minutes and I only ever need to do that once, I'll probably just
| let it finish. If it's going to take a few hours, I might decide
| it's worth optimizing.
| 
| It also helps me optimize my time. If something is going to
| finish in a few minutes, I probably won't context switch to
| another major task. However, if something is going to take a few
| hours then I'll probably switch to work on something different
| knowing approximately when I can go back and check on results.
 
  | systems_glitch wrote:
  | Same, one of the first utilities I install on a new system.
 
| dang wrote:
| Related:
| 
|  _PV (Pipe Viewer) - add a progress bar to most command-line
| programs_ - https://news.ycombinator.com/item?id=23826845 - July
| 2020 (2 comments)
| 
|  _A Unix Utility You Should Know About: Pipe Viewer_ -
| https://news.ycombinator.com/item?id=8761094 - Dec 2014 (1
| comment)
| 
|  _Pipe Viewer_ - https://news.ycombinator.com/item?id=5942115 -
| June 2013 (1 comment)
| 
|  _Pipe Viewer_ - https://news.ycombinator.com/item?id=4020026 -
| May 2012 (26 comments)
| 
|  _A Unix Utility You Should Know About: Pipe Viewer_ -
| https://news.ycombinator.com/item?id=462244 - Feb 2009 (63
| comments)
 
| est wrote:
| pv was the tool with which I discovered that some VPSes have
| only ~10Gbps memory-copy speed.
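| 
| A classic way to measure that with pv is reading zeroes and
| throwing them away; the reported rate approximates memory-copy
| throughput:
| 
|     pv /dev/zero > /dev/null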
 
| cwillu wrote:
| pv -d $(pidof xz):1 is great for when you realize too late that
| something is slow enough that you want a progress indication, and
| definitely do not want to restart from scratch.
 
  | dspillett wrote:
  | Another good option for that, which works in a number of other
  | useful circumstances too, is progress:
  | https://github.com/Xfennec/progress
 
  | xuhu wrote:
    | How does `pv -d` work? Does it use perf probes or attach to
    | the target PID?
 
    | remram wrote:
    | It finds the file using /proc/<pid>/fd/ and watches its
    | size grow. It doesn't work with pipes, devices, a file being
    | overwritten (not appended to), or anything whose size doesn't
    | grow.
 
    | cwillu wrote:
    | It appears to monitor the contents of /proc/<pid>/fdinfo/
 
___________________________________________________________________
(page generated 2022-10-19 23:00 UTC)