|
| londons_explore wrote:
| It would be nice to indicate if the upstream or the downstream is
| the 'limiting' factor in speed.
|
| Ie. within pv, is it the reading the input stream or the writing
| the output stream that is blocking most of the time?
| ketralnis wrote:
| It's open source, be the change you want to see in the world
| kotlin2 wrote:
| Having maintained an open source library, it's actually
| really helpful to see features people want. Not everyone
| needs to contribute directly to the code base. User feedback
| is valuable, too.
| bingaling wrote:
| it's instantaneous, but the -T (transfer buffer % full display)
| is sometimes useful for that. (0% full -> source limited, 100%
| full -> sink limited)
| Twirrim wrote:
| Oh wow, I'd completely missed that -T flag. That's some
| useful data. Thanks for mentioning it!
| senjin wrote:
| This would be a genius addition
| heinrich5991 wrote:
| `progress` is also a nice tool to see progress of programs
| operating linearly on a single file. A lot of tools do that!
| sigmonsays wrote:
| i've consistently lost and found this tool over and over again
| for over 20 years
| pbhjpbhj wrote:
| Same, `apropos $keyword` helps, but strangely in this case
| doesn't find `progress` from `apropos progress`.
| sneak wrote:
| part of my default install.
| torgard wrote:
| There are countless times where I would have found this
| incredibly helpful. Just 10 minutes ago, I wanted this exact
| tool.
|
| Thanks!
| derefr wrote:
| As a person who runs a lot of ETL-like commands at work, I never
| find myself using pv(1). I love the idea of it, but the
| commands I most want to measure progress of always seem to
| be either:
|
| 1. things where I'd be paranoid about pv(1) itself becoming the
| bottleneck in the pipeline -- e.g. dd(1) of large disks where
| I've explicitly set a large blocksize and set
| iflag=direct/oflag=direct, to optimize throughput.
|
| 2. things where the program has some useful cleverness I rely on
| that requires being fed by a named file argument, but behaves a
| lot less intelligently when being fed from stdin -- e.g. feeding
| SQL files into psql(1).
|
| 3. things where the program, even while writing to stdout, also
| produces useful "sampled progress" informational messages on
| stderr, which I'd like to see; where pv(1) and this output
| logging would fight each other if both were running.
|
| 4. things where there's no clean place to insert pv(1) anyway --
| mostly, this comes up for any command that manages jobs itself in
| order to do things in parallel, e.g. any object-storage-client
| mass-copy, or any parallel-rsync script. (You'd think these
| programs would also report global progress, but they usually
| don't!)
|
| I could see pv(1) being fixed to address case 3 (by e.g. drawing
| progress while streaming stderr-logged output below it, using a
| TUI); but the other cases seem to be fundamental limitations.
|
| Personally, when I want to observe progress on some sort of
| operation that's creating files (rsync, tar/untar, etc), here's
| what I do instead: I run the command-line, and then, in a
| separate terminal connected to the machine the files are being
| written/unpacked onto, I run this:
|
|     # for files
|     watch -n 2 -- ls -lh $filepath
|
|     # for directories
|     watch -n 4 -- du -h -d 0 $dirpath
|
| If I'm in a tmux(1) session, I usually run the file-copying
| command in one pane, and then create a little pane, three lines
| tall, below it to run the observation command.
|
| Doing things this way doesn't give you a percentage progress, but
| I find that with most operations I already know what the target's
| goal size is going to be, so all I really need to know is the
| size-so-far. (And pv(1) can't tell you the target size in many
| cases anyway.)
| MayeulC wrote:
| I usually fix 3. by redirecting the intermediate program to
| stderr before piping to pv.
|
| My main use-case is netcat (nc).
|
| As an aside, I prefer the BSD version, which I find is superior
| (IPv6 support, SOCKS, etc). "GNU Netcat" isn't even part of the
| GNU project, AFAIK. I also discovered Ncat while writing this,
| from the Nmap project; I'll give it a try.
| derefr wrote:
| I don't quite understand what you mean -- by default, most
| Unix-pipeline-y tools that produce on stdout, if they log at
| all, already write their logs to stderr (that being why
| stderr exists); and pv(1) already _also_ writes to stderr (as
| if it wrote its progress to stdout, you wouldn't be able to
| use it in a pipe!)
|
| But pv(1) is just blindly attempting to emit "\r[progress bar
| ASCII-art]\n" (plus a few regular lines) to stderr every
| second; and interleaving that into your PTY buffer along with
| actual lines of stderr output from your producer command,
| will just result in mush -- a barrage of new progress bars on
| new lines, overwriting any lines emitted directly before
| them.
|
| Having two things both writing to stderr, where one's trying
| to do something TUI-ish, and the other is attempting to write
| regular text lines, is the _problem statement_ of 3, not the
| solution to it.
|
| A _solution_ , AFAICT, would look more like: enabling pv(1)
| to (somehow) capture the stderr of the entire command-line,
| and manage it, along with drawing the progress bar. Probably
| by splitting pv(1) into two programs -- one that goes inside
| the command-line, watches progress, and emits progress logs
| as specially-tagged little messages (think: the UUID-like
| heredoc tags used in MIME-email binary-embeds) without any
| ANSI escape codes; and another, which _wraps_ your whole
| command line, parsing out the messages emitted by the inner
| pv(1) to render a progress bar on the top/bottom of the PTY
| buffer, while streaming the regular lines across the rest of
| the PTY buffer. (Probably all on the PTY secondary buffer,
| like less(1) or a text editor.)
|
| Another, probably simpler, solution would be to have a flag
| that tells pv(1) to log progress "events" (as JSON or
| whatever) to a named-FIFO filepath it would create (and then
| delete when the pipeline is over) -- or to a loopback-
| interface TCP port it would listen on -- and otherwise be
| silent; and then to have another command you can run
| asynchronously to your command-line, to open that named
| FIFO/connect to that port, and consume the events from it,
| rendering them as a progress bar; which would also quit when
| the FIFO gets deleted / when the socket is closed by the
| remote. Then you could run _that_ command, instead of
| watch(1), in another tmux(1) pane, or wherever you like.
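|
| A rough sketch of that second idea, using pv's existing -n
| (numeric) output mode -- the fifo path and the use of dialog(1)
| as the consumer are illustrative choices, not built-in pv
| features:
|
|     # pv -n prints integer percentages on stderr; send them to a fifo
|     mkfifo /tmp/pv-progress
|     pv -n bigfile.tar 2> /tmp/pv-progress | tar -xf - &
|
|     # in another terminal or tmux pane, render the numbers as a gauge
|     # (pv will block until something opens the fifo for reading)
|     dialog --gauge 'Extracting bigfile.tar' 7 70 < /tmp/pv-progress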
| gpderetta wrote:
| You could redirect each pipeline stage stderr to a fifo and
| tail it from another terminal. A bit annoying to do it by
| hand though.
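|
| Something like this, where producer and consumer stand in for
| the real pipeline stages:
|
|     mkfifo /tmp/stage1.err
|     producer 2> /tmp/stage1.err | pv | consumer > out
|
|     # from another terminal:
|     tail -f /tmp/stage1.err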
| gpderetta wrote:
| IIRC pv uses splice internally and simply tells the kernel to
| move pipe buffers from one pipe to the other, so it is very
| unlikely to be a bottleneck.
| derefr wrote:
| In the dd(1) case, we're talking about "having any pipe
| involved at all" vs "no pipe, just copying internal to the
| command." The Linux kernel pipe buffer size is only 64KB,
| while my hand-optimized `bs` usually lands at ~2MB. There's a
| _big_ performance gap introduced by serially copying tiny
| (non-IO-queue-saturating) chunks at a time -- it can
| literally be a difference of minutes vs. hours to complete a
| copy. Especially when there's high IO _latency_ on one end,
| e.g. on IaaS network disks.
| invalidator wrote:
| Try using "pv -d ". It will monitor open files on the
| process and report progress on them.
|
| 1) this gets it out of the pipeline. 2) the program gets to
| have the named arguments. 3) pv's output is on a separate
| terminal. 4) your job never needs to know.
|
| Downside: it only sees the currently open files, so it doesn't
| work well for batch jobs. Still, it's handy to see which file
| it's on, and how fast the progress is.
|
| Also, for rsync: "--info=progress2 --no-i-r" will show you the
| progress for a whole job.
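|
| For example (assuming a single dd is doing the copy, so pidof
| returns just one pid):
|
|     pv -d "$(pidof dd)"
|     rsync -a --info=progress2 --no-i-r src/ host:dst/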
| prmoustache wrote:
| Sometimes you prefer predictability and information over sheer
| speed. If I do a very large transfer that could take hours, I'd
| rather trade a bit of speed to know the progress and make sure
| nothing is stuck than launch it blind and then repeat slow and
| expensive du commands to figure out where I am in the transfer,
| or have to strace the process.
| derefr wrote:
| > slow and expensive du commands
|
| You'd be surprised how cheap these du(1) runs can be when you're
| running the _same_ du(1) command over and over. Think of it
| like running the same SQL query over and over -- the first
| time you do it, the DBMS takes its time doing IO to pull the
| relevant disk pages into the disk cache; but the Nth>=2 time,
| the query is entirely over "hot" data. Hot filesystem
| metadata pages, in this case. (Plus, for the file(s) that
| were just written by your command, the query is hot because
| those pages are still in memory from being recently dirty.)
|
| I regularly unpack tarballs containing 10 million+ files; and
| periodic du(1) over these takes only a few milliseconds of
| wall-clock time to complete.
|
| (The other bottleneck with du(1), for deep file hierarchies,
| is printing all the subdirectory sizes -- which is why I pass
| `-d 0`, to only print the total.)
|
| You might be worried about something else thrashing the disk
| cache, but in my experience I've never needed to run an ETL-
| like job on a system that's _also_ running some other
| completely orthogonal IO-heavy prod workload. Usually such
| jobs are for restoring data onto new systems, migrating data
| between systems, etc.; where if there _is_ any prod workload
| running on the box, it's one that's touching all the _same_
| data you're touching, and so keeping disk-cache coherency.
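|
| (Easy to see for yourself -- the path here is just an example:
|
|     time du -sh /data/unpacked   # first run: cold metadata, real IO
|     time du -sh /data/unpacked   # repeat: hot dentry/inode cache
|
| the second run is typically dramatically faster.)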
| leni536 wrote:
| For rsync to get reliable global progress there is --no-i-r
| --info=progress2 . --no-i-r adds a bit of upfront work, but
| it's well worth it IMO.
| derefr wrote:
| Thanks for that! (I felt like I _had_ to be missing
| something, with how useless rsync progress usually was.)
| TT-392 wrote:
| But... pipe-viewer was already a commandline youtube browser
| dima55 wrote:
| pv predates youtube itself
| trabant00 wrote:
| I've used it mostly to measure events per second with something
| like:
|
|     tail -f /some/log | grep something | pv -lr > /dev/null
|
| or
|
|     tcpdump expression | pv -lr > /dev/null
| JayGuerette wrote:
| pv is a great tool. One of its lesser-known features is
| throttling; transfer a file without dominating your bandwidth:
|
| pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso'
|
| Complete with a progress bar, speed, and ETA.
| dspillett wrote:
| Similarly, though useful less often these days, using
| -B/--buffer-size to increase the amount that it can buffer. If
| reading data from traditional hard drives, piping that data
| through some process, and writing the result back to the same
| drives, this option can increase throughput significantly by
| reducing head movements. It can help on other storage systems
| too, but usually not so much so.
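|
| For instance, something along these lines -- the 256M figure is
| only illustrative, tune it to the workload:
|
|     pv -B 256M /slowdisk/big.log | gzip > /slowdisk/big.log.gz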
| smcl wrote:
| Oh damn, that's neat -- I never thought to use `ssh` directly
| when transferring a file; I always used `scp bigfile.iso
| name@server.org:path/in/destination`
| tyingq wrote:
| A similar trick that's nice is piping tar through ssh. Handy
| if you don't have rsync or something better around. Even
| handy for one file, since it preserves permissions, etc.
|
|     tar -cf - some/dir | ssh remote 'cd /place/to/go && tar -xvf -'
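|
| Dropping pv into the middle gives the same trick a live
| throughput readout:
|
|     tar -cf - some/dir | pv | ssh remote 'cd /place/to/go && tar -xvf -'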
| Twirrim wrote:
| I love this trick. I was dealing with some old solaris
| boxes something like 15 years ago when I learned you could
| do this. I couldn't rsync, and had started off SCP'ing
| hundreds of thousands of files across but it was going to
| take an insane length of time. Asked one of the other
| sysadmins if they knew a better way and they pointed out
| you can pipe stuff in to ssh for the other side too. Every
| now and then this technique proves useful in unexpected
| ways :)
| fbergen wrote:
| Also see `scp -l 200 bigfile.iso
| name@server.org:path/in/destination`
|
| from man page:
|
| -l limit
|
| Limits the used bandwidth, specified in Kbit/s.
| MayeulC wrote:
| Also you probably shouldn't use scp. rsync and sftp have
| mostly the same semantics:
|
|     rsync --bwlimit=200K bigfile.iso name@server.org:path/in/destination
|     sftp -l 200 bigfile.iso name@server.org:path/in/destination
|
| Although it seems that scp is becoming a wrapper around
| sftp these days:
|
| https://www.redhat.com/en/blog/openssh-scp-deprecation-rhel-...
|
| https://news.ycombinator.com/item?id=25005567
| michaelmior wrote:
| Probably my favorite non-POSIX tool that I insert into my
| pipelines whenever anything takes more than a few seconds. I find
| it super helpful to avoid premature optimization. If I can
| quickly see that my hacked together pipeline will run in a few
| minutes and I only ever need to do that once, I'll probably just
| let it finish. If it's going to take a few hours, I might decide
| it's worth optimizing.
|
| It also helps me optimize my time. If something is going to
| finish in a few minutes, I probably won't context switch to
| another major task. However, if something is going to take a few
| hours then I'll probably switch to work on something different
| knowing approximately when I can go back and check on results.
| systems_glitch wrote:
| Same, one of the first utilities I install on a new system.
| dang wrote:
| Related:
|
| _PV (Pipe Viewer) - add a progress bar to most command-line
| programs_ - https://news.ycombinator.com/item?id=23826845 - July
| 2020 (2 comments)
|
| _A Unix Utility You Should Know About: Pipe Viewer_ -
| https://news.ycombinator.com/item?id=8761094 - Dec 2014 (1
| comment)
|
| _Pipe Viewer_ - https://news.ycombinator.com/item?id=5942115 -
| June 2013 (1 comment)
|
| _Pipe Viewer_ - https://news.ycombinator.com/item?id=4020026 -
| May 2012 (26 comments)
|
| _A Unix Utility You Should Know About: Pipe Viewer_ -
| https://news.ycombinator.com/item?id=462244 - Feb 2009 (63
| comments)
| est wrote:
| pv was the tool with which I discovered that some VPSes have
| only ~10 Gbps memory-copy speed.
| cwillu wrote:
| pv -d $(pidof xz):1 is great for when you realize too late that
| something is slow enough that you want a progress indication, and
| definitely do not want to restart from scratch.
| dspillett wrote:
| Another good option for that, which works in a number of other
| useful circumstances too, is progress:
| https://github.com/Xfennec/progress
| xuhu wrote:
| How does `pv -d` work? Does it use perf probes, or attach to
| the target PID?
| remram wrote:
| It finds the file using /proc/<pid>/fd/ and watches its
| size grow. It doesn't work with pipes, devices, a file being
| overwritten (not appended to), or anything whose size doesn't
| grow.
| cwillu wrote:
| It appears to monitor the contents of /proc/<pid>/fdinfo/
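|
| Roughly the same information can be read by hand -- the pid and
| fd numbers here are only examples:
|
|     cat /proc/1234/fdinfo/5        # the "pos:" line is the current offset
|     stat -L -c %s /proc/1234/fd/5  # size of the file behind that descriptor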