This is a text-only version of the following page on https://raymii.org:

---
Title  : Semi-accurate live stream viewer count (hls/rtmp/restreamer) on the command line
Author : Remy van Elst
Date   : 25-11-2020
URL    : https://raymii.org/s/tutorials/Semi-accurate_live_stream_viewer_count.html
Format : Markdown/HTML
---

Due to all the working from home in the past few months I had to set up a live stream. At first the stream went directly to YouTube, but after they screwed up multiple times, we decided not to be dependent on them. Using [Restreamer][5], a piece of open source software that live streams both to your own server and to another platform (YouTube) at the same time, we have more control over the stream and are not surprised by YouTube doing stupid stuff unannounced.

Restreamer provides a simple web player that works on all major platforms and streams to YouTube, but one thing it lacks is a live viewer count. That's a hard problem to solve correctly and accurately. In this article I'll show you how to do it semi-accurately in multiple ways, including graphs like this:

![stream munin graph][1]

This article contains a rant on YouTube breaking stuff and the commands used to get a live viewer count.

Do note that this article is about the open source [Restreamer][5] software, not the paid-for restreamer.io website. I did [contribute][6] to Restreamer since it's such a great piece of software.

### A rant on YouTube and Google

First a new version of YouTube Live was released, [resetting our stream key without any notification][4]. Now they also require you to open the live control room web page, [otherwise your stream won't start][3], again without any notification or help. The live stream is quite critical due to a limit on how many people can physically be on location.

The workflow was simple before: click the "Start Streaming" button in OBS. Now it involves logging in to an account with two-factor authentication, opening up a web page, clicking through a few dialogs on title and options, then clicking start in OBS and checking if everything on YouTube works. Way more error-prone, with far more opportunities to make mistakes by accident.

Using [Restreamer][5], we have more control over the live stream. It's on our own server, [a $5 droplet at Digital Ocean][99], which can easily support 120 concurrent viewers with a 5 mbit stream. We don't want to break existing workflows for our viewers (as opposed to what Google does), so we stream to YouTube as well (via Restreamer). People that are used to YouTube can keep watching there. But if, for whatever reason, something breaks, our own server and stream are still available. Continuity is guaranteed, even with the incoherent stupidity and breaking changes of YouTube.
Now that we have that rant out of the way, let's continue with something productive, the thing you came here for: measuring live stream viewer count.

### Measuring live viewers?

First and foremost, there is a [feature request][7] over at Restreamer to implement a viewer count, so be sure to check that issue as it might be more up to date than this page.

There are multiple ways to measure viewers, with different levels of privacy invasion and accuracy. You can use Google Analytics, write some custom javascript with a unique ID and track that page, use log analysis, bandwidth interpolation or count established connections. We ended up using connection measurement and bandwidth interpolation because that is accurate enough for our use case. I'll not cover the custom javascript option, only Google Analytics, bandwidth interpolation and connection counting.

All of our analytics show that we had about 13 viewers on the live stream, with 9 that watched almost the entire stream and 4 that watched a bit.

#### Google Analytics in Clappr

The most accurate measurement in my experience is to [use Google Analytics with the Clappr web player][8], which gives you a count of every event (start, stop, pause, etc.):

![clappr Google analytics][9]

(This graph is from a friend of mine with a larger live stream audience.)

However, this is very privacy invasive. It sends all that sweet data to Google and allows you to track every event back to a single person inside the analytics console. Although the measurements were the most accurate, we decided it was too privacy invasive for our intended goal.

#### Bandwidth interpolation

Our stream has a set bitrate of 5 mbit. Restreamer does not do any encoding, so every 5 mbit/s of outgoing traffic should correspond to one stream viewer. Using the graphs [Digital Ocean][99] provides, we can do a simple division:

    55 mbit out / 5 mbit in = 11 viewers

Here is the graph from Digital Ocean:

![stream bandwidth][2]

If your provider does not provide such graphs, you can use your own monitoring tools, like the command line `vnstat` utility:

    vnstat --top10 --style 1

Output:

     eth0  /  top 10

        #      day          rx      |     tx      |    total
       -------------------------+-------------+-------------------------
        1   11/15/20    3.38 GiB  |  35.21 GiB  |  38.58 GiB   %%::::::::::::::::
        2   11/22/20    3.82 GiB  |  30.89 GiB  |  34.71 GiB   %%::::::::::::::
        3   10/18/20    6.54 GiB  |  17.61 GiB  |  24.15 GiB   %%%::::::::
        4   11/01/20    3.48 GiB  |   3.68 GiB  |   7.17 GiB   %::
        5   10/16/20    1.82 GiB  |   2.31 GiB  |   4.13 GiB   :
        6   10/28/20    2.03 GiB  |   2.03 GiB  |   4.07 GiB   %
        7   11/08/20    1.14 GiB  |   2.82 GiB  |   3.96 GiB   :
        8   10/15/20    1.38 GiB  | 891.15 MiB  |   2.26 GiB   %
        9   10/14/20  857.32 MiB  |   1.37 GiB  |   2.21 GiB   :
       10   11/12/20  686.86 MiB  |   1.09 GiB  |   1.76 GiB
       -------------------------+-------------+-------------------------

5 mbit/s for 90 minutes adds up to 27 Gb (gigabit), which is 3.375 GB (gigabytes) for one full time stream viewer. 30.89 GB / 3.375 GB equals 9.15 full time viewers. Not everyone watches the full length of the stream, as you can see on the graph as well.
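If you'd rather not do the arithmetic by hand, the same interpolation fits in a few lines of shell. This is a minimal sketch using the example numbers from above (a 5 mbit/s stream, 90 minutes of streaming and the 30.89 GiB of outgoing traffic vnstat reported); substitute your own values:

    # Minimal sketch: estimate full time viewer equivalents from outgoing traffic.
    # The values below are the example numbers from this article, change them
    # to match your own stream.
    BITRATE_MBIT=5     # stream bitrate in mbit/s
    DURATION_MIN=90    # stream length in minutes
    TX_GB=30.89        # outgoing traffic reported by vnstat or your provider

    # Like the calculation above, this ignores the small GB/GiB difference.
    awk -v rate="$BITRATE_MBIT" -v mins="$DURATION_MIN" -v tx="$TX_GB" 'BEGIN {
        per_viewer = rate * 60 * mins / 8 / 1000;  # GB one full time viewer downloads
        printf "One full time viewer   : %.3f GB\n", per_viewer;
        printf "Full time viewer count : %.2f\n", tx / per_viewer;
    }'

For the stream above this prints 3.375 GB per full time viewer and a count of 9.15, the same numbers as the manual calculation.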
#### Log analytics

In the web server log we can check how many IPs connected to the `.m3u8` file (the stream) during the time we want to analyze. A tool like `goaccess` can give you a visual overview. In the picture below you can see every visitor sorted by traffic; 7 on this page and 2 on the next page have downloaded over 2 GB of traffic, so we can safely assume those are stream viewers, giving us 9 viewers plus 2 that watched half the stream according to the amount of traffic.

![goaccess log][11]

The picture was filtered by date and by the filename of the stream, `live.m3u8`.

On the command line you can use a bit of `awk` to sum up the traffic of all requests per IP:

    zcat /var/log/nginx/access.log.3.gz | \
    awk ' { total[$1] += $10 } \
    END { for (x in total) { printf "%s : %9.2f Mb\n", x, total[x]/1024/1024 } } ' | sort -k2

Output:

    86..... :  748.54 Mb
    84......: 1438.23 Mb
    86......: 2560.98 Mb
    85......: 2634.83 Mb
    2001:...: 2745.76 Mb
    86..... : 2827.64 Mb
    212.... : 3018.51 Mb
    212.... : 3076.66 Mb
    2001:...: 3196.46 Mb
    85..... : 3220.37 Mb
    86......: 3347.41 Mb

(I've filtered out the IP addresses but you get the gist.)

In the NCSA log format (what nginx and Apache use) the first field is the visiting IP (or hostname) and the tenth field is the number of bytes sent for that request. Using awk we add up the bytes of every request per IP (`{ total[$1] += $10 }`), print the totals converted to Mb (the `END { for (x in total) { printf "%s ...` part) and sort on the second column (`sort -k2`). Here we can see 9 full stream viewers and two that watched for a bit. By adding a few grep commands you can filter the input some more, for example on status code (`grep ' 200 '`), or split it up per user agent.
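As a sketch, here is the same pipeline limited to successful requests for the stream files. The `live` pattern is an assumption based on the `live.m3u8` playlist name used above; adjust both the log path and the pattern to your own setup:

    # Only count HTTP 200 responses for requests that mention the stream name.
    zcat /var/log/nginx/access.log.3.gz | \
    grep 'live' | \
    grep ' 200 ' | \
    awk ' { total[$1] += $10 } \
    END { for (x in total) { printf "%s : %9.2f Mb\n", x, total[x]/1024/1024 } } ' | sort -k2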
#### Measuring established connections

In [this article][10] I described how you can measure the number of established connections. Plain HTTP normally does not work very well with this technique, since those are not long-running connections. HLS, however, keeps downloading stream segments, which shows up as a long established connection for this way of measuring. Using the following command we can get a count of the currently established connections to port 443 (which is an nginx reverse proxy in front of Restreamer):

    ss --all --numeric --no-header state established '( sport = :443 )' | wc -l

However, that also counts the one-off HTTP hits that happen to be established at the exact second you execute the command. Not super accurate, most often giving almost double the amount of viewers I expected were watching the stream. But since we're on the command line, we have another tool to the rescue. You're probably familiar with `diff`, a command that shows the differences between two files. In this case we need the exact opposite: I want to execute the command twice and see which IPs are still there, since they are probably watching the stream. You might not expect it, but the tool to do that is named `comm`. It finds the lines that are the same in two files (as opposed to `diff`). The last command line trick up my sleeve is process substitution: surround a command with a less-than sign and parentheses to get its output presented as a temporary file, like so: `<(command)`.

    echo <(w)

Output:

    /dev/fd/63

Combining all of that into one large line, filtering out everything except the IP addresses from `ss`, sleeping 5 seconds and then executing the same `ss` command again, feeding both into `comm` with a line count at the end, results in this:

    comm -1 -2 <(ss --all --numeric --no-header state established '( sport = :443 )' | awk '{print $5}' | awk -F: 'NF{--NF};1') <(sleep 5; ss --all --numeric --no-header state established '( sport = :443 )' | awk '{print $5}' | awk -F: 'NF{--NF};1') | wc -l

We can put this into a munin plugin (the code is at the end of this page) to graph it automatically every minute, giving us the result over time. Here is that graph zoomed in on the stream time:

![stream watchers][1]

Also, 9 viewers.

If you don't want to use munin for graphs, you can put the one-liner on `watch` to get a live count during your stream:

    watch comm -1 -2 <(ss --all --numeric --no-header state established '( sport = :443 )' | awk '{print $5}' | awk -F: 'NF{--NF};1' ) <(sleep 5; ss --all --numeric --no-header state established '( sport = :443 )' | awk '{print $5}' | awk -F: 'NF{--NF};1' ) | wc -l

### Conclusion

Measuring live viewers without sending all their data to Google, and thus being very privacy invasive, is hard. In this article I've shown you a few ways to get reasonably accurate measurements using tools on the linux command line. All of them agreed on the number of viewers, but they do require manual interpretation. Not a bad thing, but not as magical and automatic as a counter on a web page that handles it for you.

### Munin plugin

For your convenience, here is the munin plugin I use for the graphs shown in this article.

    cat /etc/munin/plugins/hls-long.sh

Output:

    #!/bin/bash
    # -*- bash -*-

    : << =cut

    =head1 NAME

    HLSStreamViewers - Plugin to measure stream viewers

    =head1 NOTES

    Service usage and uptime under your control.

    =head1 AUTHOR

    Contributed by Remy van Elst

    =head1 LICENSE

    GPLv2

    =head1 MAGIC MARKERS

    #%# family=auto
    #%# capabilities=autoconf

    =cut

    . $MUNIN_LIBDIR/plugins/plugin.sh

    if [ "$1" = "autoconf" ]; then
        echo yes
        exit 0
    fi

    if [ "$1" = "config" ]; then
        echo 'graph_title Stream Viewers'
        echo 'graph_args --base 1000 -l 0 '
        echo 'graph_scale no'
        echo 'graph_vlabel HTTP Established Connections > 5 sec'
        echo 'graph_category stream'
        echo 'v1.label Connections'
        exit 0
    fi

    echo "v1.value $(comm -1 -2 <(ss --all --numeric --no-header state established '( sport = :443 )' | awk '{print $5}' | awk -F: 'NF{--NF};1') <(sleep 5; ss --all --numeric --no-header state established '( sport = :443 )' | awk '{print $5}' | awk -F: 'NF{--NF};1') | wc -l)"
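A quick note on getting the plugin to run: munin only picks it up if the file is executable and munin-node has been restarted afterwards. This is a sketch assuming a default munin-node installation on a systemd-based system; `munin-run` lets you test the plugin by hand before waiting for the next polling cycle:

    # make the plugin executable so munin-node is allowed to run it
    chmod +x /etc/munin/plugins/hls-long.sh

    # test the config section and a value run by hand
    # (the value run takes about 5 seconds because of the sleep in the plugin)
    munin-run hls-long.sh config
    munin-run hls-long.sh

    # restart munin-node so the new plugin is picked up
    systemctl restart munin-node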
[1]: /s/inc/img/stream-kijkers.png
[2]: /s/inc/img/stream-traffic.png
[3]: http://web.archive.org/web/20201124080807/https://support.google.com/YouTube/thread/74512210?hl=en
[4]: http://web.archive.org/web/20201013075225/https://support.google.com/YouTube/thread/62717937
[5]: https://datarhei.github.io/restreamer/
[6]: https://github.com/datarhei/restreamer/pull/217
[7]: https://github.com/datarhei/restreamer/issues/100
[8]: https://github.com/playmedia/clappr-ga-events-plugin
[9]: /s/inc/img/stream-ga.jpg
[10]: /s/snippets/Get_number_of_incoming_connections_on_specific_ports_with_ss.html
[11]: /s/inc/img/stream-bw2.png
[99]: https://www.digitalocean.com/?refcode=7435ae6b8212

---

License:

All the text on this website is free as in freedom unless stated otherwise. This means you can use it in any way you want, you can copy it, change it the way you like and republish it, as long as you release the (modified) content under the same license to give others the same freedoms you've got and place my name and a link to this site with the article as source.

This site uses Google Analytics for statistics and Google Adwords for advertisements. You are tracked and Google knows everything about you. Use an adblocker like ublock-origin if you don't want it.

All the code on this website is licensed under the GNU GPL v3 license unless already licensed under a license which does not allow this form of licensing or if another license is stated on that page / in that software:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Just to be clear, the information on this website is meant for educational purposes and you use it at your own risk. I do not take responsibility if you screw something up. Use common sense, do not 'rm -rf /' as root for example. If you have any questions then do not hesitate to contact me. See https://raymii.org/s/static/About.html for details.