TITLE: Scraping Instagram without an account
DATE: 2019-10-20
AUTHOR: John L. Godlee
====================================================================

There are lots of people I would like to follow on Instagram, mostly woodworkers, bicycle people, and outdoors people. It seems to be a really good method of delivering content. Unfortunately for Instagram, there is absolutely no way I would make an account with them. I fear it would be too much of a time sink, and I'm wary of giving too much detail about my personal interests to Facebook.

I found a command line tool called [InstaLooter], which can scrape public Instagram profiles without an account and save the images to my local machine, where I can read them at my leisure, in the spirit of RSS. This is how I implemented the program.

  [InstaLooter]: https://github.com/althonos/InstaLooter

I created a text file which lives in my $HOME called .ig_subs.txt. The file holds a list of Instagram usernames, one per line, for the accounts I want to scrape from:

    kelsoparadiso
    lloyd.kahn
    exploringalternatives
    barnthespoon
    terrybarentsen
    woodlands.co.uk
    zedoutdoors
    mossy_bottom

Then I made a shell script which lives in my path, called insta_dl:

    #!/bin/bash

    # Make directory if it doesn't exist
    mkdir -p "$HOME/Downloads/ig"

    # make newlines the only separator
    IFS=$'\n'

    # disable globbing
    set -f

    # Loop over every username in the subscription file
    for i in $(cat "$HOME/.ig_subs.txt"); do
        instalooter user "$i" "$HOME/Downloads/ig/" -n 1 -N -T '{username}.{date}.{id}'
    done

instalooter user $i downloads photos from each user i.
-n 1 only downloads the most recent post, whether that post is one photo or multiple.
-N only downloads images which don't already exist in the destination directory ($HOME/Downloads/ig/), based on the filename.
-T {username}.{date}.{id} sets the filename of each photo. {id} is unique for each photo on Instagram, so it uniquely identifies each downloaded file, which is what -N relies on.
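The loop above depends on IFS and set -f to split the file into usernames safely. As a minimal sketch of an alternative, not something the original script does, the file can be read line by line with read. The function name list_subs is hypothetical, and skipping blank lines and lines starting with # is an added assumption that would allow comments in .ig_subs.txt:

```shell
#!/bin/bash

# Hypothetical helper (not part of InstaLooter): print one username per line
# from a subscription file, skipping blank lines and lines starting with '#'.
list_subs() {
    local file="$1" line
    # '|| [ -n "$line" ]' keeps a final line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$file"
}

# Possible use, with the same instalooter call as in insta_dl:
#
#   list_subs "$HOME/.ig_subs.txt" | while IFS= read -r user; do
#       instalooter user "$user" "$HOME/Downloads/ig/" -n 1 -N -T '{username}.{date}.{id}'
#   done
```

Reading with read -r avoids changing IFS or globbing for the rest of the script, at the cost of a slightly longer loop.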
The filenames then look something like this:

    exploringalternatives.2019-09-27.2142383070393557093.jpg
    kelsoparadiso.2019-10-09.2150831532411304437.jpg
    kelsoparadiso.2019-10-09.2150831532419588103.jpg
    kelsoparadiso.2019-10-09.2150831532419839765.jpg
    lloyd.kahn.2019-10-11.2152638264107259024.jpg
    mossy_bottom.2019-10-09.2151026330651686709.jpg
    terrybarentsen.2019-10-03.2146722625883638769.jpg
    terrybarentsen.2019-10-03.2146722625900303797.jpg
    terrybarentsen.2019-10-03.2146722625950630270.jpg
    woodlands.co.uk.2019-10-11.2152273592812162360.jpg
    zedoutdoors.2019-10-02.2145942922787735607.jpg

If I wanted to, I guess I could further file each image into its own directory based on username or date, but I don't want that.

I can now create a cronjob or a LaunchAgents script to run this automatically in the background every day or every week.

Update - 2019_10_31

I updated the insta_dl shell script so that it also grabs the caption of each Instagram post downloaded and stores it in a text file. InstaLooter can download post metadata as a JSON file by adding the -d flag (--dump-json). Then I use jq to parse the JSON file for each post to extract the full name of the account (.owner.full_name), the @username of the account (.owner.username) and the content of the caption of the post (.edge_media_to_caption[][].text). Then I use sed to put a blank line after each caption to make the file easier to read, and finally I delete the original JSON files:

    #!/bin/bash

    DIR="$HOME/Downloads/ig"

    # Make directory if it doesn't exist
    mkdir -p "$DIR"

    # make newlines the only separator
    IFS=$'\n'

    # Loop over every username, downloading the latest post and its JSON metadata
    for i in $(cat "$HOME/.ig_subs.txt"); do
        instalooter user "$i" "$DIR" -v -d -n 1 -N -T '{username}.{date}.{id}'
    done

    # Build one "Full Name (username): caption" line per JSON file
    for i in "$DIR"/*.json; do
        jq '(.owner.full_name + " (" + .owner.username + "): " + .edge_media_to_caption[][].text)' "$i"
    done > "$DIR/description.txt"

    # Add a blank line after each caption
    sed -i 'G' "$DIR/description.txt"

    # Remove the JSON files now that the captions are extracted
    rm "$DIR"/*.json
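Returning to the idea of filing images into directories by username: since usernames can themselves contain dots (lloyd.kahn, woodlands.co.uk), the username has to be recovered from a {username}.{date}.{id} filename by stripping fields from the right rather than splitting at the first dot. The following is only a sketch of how that could look; username_of and sort_by_user are hypothetical names, and it assumes every file follows the template exactly, with a single extension such as .jpg:

```shell
#!/bin/bash

# Print the username part of a file named {username}.{date}.{id}.{ext}.
username_of() {
    local name="${1##*/}"   # drop any leading directory
    name="${name%.*}"       # drop the extension
    name="${name%.*}"       # drop the {id} field
    name="${name%.*}"       # drop the {date} field
    printf '%s\n' "$name"
}

# Hypothetical helper: move every .jpg in a directory into a
# subdirectory named after the account it came from.
sort_by_user() {
    local dir="$1" f user
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue   # the glob matched nothing
        user=$(username_of "$f")
        mkdir -p "$dir/$user"
        mv -- "$f" "$dir/$user/"
    done
}
```

Calling sort_by_user "$HOME/Downloads/ig" after a download would move the images, so -N would no longer find them in the destination directory and could fetch them again on the next run. That is one reason to keep the flat layout.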