[HN Gopher] Show HN: Google Drive to SQLite
___________________________________________________________________
 
Show HN: Google Drive to SQLite
 
Here's the repo: https://github.com/simonw/google-drive-to-sqlite
The README is using a trick I'm increasingly leaning on: parts of
that document - the --help output and the example database schema -
are automatically generated using Cog:
https://nedbatchelder.com/code/cog and
https://til.simonwillison.net/python/cog-to-update-help-in-r...
 
Author : simonw
Score  : 195 points
Date   : 2022-02-21 16:12 UTC (6 hours ago)
 
web link (simonwillison.net)
w3m dump (simonwillison.net)
 
| [deleted]
 
| nickcw wrote:
| We discovered (in rclone) that Google are deprecating the auth
| method you are using - the OOB auth here:
| https://github.com/simonw/google-drive-to-sqlite/blob/121509...
| 
| Here is the google page:
| https://developers.googleblog.com/2022/02/making-oauth-flows...
| 
| Here is the discussion on the rclone issue - there are links to
| more stuff there: https://github.com/rclone/rclone/issues/6000
| 
| Luckily rclone already implements the oauth webserver for other
| backends, but for tools like this it has suddenly made
| authenticating with Google a whole lot harder.
 
  | simonw wrote:
  | Hah, wow thanks I hadn't seen that! Just my luck to implement
  | that the same week they announce they are deprecating it.
 
    | simonw wrote:
    | It's pretty hard to understand what I'm meant to do for a CLI
    | app instead here. Are they removing all versions of the "copy
    | and paste this code" auth flow?
    | 
    | Do I need to spin up a localhost web server on a port
    | instead? How am I supposed to do that sensibly on a remote
    | server that I'm SSHing into?
    | 
    | Also that announcement seemed to indicate that localhost:
    | redirects were going to be disabled too, wouldn't that make
    | the local web server option not work either?
    | 
    | I swear, obtaining an OAuth token to interact with my Google
    | account is the new hardest problem in computer science. Oh
    | for a GitHub-style personal access token!
 
      | mey wrote:
      | Considering the GCloud SDK CLI Auth uses the copy/paste
      | this code auth flow for certain scenarios, this will be
      | interesting.
 
        | aaaaaaaaata wrote:
        | They literally couldn't care less.
        | 
        | Either you're a sticky customer and you'll figure it out,
        | 
        | or you're a high value customer that they'll sick an
        | engineer or two on for a week to re-tool.
        | 
        | They do. not. care.
 
      | nickcw wrote:
      | My reading is that a localhost web server on a port is the
      | way to go.
      | 
      | This aligns with what nearly all the other cloud providers
      | do with oauth.
      | 
      | > How am I supposed to do that sensibly on a remote server
      | that I'm SSHing into?
      | 
      | If you want to see rclone's solution then check out
      | 
      | https://rclone.org/remote_setup/
      | 
      | It's a bit of a pain!
      | 
      | > Also that announcement seemed to indicate that localhost:
      | redirects were going to be disabled too, wouldn't that make
      | the local web server option not work either?
      | 
      | As you noted below that is only for Android, IOS and
      | ChromeOS. I hadn't noticed that before though...
      | 
      | > I swear, obtaining an OAuth token to interact with my
      | Google account is the new hardest problem in computer
      | science.
      | 
      | Ha ha!
      | 
      | I note that you can get "rclone authorize" to do it if you
      | want as you can set the scopes and the client id / client
      | secret.
 
      | simonw wrote:
      | I'm collecting notes on how to address this here:
      | https://github.com/simonw/google-drive-to-sqlite/issues/39
      | 
      | It looks like the localhost redirects are being disabled
      | for "iOS, Android and Chrome app OAuth client types" so
      | hopefully I can still use those by spinning up a localhost
      | web server after all.
 
| samhw wrote:
| > The README is using a trick I'm increasingly leaning on: parts
| of that document - the --help output and the example database
| schema - are automatically generated using Cog
| 
| How does this interact with IDEs? Has anyone written a VSCode
| extension yet? I'd probably be reluctant to use something like
| this without IDE support. (Though I may well be in the minority,
| and I appreciate that for open source tools it's not necessarily
| reasonable to expect people to dedicate so much of their time to
| DX.)
 
  | _hl_ wrote:
  | Notable alternative for emacs users: Github supports rendering
  | .org files, so you can have a README.org using babel (just
  | remember to execute all code blocks before pushing).
 
  | simonw wrote:
  | To regenerate the markdown file I run this command in the
  | console:                   cog -r README.md
  | 
  | Presumably most IDEs have q mechanism that can run that command
  | automatically, either when a file changes or just
  | intermittently over time.
  | 
  | I use VS Code's markdown preview panel, which updates
  | automatically when I manually run the "cog -r" command.
 
    | ash wrote:
    | Nice! I didn't know about cog.
    | 
    | I've developed a simple "codeblocks" tool that to keep
    | README.md up-to-date. In your example, here is how to insert
    | the latest `--help` output:                   codeblocks
    | usage-auth README.md -- google-drive-to-sqlite auth --help
    | 
    | Where "usage-auth" is the block language in README.md:
    | Full `--help`:         ```usage-auth         Usage: ...
    | ```
    | 
    | Checking is also possible:                   codeblocks
    | --check usage-auth README.md -- google-drive-to-sqlite auth
    | --help
    | 
    | It's not as generic as cog, but does not require writing
    | codegen scripts inside Markdown file.
    | 
    | https://github.com/shamrin/codeblocks
 
  | whateveracct wrote:
  | hm I'm guessing..run the cog command in the VSCode terminal
 
| NicoJuicy wrote:
| > Pulling the metadata--file names, sizes, file types, file
| owners, creation dates--into a SQLite database felt like a great
| way to start understanding the size and scope of what had been
| collected so far.
| 
| I would:
| 
| - sync the GDrive locally
| 
| - setup "search anything service" from voidtools
| 
| Voila. A super fast programmable index of the files.
| 
| If you want to dump it to SQLite, go ahead.
| 
| Ps. Applicable to Dotnet
| 
| Note: file owners wouldn't be known.
 
  | nijave wrote:
  | If you have a lots of files in Google Drive, this can take a
  | significant amount of time or you may not be able to sync it
  | locally. I push incremental backups & disk images without a
  | retention period to Google Drive (it's unlimited storage) so
  | there's quite a bit of data shoved in there
  | 
  | It's pretty easy to get rate limited so it could take days or
  | weeks to build an index (so having a tool that talks to the API
  | is generally more reliable)
 
  | simonw wrote:
  | In my case the Google Drive folder that was shared with me has
  | about half a TB in it, so going via the API seemed like a
  | better option!
 
    | NicoJuicy wrote:
    | Agreed :p
 
| gwbas1c wrote:
| FWIW: If you find that you need something like this, consider
| using a different file storage product. Most likely, it's using
| SQLite and the schema is useful.
| 
| I used to be the lead on Syncplicity, a desktop file
| synchronization product. All of the local state was in SQLite.
| Early versions of the desktop client kept the complete path to
| every file in the associated row in SQLite. Later versions kept
| the filename and foreign key to the directory row.
| 
| Bittorrent Sync (I forget its new name) also used to use a SQLite
| database for local metadata.
| 
| And, finally: It's not that hard to write a tool to recursively
| scan a folder and grab all the metadata of files. You can scan
| 1000s of files in a matter of seconds.
 
| anonymouse008 wrote:
| TELL HN: More things like this please -- though I have no real
| use for this tool directly, it's paradigms are super helpful in
| thinking through other projects I'm working on.
| 
| Thanks simonw - neat stuff
 
  | [deleted]
 
  | mixcocam wrote:
  | Agreed. More uses of SQLite too!
  | 
  | I have been working with it to gather ecom sales data across
  | platforms for a couple of months. Such a powerful tool! I want
  | to do more with it!
 
| rbosinger wrote:
| I dived into Datasette a bit last year. Simon has a whole set of
| tools around SQLite and data analysis with it. I haven't used any
| of it for anything major but I love to see what he's got going
| on. My Google Drive is out of control, I hope I can find the time
| to try this one out.
 
| tonymet wrote:
| An Alfred workflow that queries this index would be fantastic.
| Alternatively feed it into spotlight
 
  | bjtitus wrote:
  | There's a Google Drive workflow for Alfred which I've used
  | previously.
  | 
  | https://www.alfredforum.com/topic/17318-google-drive-%E2%80%...
 
___________________________________________________________________
(page generated 2022-02-21 23:00 UTC)