[HN Gopher] Tabula - Extract tables from PDF files
___________________________________________________________________
 
Tabula - Extract tables from PDF files
 
Author : pabs3
Score  : 47 points
Date   : 2021-06-07 05:46 UTC (17 hours ago)
 
web link (github.com)
w3m dump (github.com)
 
| gitowiec wrote:
| Camelot and its web interface, Excalibur, are great too! We used
| them to convert various bank statements.
 
| petalmind wrote:
| You may also find useful: https://github.com/adworse/iguvium
 
| tayloramurphy wrote:
| I used Tabula quite a bit at the startup I used to work at (> 3
| yrs ago). We were curating and organizing genetic testing
| information and much of the data was sent to us as PDFs.
| 
| It didn't work every time, but when it did, it was awesome!
 
| geonic wrote:
| It's a nice tool. Found it by chance a couple of days ago. It did
| save me a lot of typing.
 
| smt88 wrote:
| We tried this and a few other tools, but we ended up with
| PDF2XL[1] (which works on everything, not just tables).
| 
| It's pretty ugly and not cheap, but the data extraction is
| absolutely _magical_.
| 
| I very rarely feel joy and excitement when using a tool,
| especially a PDF-related tool, but it saved our dev team at least
| 100 hours when we first used it. We have it as an automated part
| of one of our client flows and they happily pay us way more than
| they should.
| 
| 1. https://pdf2xl.com/
 
___________________________________________________________________
(page generated 2021-06-07 23:00 UTC)