|
| cosmic_quanta wrote:
| In the same vague theme of "I don't know what I'm dealing with":
| https://github.com/ajalt/fuckitpy
| kilnr wrote:
| Another one sort of related is hachoir, and specifically the
| hachoir-metadata script: https://github.com/vstinner/hachoir
| 0-_-0 wrote:
| I like the Versioning section:
|
| _The web devs tell me that fuckit's versioning scheme is
| confusing, and that I should use "Semitic Versioning" instead.
| So starting with fuckit version ה.ג, package versions will use
| Hebrew Numerals._
| antongribok wrote:
| I can't decide what I'm more impressed with:
|
| The 110% code coverage, the downloads per month, or the
| license.
| bee_rider wrote:
| I'm not sure if it was intentional or not, but I love that
| the Hebrew characters that they found look visually similar
| to NaN.
| dec0dedab0de wrote:
| At first I thought this was going to be like Google Lens. It's
| instead a way to probabilistically identify things in strings. I
| have wished for this to exist, and made my own dumbed down
| version of it before. This could be very useful for less fragile
| screen scraping.
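|
| As a rough illustration of the "identify things in strings" idea,
| here's a minimal sketch; the Identifier class and identify()
| method are my recollection of pywhat's Python API, so treat the
| names and the shape of the result as assumptions:
|
|     # Sketch only: assumes pywhat exposes an Identifier with an
|     # identify() method, as I recall from its README.
|     from pywhat import Identifier
|
|     text = "GET / HTTP/1.1 Host: 10.0.2.5 admin@example.com"
|     matches = Identifier().identify(text)
|     print(matches)  # whatever pywhat reports it found in the string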
| acidbaseextract wrote:
| Some more great probabilistic python libraries:
|
| https://github.com/datamade/usaddress - "usaddress is a Python
| library for parsing unstructured address strings into address
| components, using advanced NLP methods."
|
| https://github.com/datamade/probablepeople - "probablepeople is a
| python library for parsing unstructured romanized name or company
| strings into components, using advanced NLP methods."
| nerdponx wrote:
| I have used and benefited tremendously from both of these
| libraries. While the methods are sound, the training data they
| used is not that comprehensive. You'll probably want to apply
| some heuristic cleanup before and after processing. Or, if your
| organization has a lot of time and money, add additional
| training data.
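|
| To make the cleanup-before-and-after point concrete, a minimal
| sketch around usaddress.tag(); the pre-cleaning heuristic and the
| example address are my own illustration, not something from the
| library's docs:
|
|     import re
|     import usaddress
|
|     def tag_address(raw):
|         # Heuristic pre-cleaning: collapse whitespace and trim
|         # stray punctuation before handing the string to the model.
|         cleaned = re.sub(r"\s+", " ", raw).strip(" ,;")
|         try:
|             components, address_type = usaddress.tag(cleaned)
|         except usaddress.RepeatedLabelError:
|             # Messy input can make the tagger repeat a label; fall
|             # back to the token-level parse in that case.
|             return usaddress.parse(cleaned), "Ambiguous"
|         return components, address_type
|
|     print(tag_address("  123 Main St.,  Springfield,  IL 62704 "))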
| cge wrote:
| A note on the usaddress library, since I was surprised when it
| failed spectacularly as I played with it: the 'us' in the name
| appears to refer to the US, not to 'unstructured'. There's no note
| of this in the readme, though there is a small US flag emoji in
| the GitHub about string.
| ssivark wrote:
| Nice! In the same spirit, here's an interesting talk on using
| Gen.jl (a probabilistic programming library/framework) for
| cleaning messy data in tables: https://youtu.be/vUxrtqY84AM
| ok123456 wrote:
| https://github.com/chardet/chardet - Detects the most likely
| encoding of a raw byte string.
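|
| A quick sketch of how that's used; the byte string here is just an
| illustration (German text encoded as Latin-1):
|
|     import chardet
|
|     raw = ("Ein längerer Beispieltext über Zeichenkodierungen, "
|            "damit der Detektor etwas zum Raten hat.").encode("latin-1")
|
|     guess = chardet.detect(raw)
|     print(guess)  # a dict with 'encoding' and 'confidence'
|     if guess["encoding"]:
|         print(raw.decode(guess["encoding"], errors="replace"))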
| lapp0 wrote:
| Why would I need this when I already have a full Tome of Identify
| with 50 charges?
| nknealk wrote:
| Tome of identify only holds 20 charges
| AbraKdabra wrote:
| I'm pretty sure he's playing the Project Diablo II mod.
| saas_sam wrote:
| PyWhat only uses one inventory slot vs. 2 for Tome. That's one
| extra SoJ!
| lettergram wrote:
| We built a similar tool, utilizing a CNN. It works on structured
| (and unstructured) data and provides additional info.
|
| https://github.com/capitalone/DataProfiler
|
| Cool part is you can "extend" the internal named-entity
| recognition model by refitting it with new data.
|
| Out of the box, the DataProfiler detects something like 18
| entities, including most of the PII data.
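|
| For reference, a minimal sketch of how I believe the DataProfiler
| API is invoked, going from memory of its README; the Data/Profiler
| names and the report options are assumptions, and the CSV path is
| made up:
|
|     import json
|     # Assumption: these are DataProfiler's top-level entry points.
|     from dataprofiler import Data, Profiler
|
|     # Load a (hypothetical) CSV and profile it; the built-in
|     # named-entity model labels each column as part of the report.
|     data = Data("customers.csv")
|     profile = Profiler(data)
|
|     report = profile.report(report_options={"output_format": "compact"})
|     print(json.dumps(report, indent=2, default=str))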
| [deleted]
| gigatexal wrote:
| There really is a Python module for everything.
| cecilpl2 wrote:
| Cool, but it seems like 80% of the results in your example demos
| are Youtube video IDs.
| Mogzol wrote:
| I find it kind of funny that they would choose to show those as
| demos when it's obvious that most of them really aren't Youtube
| video IDs. Like "Accept-Lang" is pretty obviously not actually
| a video ID, even if it matches the [A-Za-z0-9_-]{11} pattern
| and technically could be a valid ID.
|
| On the other hand, I don't know how you would actually verify
| whether an 11-character string is or isn't a Youtube ID (short
| of querying Youtube itself), so I suppose it's nice that
| potential IDs are shown, just seems they have a very high
| chance of being false positives.
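|
| For what it's worth, "Accept-Lang" really is 11 characters from
| that alphabet; a quick check (the test strings are my own picks):
|
|     import re
|
|     # The shape of a YouTube video ID quoted above.
|     YOUTUBE_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
|
|     for candidate in ["Accept-Lang", "dQw4w9WgXcQ", "keep-alive"]:
|         print(candidate, bool(YOUTUBE_ID.match(candidate)))
|     # Accept-Lang True, dQw4w9WgXcQ True, keep-alive False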
| meowface wrote:
| You can reduce false positives by trying to identify
| base64-seeming strings that are 11 characters long. Above a
| certain amount of entropy and uppercase/lowercase/digit
| distribution, etc. You might risk false negatives, but
| different flags for different levels of sensitivity could
| help with that.
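|
| A rough sketch of that kind of filter; the thresholds are
| arbitrary guesses on my part, not anything pywhat actually does:
|
|     import math
|     import re
|     from collections import Counter
|
|     def shannon_entropy(s):
|         # Bits per character over the string's own symbol counts.
|         counts = Counter(s)
|         return -sum((c / len(s)) * math.log2(c / len(s))
|                     for c in counts.values())
|
|     def plausible_youtube_id(s, min_entropy=3.0):
|         if not re.fullmatch(r"[A-Za-z0-9_-]{11}", s):
|             return False
|         # Real IDs tend to mix lower, upper and digits; header-ish
|         # words usually hit only one or two of those classes.
|         classes = sum([any(c.islower() for c in s),
|                        any(c.isupper() for c in s),
|                        any(c.isdigit() for c in s)])
|         return classes >= 3 and shannon_entropy(s) >= min_entropy
|
|     print(plausible_youtube_id("Accept-Lang"))  # False: no digits
|     print(plausible_youtube_id("dQw4w9WgXcQ"))  # True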
| vitus wrote:
| I'm admittedly not impressed by the pcap processing.
|
| It identifies a bunch of fragments of HTTP headers as "YouTube
| Video ID".
|
| Meanwhile, I can get the same info and more by running:
|
|     $ strings FollowTheLeader.pcap
|     *]?>
|     GET / HTTP/1.1
|     Host: 10.0.2.5
|     User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
|     Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
|     Accept-Language: en-US,en;q=0.5
|     Accept-Encoding: gzip, deflate
|     Connection: keep-alive
|     Upgrade-Insecure-Requests: 1
|     Pragma: no-cache
|     Cache-Control: no-cache
|     HTTP/1.0 200 OK
|     Server: SimpleHTTP/0.6 Python/3.7.3rc1
|     Date: Sun, 14 Jul 2019 02:42:13 GMT
|     Content-type: text/html
|     Content-Length: 105
|     Last-Modified: Sun, 14 Jul 2019 02:41:10 GMT
|     My Flag Web Page
|     Hi there! Have a flag!
|     Here is your flag: ctfa{terrific_traffic}
___________________________________________________________________
(page generated 2021-06-16 23:00 UTC) |