https://github.com/mgdm/htmlq

mgdm / htmlq


Like jq, but for HTML.

MIT License
2.1k stars 29 forks
Latest commit: 5cae41f (Sep 7, 2021) by mgdm: Bump version to 0.2.0 due to default selector change

22 commits


README.md

 htmlq

Like jq, but for HTML. Uses CSS selectors to extract bits of content
from HTML files. Mozilla's MDN has a good reference for CSS selector
syntax.

 Installation

cargo install htmlq

 Usage

$ htmlq -h
htmlq 0.2.0
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] <selector>...

FLAGS:
    -h, --help                 Prints help information
    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace
    -p, --pretty               Pretty-print the serialised output
    -t, --text                 Output only the contents of text nodes inside selected elements
    -V, --version              Prints version information

OPTIONS:
    -a, --attribute <attribute>    Only return this attribute (if present) from selected elements
    -f, --filename <FILE>          The input file. Defaults to stdin
    -o, --output <FILE>            The output file. Defaults to stdout

ARGS:
    <selector>...    The CSS expression to select
$
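
The file options listed above can be combined so that no network access is needed. The following is a minimal sketch, assuming htmlq 0.2.0 is installed and on your PATH; the file names `page.html` and `links.txt` and the sample markup are made up for illustration.

```shell
# Create a small sample page (hypothetical content, for illustration only).
cat > page.html <<'EOF'
<html><body>
<ul id="nav">
  <li><a href="/learn">Learn</a></li>
  <li><a href="/tools">Tools</a></li>
</ul>
</body></html>
EOF

# Read from a file (--filename) instead of stdin, keep only the href
# attribute (--attribute), and write to a file (--output) instead of stdout.
htmlq --filename page.html --attribute href --output links.txt '#nav a'

# Show what was written.
cat links.txt
```

The selector `#nav a` is ordinary CSS: it matches every `a` element that is a descendant of the element with ID `nav`.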

 Examples

 Using with cURL to find part of a page by ID

$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'
<div class="four columns mt3 mt0-l" id="get-help">
        <h4>Get help!</h4>
        <ul>
          <li><a href="https://doc.rust-lang.org">Documentation</a></li>
          <li><a href="https://users.rust-lang.org">Ask a Question on the Users Forum</a></li>
          <li><a href="http://ping.rust-lang.org">Check Website Status</a></li>
        </ul>
        <div class="languages">
            <label class="hidden" for="language-footer">Language</label>
            <select id="language-footer">
                <option title="English (US)" value="en-US">English (en-US)</option>
<option title="French" value="fr">Français (fr)</option>
<option title="German" value="de">Deutsch (de)</option>

            </select>
        </div>
      </div>

 Find all the links in a page

$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.org/
/learn/get-started
https://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
$
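
Because the attribute values are printed one per line, the output can be piped into standard Unix tools. The sketch below assumes htmlq 0.2.0 is on your PATH and uses a made-up local file, `links-demo.html`, so that it runs without network access; it keeps only absolute links and removes duplicates.

```shell
# Sample markup (hypothetical): two relative/absolute links, one duplicate,
# and one anchor that has no href attribute at all.
cat > links-demo.html <<'EOF'
<p>
  <a href="/learn">Learn</a>
  <a href="https://blog.rust-lang.org/">Blog</a>
  <a href="https://blog.rust-lang.org/">Blog again</a>
  <a name="no-href">Anchor without href</a>
</p>
EOF

# Extract every href, keep only http(s) URLs, and de-duplicate them.
htmlq --attribute href a < links-demo.html \
  | grep -E '^https?://' \
  | sort -u > absolute-links.txt

cat absolute-links.txt
```

Elements without the requested attribute produce no output line, which matches the "if present" wording in the help text for `--attribute`.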

 Get the text content of a post

$ curl --silent https://nixos.org/nixos/about.html | htmlq --text .main

          About NixOS

NixOS is a GNU/Linux distribution that aims to
improve the state of the art in system configuration management.  In
existing distributions, actions such as upgrades are dangerous:
upgrading a package can cause other packages to break, upgrading an
entire system is much less reliable than reinstalling from scratch,
you can't safely test what the results of a configuration change will
be, you cannot easily undo changes to the system, and so on.  We want
to change that.  NixOS has many innovative features:

[...]
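
The blank lines and indentation in the output above come from whitespace-only text nodes in the page. The `--ignore-whitespace` flag from the usage listing can be combined with `--text` to skip those nodes. This is a minimal sketch, assuming htmlq 0.2.0 is on your PATH; `about.html` and `about.txt` are hypothetical file names and the markup is a shortened stand-in for the real page.

```shell
# A shortened, made-up stand-in for the page used above.
cat > about.html <<'EOF'
<div class="main">
  <h1>About NixOS</h1>

  <p>NixOS is a GNU/Linux distribution.</p>
</div>
EOF

# --text keeps only text nodes inside the selected elements;
# --ignore-whitespace also drops text nodes that are entirely whitespace.
htmlq --text --ignore-whitespace .main < about.html > about.txt

cat about.txt
```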

 Pretty print HTML

(This is a bit of a work in progress)

$ curl --silent https://mgdm.net | htmlq --pretty '#posts'
<section id="posts">
  <h2>I write about...
  </h2>
  <ul class="post-list">
    <li>
      <time datetime="2019-04-29 00:%i:1556496000" pubdate="">
        29/04/2019</time><a href="/weblog/nettop/">
        <h3>Debugging network connections on macOS with nettop
        </h3></a>
      <p>Using nettop to find out what network connections a program is trying to make.
      </p>
    </li>
[...]


Contributors 7

  * @mgdm
  * @bbcmgdm
  * @chrisdickinson
  * @stuartlangridge
  * @heyitsols
  * @simonsan
  * @carnott-snap

Languages

  * Rust 100.0%
