SQLite Extensions, Tools, and Guides from Alex Garcia

Subscribe
Archives
August 5, 2022

sqlite-html: A SQLite Extension for Parsing HTML Documents

Hello all! Welcome to the 2nd issue of this SQLite newsletter. Big thanks to all of you who have subscribed, and I hope you find inspiration in this issue!

As a reminder - this is a newsletter created by Alex Garcia about SQLite - new tools, guides, best practices, and other stuff. There is at most 1 email per week, with many gaps in between.

This week: sqlite-html, an extension for querying, parsing, and generating HTML, all in SQLite!

It's written in Go, made possible with Riyaz Ali's SQLite Go library. The code required to write such extension is very short and easily compiles to a shared library with go build -buildmode=c-shared. Definitely recommend!

sqlite-html

-- Using the SQLite CLI
.load ./html0

select html_extract('<p> Anakin <b>Skywalker</b> </p>', 'b');
-- "<b>Skywalker</b>"

The most useful utility in sqlite-html in my opinion is html_each, which is a table function for querying all elements that match the given CSS selector. Very similar to querySelectorAll() or BeautifulSoup's find_all().

select * from html_each('<ul>
<li>Alpha</li>
<li>Bravo</li>
<li>Charlie</li>
<li>Delta</li>', 'li');

/*
┌──────────────────┬─────────┐
│       html       │  text   │
├──────────────────┼─────────┤
│ <li>Alpha</li>   │ Alpha   │
│ <li>Bravo</li>   │ Bravo   │
│ <li>Charlie</li> │ Charlie │
│ <li>Delta</li>   │ Delta   │
└──────────────────┴─────────┘
*/

There's also some functions for generating HTML safely in SQLite. It uses a seldom-used SQLite feature, subtypes, which previous has only been used on SQLite's JSON functions, to safely escape "untrusted" input.

select html_element('img', json_object('src', './a.png', 'width', 200)); 
-- '<img src="./a.png" width="200.000000"/> '

select html_element('p', null, "text node");
-- '<p>text node</p>'

select html_element('p', null, "<b>Still a text node</b>"); 
-- '<p>&lt;b&gt;Still a text node&lt;/b&gt;</p>'

select html_element('p', null, html('<b>Trusted!</b>')); 
-- '<p><b>Trusted!</b></p>'

Make sure to check out the introductory sqlite-html blog post to learn more, and to see some interactive examples all in the browser!

Other SQLite extensions and news

Simon Willison wrote "Trying out SQLite extensions on macOS", which goes over trying sqlite-lines and sqlite-html on a Mac. There's a couple of hoops to jump through and some pitfalls to avoid, but once it's setup, you have access to a ton of new cool SQLite extensions in your toolbox!

Isaac Brodsky created h3-duckdb, a DuckDB extension for working with H3. These is the first (to my knowledge) 3rd party DuckDB extension that I have seen, which is awesome! DuckDB has several performance and usability improvements over SQLite, but it's still quite young in its development and documentation. It's great to see a growing community around DuckDB extensions, and I hope to port over some of these new SQLite extensions to DuckDB!

Sergey Khabibullin released xlite, a SQLite extensions for querting Excel (.xlsx, .xls, .ods) files. There's also an in-depth blog post detailing how the extension is written in Rust (one of the first Rust SQLite extensions I've seen!).

SQLite version 3.39.2 released this month. The biggest update is support for RIGHT and FULL OUTER JOIN.

Coming Next

That's it for this week! Here's a preview for what's to come in the next few weeks:

  • New SQLite extension for making HTTP requests
  • New SQLite extension for parsing file paths and URLs
  • How to write loadable SQLite extensions in Go
  • New SQLite extensions for querying .xlsx and .xls files

If these emails are getting repetitive, feel free to instead follow me on Twitter or follow the RSS feed for this series!

And as always, if you ever want to chat about SQLite tooling or have questions, feel free to email me, DM me on Twitter, or file an issue in any Github repo!

Don't miss what's next. Subscribe to SQLite Extensions, Tools, and Guides from Alex Garcia:
This email brought to you by Buttondown, the easiest way to start and grow your newsletter.