SQLite Extensions, Tools, and Guides from Alex Garcia

Subscribe
Archives
June 13, 2023

Installing SQLite Extensions with pip/npm/gem, sqlite-vss for vector search, and more!

Welcome to the 4th issue of this SQLite newsletter! Apologies for the 10-month long delay since the last issue, but I’ve open sourced many new projects and I’m excited to share with you all!

As a reminder — this is a newsletter created by Alex Garcia about SQLite. New extensions, tools, guides, best practices, and more. There will be new issue roughly every month.

This month: A recap of many new extensions and tooling I have released recently, including:

  • sqlite-ecosystem, a collection of 12 open source SQLite extensions
  • Distributing extensions on pip/npm/gem/deno.land/x for easy installing
  • sqlite-loadable-rs, a framework for writing SQLite extensions in Rust
  • sqlite-vss, an extension for vector search in SQLite

The sqlite-ecosystem with a dozen open-sourced SQLite extensions

The sqlite-ecosystem is a growing collection of 12 open sourced SQLite extensions that all independently provide new SQL scalar function, table functions, and virtual tables in SQLite. These include:

Extension Description Language
sqlite-http Make HTTP requests Go
sqlite-html parse HTML documents Go
sqlite-lines Read files/blob line-by-line C
sqlite-path Parse and generate filepaths C
sqlite-url Parse and generate URLs C
sqlite-xsv Query CSVs Rust
sqlite-regex Regular expression functions Rust
sqlite-ulid Work with ULIDs Rust
sqlite-jsonschema Validate JSON objects with JSON Schema Rust
sqlite-fastrand Generate fast numbers/blobs quickly Rust
sqlite-vss Vector search in SQLite C++

These extensions are written in different languages and exist in different codebases. However, they are all distributed in the same way (all have pip/npm/gem support), and they all have pre-compiled extensions available.

Making Installing + Distributing SQLite Extensions Easier

Historically, installing SQLite extensions has been hard! They’ve been around for 16 years, but were usually copy+pasted C files that you had to compile yourself. Some extensions like spatialite found popularity, but were still difficult to install.

Then sqlean came out in 2021 with a different strategy: pre-compiled SQLite extensions for popular platforms (MacOS/Linux/Windows) built on Github Actions and distributed on Github Releases. Which is awesome! All the SQLite extensions in the `sqlite-ecosystem followed a similar approach.

But it can be even easier! Downloading a pre-compiled extension from a Github Release is easy for one-off scripts, but doing repeatedly in “production” environments can be tedious.

So, I figured out a way to distribute SQLite extension on a few package managers for popular languages, including:

  • Python: pip install‘able SQLite Extensions for Python
  • Datasette: datasette install‘able SQLite Extensions for Datasette
  • Node.js: npm install‘able SQLite Extensions for Node.js
  • Deno: deno.land/x SQLite Extensions for Deno
  • Ruby: gem install‘able SQLite Extensions for Ruby

Each package manager is handled in a different way: For example, the Python packages offer different pre-built wheels for each platform (MacOS/Linux/windows). The Node.js packages use "optionalDependencies", and the Ruby gems have different compiled .gem files for each platform.

sqlite-loadable-rs for writing SQLite extension in Rust

sqlite-loadable-rs is a Rust library that makes writing loadable SQLite extension in Rust much easier. The popular rusqlite crate currently doesn’t allow you to create loadable SQLite extensions, so sqlite-loadable-rs is a hopefully-short-term solution!

This library lead to the following extensions:

  • sqlite-xsv, for reading CSVs files, based on the csv crate
  • sqlite-regex, for working with regular expression , based on the regex crate
  • sqlite-ulid, for generating/parsing ULIDs, based on the ulid crate
  • sqlite-jsonschema, for validating JSON documents, based on the jsonschema crate
  • sqlite-fastrand, for generating random numbers/blobs quickly, based on the fastrand crate

A few reasons why I prefer writing SQLite extension in Rust whenever possible:

  • Memory safety: While not 100% safe, it’s much easier to avoid memory errors with sqlite-loadable-rs than it is writing an extension in C/C++.
  • Rich ecosystem: Most of the extension listed above are just light wrappers around popular and well-tested Rust crates!
  • Performance: In the extensions I’ve made, sqlite-loadable-rs is much faster than extensions written in Go, and either at-par with C extensions or a tad slower. But some extensions, like sqlite-regex and sqlite-xsv, are actually faster than their C counterparts!
  • Supports multiple platforms: For the most part, most Rust crates are compatible with MacOS/Linux/Windows out-of-the-box, which is not the case with many C/C++ libraries.
  • Rust is just nice: The borrow checker, enums, and error handling in Rust just scratches an itch in my brain that I like. It’s not for everyone, and it was incredibly difficult to learn, but I’d much rather write and maintain SQLite extensions written in Rust rather than C, C++, or Go.

For further reading:

  • Introducing sqlite-loadable-rs: A framework for building SQLite Extensions in Rust (December 2022)
  • Introducing sqlite-xsv: The Fastest CSV Parser for SQLite ((January 2023)
  • Introducing sqlite-regex: A fast Regular Expression Extension for SQLite (January 2023)

sqlite-vss for Vector Search in SQLite

sqlite-vss is a SQLite extension for vector search, based on Faiss. Vector search is an integral part of semantic search engines, recommendation systems, Q&A platforms, and more.

“Vector databases” have exploded in popularity recently, fueled by the AI/ChatGPT/embeddings storm of the last several months. Many of these vector databases require a lot of resources: dedicated servers, docker-compose.yaml, complex dependencies, etc.

sqlite-vss, on the other hand, is much more lightweight and simpler than many of these vector databases. Just like SQLite, it’s in-process, requires no extra servers, and saves data inside your SQLite databases.

sqlite-vss works just like the FTS5 full-text search SQLite extension: a vss0 virtual table that you instantiate, insert data with normal INSERT INTO commands, and query it with SELECT statements.

-- 384 == number of dimensions for this example
create virtual table vss_articles using vss0(
  headline_embedding(384),
  description_embedding(384),
);

insert into vss_articles(rowid, headline_embedding)
  select rowid, headline_embedding from articles;

-- KNN-style search: "find the 100 closest vectors to id=123
select 
  rowid, 
  distance
from vss_articles
where vss_search(
  headline_embedding,
  (select headline_embedding from articles where rowid = 123)
)
limit 100;

And just like all the other sqlite-ecosystem extensions, you can pip install, npm install, or gem install this extension into your applications!

sqlite-vss is still young and has a number of disadvantages, but is a great first-choice for many simple AI applications.

See Introducing sqlite-vss: A SQLite Extension for Vector Search (February 2023) for a full introduction and a walk-through demo!

Coming Soon

Bindings for Elixir, Go, and Rust

Similar to the Python/Node.js/Deno bindings, all 12 of the extensions in sqlite-ecosystem will also provide binding for Elixir. That way, Elixir developers can install these extensions with Hex and integrate them into their applications. This effort was spearheaded by Thomas Rodriguez for the experimental sqlite-vss Elixir bindings, and I’ll soon bring them to all the other extensions!

Additionally, bindings for Golang and Rust will come in the next few weeks. These are still a work-in-progress, but you can see examples of then in sqlite-vss for Go and Rust. These bindings are unique - they statically link these extensions into Go and Rust binaries, so there’s no song-and-dance needed to load extensions manually or deal with dynamically-loaded files.

New Extensions!

  • sqlite-arrow: Read parquet/csv/avro/orc into SQLite tables and queries, and write SQLite queries to parquet/arrow IPC
  • sqlite-duckdb: Query and insert data into DuckDB databases
  • sqlite-odbc: Query ODBC-compatible database connections
  • sqlite-img: Query and manipulate images (cropping, thumbnails, rotation, etc.)
  • sqlite-xml: Query XML documents with XPath strings
  • sqlite-assert: Make assertions of your data at query-time

New Tools!

  • sqlite-package-manager: An npm-like tool specifically for managing/installing SQLite extensions
  • sqlite-docs: a CLI tool and extension for documenting SQLite tables, columns, and extension functions
  • sqlitex: A modern sqlite3 alternative, a new CLI with Parquet/S3 support, a bigger stdlib, syntax highlighting, and more

Supporting my Work

If you enjoy my work on SQLite extensions and tooling, or if you company uses my projects often, please consider becoming a sponsor on Github or hiring me for part-time contract development work! These projects takes a lot of my time and energy, so any support is greatly appreciated.

See you next month!

Don't miss what's next. Subscribe to SQLite Extensions, Tools, and Guides from Alex Garcia:
This email brought to you by Buttondown, the easiest way to start and grow your newsletter.