ELT with dbt and Airflow
#004 — AUGUST 27, 2025
This week, I continue to explore database topics and added two new tools to my data engineering toolkit: dbt and Airflow.
For the next two weeks, I'll be taking a break from the newsletter. I'll continue working on projects, and I'll also be attending DjangoCon US. I'll be back on Sep 17!
Django Contributions Weekly - Y25 W34 - Aug 18 - 24
There were a total of 52 modified tickets last week. By the end of the week, 27 of those were open, 23 were closed, and 2 were someday/maybe.
Components
Number of modified tickets by component.
- Database layer (models, ORM): 12
- contrib.admin: 9
- contrib.staticfiles: 5
- contrib.auth: 4
- GIS: 4
- HTTP handling: 4
- Documentation: 3
- Forms: 2
- Template system: 2
- Utilities: 2
- Uncategorized: 2
- Packaging: 1
- Core (Other): 1
- Migrations: 1
Activities
Count of activity changes to the ticket statuses last week.
- assigned: 10
- reviewed: 12
- coded: 21
- discussion: 59
Queue Status
Where the modified tickets landed by the end of the week.
- Unreviewed: 2
- Needs PR: 4
- Needs Review: 10
- Waiting On Author: 9
- Ready for checkin: 2
- Closed (fixed): 16
- Closed (other): 7
- Someday/Maybe: 2
Contributors
There were 26 contributors last week!
- Farhan Ali (FarhanAliRaza), 1
- Kamil Paduszyński (paduszyk), 1
- Jason Hall (Jkhall81), 1
- David Sanders (shangxiao), 1
- Rob Hudson (robhudson), 1
- Mariusz Felisiak (felisiak.mariusz), 1
- mengxun (mengxunQAQ), 1
- Jagadesha NH (jagadeshanh), 1
- Mustafa Pirbhai (mspirbhai), 1
- Alex (alexgmin), 1
- Xdynix (Xdynix), 1
- David Smith (smithdc1), 1
- Tim Graham (timgraham), 1
- Márton Salomváry (salomvary), 1
- Artyom Kotovskiy (mrartem1927), 1
- Jake Howard (RealOrangeOne), 2
- Carlton Gibson (carltongibson), 2
- Jacob Walls (jacobtylerwalls), 2
- Take Weiland (diesieben07), 2
- Simon Charette (charettes), 3
- Clifford Gama (cliff688), 3
- Antoliny (Antoliny0919), 4
- blighj (blighj), 4
- JaeHyuckSa (JaeHyuckSa), 7
- Natalia Bidart (nessita), 9
- Sarah Boyce (sarahboyce), 26
"Vector Search Isn't The Answer To Everything. So What Is? A Technical Deep Dive." - by Jacky Liang, TigerData
Very interesting read on the appropriate application of Vector Search. However, the article isn't suggesting an lighter weight solution to vector search. Rather, it is suggesting a hybrid approach that combines several search methods.
A question on my mind is, what is good for Django's documentation? Does it require an AI solution, or is the pain in consolidating all the resources? I think consolidating resources is a required first step regardless of AI or not.
PostgreSQL Patch 5660 - SQL:2011 application time - by Paul Jungwirth
Application time implementation for PostgreSQL, which was committed recently and will be available in PG 19. This introduces the keyword PERIOD
, which enforces temporal foreign key constraints on a column with a range data type. Read the email threads, SQL:2011 PERIODS vs Postgres Ranges? and SQL:2011 application time, for the discussions behind it all. I think it's amazing how this feature took 7 years to get committed!
ELT
I was hitting a roadblock as I worked through the data in Trac. I was writing several CTEs, and I needed to write more, but it was becoming unmanageable. I looked for resources, and landed on two tools for ELT (extract, load, transform):
- airflow - An orchestration tool that allows you to build pipelines of tasks in python and run them on a schedule.
- dbt - A tool for managing sql files that leverages jinja templates to reuse common parts of SQL. Often integrated with airflow.
I worked through the example projects from Data Engineering Course for Beginners, a freeCodeCamp course created by an employee of Airbyte, but skipped the Airbyte portion. I just wanted to develop locally and prove a concept. While there is a lot of setup involved in this tutorial, and it took me double the time of the video to get my project to work, I found it helpful to understand how all the tools are integrated. I'm now ready to adapt this to Trac's data.
Anti-join
If you've skimmed through tickets on Django, or you've written interesting queries, you might have come across the term "anti-join". What does it mean? Anti-join is a join between two tables where you are looking for rows that are in one table but not in the other table. Very often, an anti-join occurs when there is a NOT IN or NOT EXIST predicate in the query. The topic usually comes up when there is a performance issue, because testing that something is not in a set of values can be very costly.
Jekyll And Build Pages
Last week I wrote about setting my website on Codeberg. I use jekyll to build the static pages. I used to host it on GitHub, and GitHub would handle generating the static files and deploying them.
With Codeberg, you have to build the static pages yourself. To streamline the process, I have two repositories: one for the jekyll project, and one for the static files that get generated. To make it easy to "deploy", I updated the _config.yml
file for the jekyll website to specify that the build folder is the ../pages
folder.
I had forgotten about this setup, even though it has been just about a week ago since I made this configuration, and I found myself panicking when I wanted to update my website. Then I checked the timestamps on the files in the pages repository, and realized everything was working just fine.