Introducing Django Contributions Weekly
This week we introduce Django Contributions Weekly, a new weekly report inspired by the "Updates to Django" section of Django News. We also share some fun facts about the data in Trac and a few notes on vector search.
Events
- The call for proposals for PyLadies Con 2025 is open until August 8th.
- DjangoCon Africa 2025 takes place Aug 11-15 in Arusha, Tanzania. Less than 2 weeks away!
- DjangoCon US 2025 announces the first Keynote Speaker, Carson Gross! The conference takes place Sep 8-12 in Chicago. Less than 6 weeks away!
Django Contributions Weekly - Y25 W30 - Jul 21 to Jul 27
Django Contributions leverages data in Trac. While Django News highlights the GitHub PR authors and PRs merged, Django Contributions dives into the queues, components, and other areas of contributions.
The goal is to recognize more people and to better understand the health of the community. The report is also posted on https://django-contributor.ontowhee.com/trac/reports. It is currently experimental and is being actively tweaked.
Overview
A total of 55 tickets were modified last week. Of those, 39 are open and 16 are closed.
Components
- contrib.admin, 13
- Database layer (models, ORM), 8
- contrib.staticfiles, 8
- Core (Mail), 5
- Utilities, 3
- contrib.auth, 3
- Uncategorized, 2
- HTTP handling, 2
- Template system, 2
- Migrations, 2
- Documentation, 2
- Core (Other), 1
- GIS, 1
- Core (URLs), 1
- Internationalization, 1
End Of Week Queues
- Unreviewed, 8
- Needs PR, 11
- Needs Review, 7
- Waiting On Author, 13
- Ready for checkin, 3
- Closed (fixed), 10
- Closed (other), 6
- Someday/Maybe, 2
Activities
- Assigned: 10
- Reviewed: 13
- Coded: 16
- Discussions: 80
Turnaround Times For Active Work
Note: These are not the overall turnaround times. They cover only activity from the past week and exclude tickets and earlier activity outside that window.
- Waiting On Author -> Needs Review:
  - Avg: 25h 24m
  - Med: 10h 10m
  - Max: 45h 22m
  - Min: 3h 43m
- Needs Review -> Waiting On Author:
  - Avg: 9h 29m
  - Med: 10h 10m
  - Max: 18h 23m
  - Min: 3h 59m
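As a rough illustration of how summary figures like these can be derived, here is a minimal Python sketch. It assumes the time spent in each transition has already been extracted as a list of durations. The `sample` values and the helper names `format_duration` and `summarize` are hypothetical and are not taken from the report's actual code or from Trac.

```python
from datetime import timedelta
from statistics import mean, median


def format_duration(delta: timedelta) -> str:
    """Format a timedelta as '<hours>h <minutes>m', dropping leftover seconds."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


def summarize(durations: list[timedelta]) -> dict[str, str]:
    """Return the average, median, max, and min of a non-empty list of durations."""
    seconds = [d.total_seconds() for d in durations]
    return {
        "Avg": format_duration(timedelta(seconds=mean(seconds))),
        "Med": format_duration(timedelta(seconds=median(seconds))),
        "Max": format_duration(timedelta(seconds=max(seconds))),
        "Min": format_duration(timedelta(seconds=min(seconds))),
    }


# Hypothetical durations for a single transition (made up for illustration).
sample = [
    timedelta(hours=2, minutes=30),
    timedelta(hours=10),
    timedelta(hours=20, minutes=30),
]
summary = summarize(sample)
```

With only a handful of transitions in a week, a single slow ticket can pull the average well above the median, which is one reason to report both.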
Many thanks to the contributors last week!
- Federico Capoano (nemesifier), 1
- Klaas van Schelven (vanschelven), 1
- Olivier Dalang (olivierdalang), 1
- Mridul (mriduldhall), 1
- Farhan Ali (FarhanAliRaza), 1
- SAVAN SONI (savansonii), 1
- Dorian Adams (dorian-adams), 1
- Gagan Deep (pandafy), 1
- Ruhm (Ruhm42), 1
- Frederich Pedersen (Frodothedwarf), 1
- mrartem1927 (mrartem1927), 1
- Forrest Roudebush (forrrestr), 1
- Anže Pečar (anze3db), 1
- Brian Helba (brianhelba), 1
- Lily Acorn (LilyAcorn), 1
- Roelzkie (roelzkie15), 1
- Jacob Walls (jacobtylerwalls), 2
- Jake Howard (RealOrangeOne), 2
- Carlton Gibson (carltongibson), 2
- David Smith (smithdc1), 2
- Mariusz Felisiak (felixxm), 3
- Chaitanya (XChaitanyaX), 2
- Simon Charette (charettes), 2
- Clifford Gama (cliff688), 3
- Jason Hall (Jkhall81), 3
- Mike Edmunds (medmunds), 5
- Antoliny (Antoliny0919), 6
- blighj (blighj), 7
- Natalia Bidart (nessita), 10
- Sarah Boyce (sarahboyce), 30
Fun Facts From Trac Data
The data in Trac as of late May 2025 consists of:
- 35,641 tickets.
- 211,414 comments across all tickets.
- 458,904 activities/modifications across all tickets (including comments).
This averages out to about 6 comments per ticket (211,414 / 35,641 ≈ 5.9) and about 13 activities per ticket (458,904 / 35,641 ≈ 12.9).
Taking a closer look at the number of activities per ticket, we see:
- The max number of activities on any single ticket is 312. This occurs on ticket 2070.
- The min number of activities on any single ticket is 0. This occurs on 7 tickets (1671, 1721, 23054, 24246, 24312, 31252, 35684). Most of these were created by core contributors, so the tickets bypassed the Unreviewed stage at creation.
- The median number of activities per ticket is 4. There are 3,505 tickets with 4 activities.
- The mode for activities is 1. There are 6,068 tickets with 1 activity. This is the most common number of activities on tickets.
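To make the four statistics concrete, here is a small Python sketch that computes them from a mapping of ticket id to activity count. The ticket ids and counts in `activities_per_ticket` are hypothetical and chosen only for illustration. They are not real Trac data.

```python
from collections import Counter
from statistics import median

# Hypothetical mapping of ticket id -> number of activities (not real Trac data).
activities_per_ticket = {101: 0, 102: 1, 103: 1, 104: 4, 105: 7, 106: 5, 107: 12}

counts = list(activities_per_ticket.values())

# Ticket with the most activities.
max_ticket = max(activities_per_ticket, key=activities_per_ticket.get)

# Tickets with no activities at all.
zero_activity = sorted(t for t, n in activities_per_ticket.items() if n == 0)

# Middle value of the sorted activity counts.
median_count = median(counts)

# Most common activity count, and how many tickets have it.
mode_count, mode_tickets = Counter(counts).most_common(1)[0]
```

In this toy data the median (4) and the mode (1) differ, which mirrors the skew in the real numbers: many tickets see very little activity, while a few see a lot.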
There are also 4 tickets that do not exist in the ticket table but have residual data in the ticket_change table. They were determined to be spam and invalid.
These are the SQL queries used to generate the stats:
-- Count tickets.
select count(*) from ticket;
-- Count comments. Does not include modifications to comments, which have field values in the format '_comment<number>'.
select count(*) from ticket_change where field = 'comment';
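-- Count activities. This counts rows in ticket_change, one per changed field,
-- so a modification that changes several fields at once is counted more than once.
-- The per-ticket queries below count distinct timestamps instead.
select count(*) from ticket_change;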
-- Max activities.
select ticket, count(distinct time) as activities
from ticket_change
group by ticket
order by activities desc limit 1;
-- Min activities. Find tickets that have 0 entries in ticket_change.
select id from ticket
where id not in (
    select distinct(ticket) from ticket_change
);
-- Median activities. Use offset and limit to scan the nearby results for consistency.
select ticket, count(distinct time) as activities
from ticket_change
group by ticket
order by activities
offset (35638 / 2) - 10 limit 20;
-- Mode activities.
select activities, count(*) cnt
from
(
    select ticket, count(distinct time) as activities
    from ticket_change
    group by ticket
) t1
group by activities
order by cnt desc, activities desc limit 1;
-- Non-existent tickets with residual data in ticket_change table.
select * from ticket_change
where ticket not in (select id from ticket)
order by ticket, time;
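To try queries of this shape without access to the Trac database, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The schema is an assumption, cut down to the columns the queries above reference, and the rows are made up. It also shows why counting rows and counting distinct timestamps give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ticket (id integer primary key);
    create table ticket_change (ticket integer, time integer, field text);

    insert into ticket (id) values (1), (2), (3);
    insert into ticket_change (ticket, time, field) values
        (1, 100, 'comment'),
        (1, 100, 'status'),   -- same timestamp: one modification, two rows
        (1, 200, 'comment'),
        (2, 150, 'owner'),
        (9, 300, 'comment');  -- ticket 9 does not exist in the ticket table
""")


def scalar(sql: str) -> int:
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Row counts: every changed field is a separate row.
change_rows = scalar("select count(*) from ticket_change")
comments = scalar("select count(*) from ticket_change where field = 'comment'")

# Per-ticket activities: rows sharing a timestamp collapse into one activity.
per_ticket = dict(conn.execute(
    "select ticket, count(distinct time) from ticket_change group by ticket"
))

# Tickets with no entries in ticket_change.
no_activity = [row[0] for row in conn.execute(
    "select id from ticket where id not in (select ticket from ticket_change)"
)]

# Changes that refer to tickets missing from the ticket table.
orphans = sorted({row[0] for row in conn.execute(
    "select ticket from ticket_change where ticket not in (select id from ticket)"
)})
```

Ticket 1 has three rows in the toy ticket_change table but only two activities, because two of its rows share a timestamp.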
Vector Search
I got acquainted with some new terminology and tools. Sentence transformers are language models that convert a string of text into a vector, aka an embedding, aka a list of numbers. On Hugging Face, a common general purpose model is sentence-transformers/all-MiniLM-L6-v2. There are other models for other use cases, such as ones targeted at different languages, optimized for accuracy or for performance, tailored towards understanding questions or producing summaries, and more.
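Once text has been turned into vectors, searching means finding the stored vectors closest to a query vector. The sketch below shows that idea with a brute-force scan using cosine similarity. The three-dimensional vectors are hypothetical stand-ins made up for illustration (real models produce far longer vectors), and the `search` helper is not part of any library.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings. In practice these would come from a model.
embeddings = {
    "admin changelist filter bug": [0.9, 0.1, 0.0],
    "queryset annotate crash": [0.1, 0.9, 0.1],
    "staticfiles manifest error": [0.0, 0.2, 0.9],
}


def search(query: list[float], k: int = 2) -> list[str]:
    """Return the k texts whose embeddings are most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda text: cosine_similarity(query, embeddings[text]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embedding of a query about the admin.
results = search([0.8, 0.2, 0.1])
```

A brute-force scan compares the query against every stored vector, which is fine for small data sets but gets slow as the number of vectors grows.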
Other new topics that came up along the way include indexes for vector search. One of them is HNSW, the Hierarchical Navigable Small World index. It builds a layered graph, similar in concept to a skip list, so a search can quickly narrow down the candidates instead of scanning the whole data set. Another is IVFFlat, the Inverted File Flat index, which groups vectors into lists around centroids and searches only the lists closest to the query. Both are implemented in the PostgreSQL extension pgvector.
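To get a feel for the inverted file idea, here is a toy Python sketch. It is not how pgvector is implemented. It assumes the centroids are already given, whereas a real IVFFlat index learns them from the data, and the function names `build_lists` and `ivf_search` are hypothetical.

```python
import math


def build_lists(vectors, centroids):
    """Assign each vector id to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(vec, centroids[i]),
        )
        lists[nearest].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, probes=1, k=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    probe_order = sorted(
        range(len(centroids)),
        key=lambda i: math.dist(query, centroids[i]),
    )
    candidates = [vid for i in probe_order[:probes] for vid in lists[i]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:k]


# Hypothetical 2-dimensional vectors and centroids (made up for illustration).
vectors = {"a": (0.0, 0.1), "b": (0.2, 0.0), "c": (5.0, 5.1), "d": (5.2, 4.9)}
centroids = [(0.0, 0.0), (5.0, 5.0)]

lists = build_lists(vectors, centroids)
nearest = ivf_search((4.9, 5.0), vectors, centroids, lists, probes=1, k=1)
```

The trade-off is that probing fewer lists is faster but can miss a true nearest neighbor that landed in a list that was not probed. Probing every list is equivalent to a full scan.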