DLP Dispatch #5
Welcome to the fifth edition of the Data Liberation Project’s newsletter. Inside: Two major datasets liberated, FMCS database documentation, the latest batch of FOIA requests, and a volunteering form.
Data, liberated: Facilities handling hazardous substances
The US Environmental Protection Agency’s Risk Management Program rule requires facilities handling “certain hazardous substances” to submit “risk management plans” at least every five years, detailing the chemicals they use, the risks that usage poses, what the facility is doing to minimize accidents, and a five-year accident history.
The EPA collects these filings into a central database, which I've mentioned in past dispatches: after the DLP filed a FOIA request for it ... after the request was granted ... and after I received the CD containing the data.
Since then, DLP volunteers and I have worked to understand, document, and process the (fairly complex) dataset. There’s still more work to be done, but we’ve gotten to the point where (I think) we have a decent handle on the data. So, I’m happy to say: it’s now yours to download and explore.
The best place to start is the main documentation. There you’ll find essential context, as well as links to:
- The raw records received via FOIA.
- Those records converted to SQLite database files.
- A basic data browser for exploring individual facilities and their submissions.
- A set of simple spreadsheets providing an overview of each facility, submission, and reported accident.
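If you'd like a quick first look at one of the SQLite files, here is a minimal sketch using only Python's standard library. It lists each table in a database file along with its row count. The filename in the usage comment is hypothetical, and the sketch assumes nothing about the actual table names, so check the main documentation for the real structure.

```python
import sqlite3


def list_tables(db_path):
    """Return a {table_name: row_count} dict for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names don't break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


# Usage (hypothetical filename; substitute the file you downloaded):
# for table, n_rows in list_tables("rmp.sqlite").items():
#     print(table, n_rows)
```

Note that `sqlite3.connect` will create an empty file if the path does not exist, so an empty result usually means the path is wrong.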
I’m eager to help you work with the data: To help you use it to inform your communities, to help you analyze it, to help you build public-interest tools with it, and more. (To get that help, all you need to do is reply to this email.)
Many thanks to Derek Willis for the idea to request the data, and to volunteers Arianna Cabrera, Evelyn Martin, and Alexis Raykhel for helping to document and dig into it.
More data, liberated: Animal Welfare Act inspections
The Animal Welfare Act sets minimum standards of animal care by four main types of licensees: commercial animal dealers, exhibitors (such as zoos), research facilities, and transporters. The USDA’s Animal and Plant Health Inspection Service (APHIS) checks whether licensees meet those standards, and issues citations when they do not.
The agency provides an online portal containing its inspections but, frustratingly, no option to download the full dataset. The structured information provided through the interface also lacks important details, such as the type of inspection and the list of species inspected — information that is available only in the inspection report PDFs.
In the very first DLP Dispatch, I raised the prospect of scraping the APHIS portal to liberate that data. As it turns out, Big Local News’s Ben Welsh was also working on an APHIS scraper. A mutual friend connected the dots a couple of months ago. Since then, Ben and I have been collaborating on code to fetch the 80,000+ published inspection reports, parse the PDFs, and make the records all-around more useful.
Although there’s still (and always) more work to do, we think we now have something useful to share. So, yesterday, we published our APHIS-scraping code and the data it has gathered. Among the main resources you’ll find there (and also automatically updated in the biglocalnews.org portal):
- A CSV spreadsheet of every inspection (date, entity, inspection type, violation counts, license number, license type) going back to 2014.
- Another CSV listing all species-level counts of the animals inspected.
- An RSS feed listing the inspections we’ve most recently discovered.
We’ve also uploaded all the inspection report PDFs to a public, searchable project on DocumentCloud.
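As one possible starting point for the inspections CSV, here is a minimal sketch that tallies rows by the value in a chosen column, using only Python's standard library. The filename and the `license_type` column name in the usage comment are assumptions for illustration; check the published file's header row for the actual names.

```python
import csv
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows in an open CSV file, grouped by the value in `column`."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Usage (hypothetical filename and column name):
# with open("inspections.csv", newline="", encoding="utf-8") as f:
#     for value, n in count_by_column(f, "license_type").most_common():
#         print(value, n)
```

The function takes an open file object rather than a path, so it works equally well on a downloaded file or on text already held in memory.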
We’re eager to see what you do with these records, and we’re eager to help you use them. If you have any questions or feedback, don’t hesitate to get in touch.
Documentation, liberated: FMCS’ case management system
In December, I submitted a FOIA request to the Federal Mediation and Conciliation Service (FMCS) seeking:
- Database records describing the general characteristics of all FMCS-tracked work stoppages.
- All documentation of FMCS-0004, the “system of records” containing that information.
Earlier this month, the agency’s FOIA office responded with a “partial grant,” providing just one documentation file, and withholding all other records.
The DLP intends to appeal the decision. In the meantime, I’ve uploaded the one file the agency did provide: An Excel spreadsheet named “Prod Schema.xlsx”. Although FMCS sent the file without any context, it appears to be the database schema for the agency’s case management system, or at least a portion of it. It’s fairly detailed, with 5,900+ columns across 100+ tables. I’m hoping it can be useful to the public in (a) understanding how FMCS tracks its cases, and (b) filing high-precision FOIA requests for the system’s records.
The latest batch of FOIA requests
Since the last dispatch, the Data Liberation Project filed eight new requests. They seek:
- Data from a recent CMS program “providing eligible hospitals with unprecedented regulatory flexibilities to treat eligible patients in their homes” (co-requested with Maddy Varner of The Markup).
- A copy of the “comprehensive national water use inventory” mandated by the Secure Water Act of 2009 (also co-requested with Varner).
- The Coast Guard’s national database of recreational boating accidents.
- A national database of “fatalities and vessel disasters that occur in the US fishing industry”.
- Documentation of the Consumer Financial Protection Bureau’s enforcement database.
- Department of Agriculture records describing its inventories of data and “major information systems”.
You can read more about each request via the links above. If you have any questions about them, please do ask.
A volunteering form
On the DLP website’s Get Involved page, I’ve linked to a form where you can express your interest in volunteering. It provides a bit more structure and information than just emailing (which you're still welcome to do).
That’s all for now! Thank you for reading, and don’t hesitate to reply.
— Jeremy