For open data geeks, there is much to be happy about in Chicago municipal government. John Tolva, a longtime open data advocate and city hacker, is now the Chief Technology Officer. Brett Goldstein, the early OpenTable technologist who most recently founded the predictive analytics practice at the Chicago Police Department, is the Chief Data Officer. For people like me, this is a dream come true. Seriously.
A few weeks have passed since their appointment by Mayor Emanuel, and I want to start reviewing the concrete steps they've taken, especially in the raw data they've been publishing to the City's data repository at data.cityofchicago.org. Many of us in the open data movement have been arguing loudly for more raw data. The time for arguing is over. The time for making is now.
Let's start with a high-value data set with which I am intimately familiar-- payments to City contractors.
The payment data available in the payments section of Chicago's data repository goes back to 1996, and the data on the City Web site lookup tool goes back to 1993. Keep in mind that none of this data covers the Chicago Park District, the Chicago Housing Authority, or the Chicago Transit Authority. As our friends at the Better Government Association have recently shown, there might be some issues over there.
It is impossible to create great applications using civic data without first attempting to understand how that data came to be. It's popular among developers to make fun of difficult-to-navigate municipal Web sites, but I take a more grateful approach. Think of it this way-- for more than 15 years, municipal technology workers have been feeding and caring for this database of fresh, reliably formatted information. I appreciate the time, energy, hiring, skill, storage, and sheer electricity it took to get that done.
Here is a 1998 white paper, "Reengineering the purchasing function: identifying best practices for the City of Chicago" on the creation of FMPS (pdf). The author of the paper is Kathryn M. Kustermann, who appears to have been a Deputy Chief Information Officer at the time FMPS was designed. The document makes clear that a main goal of the system was to "redesign the core purchasing processes at the city of Chicago", and it achieved that goal.
One objective was to "us[e] the system to quickly and easily send and receive information through fax server and e-mail technology, eliminating a majority of manual effort and lost documents".
So this is where their heads were at-- bringing the City's purchasing system up to date with technology that had already been widely accepted for a few years. That's how these things go, and I don't see anything wrong with it. I'd rather my city work more slowly and deliberately than the private sector does. There is a role for everyone.
In 2006 I worked as a contractor for the City of Chicago and I came into contact with the Financial Management and Purchasing Systems (FMPS). This "enterprise system" provides the basis for a whole slew of innards that help the City get things done. The main public interface into the FMPS was the Vendor, Contract, and Payment Search on the City of Chicago Web site. It struck me as an odd combination of deeply rich and immensely opaque. As the years went by, I stayed interested in this deep little database.
In October of 2008, I posted a message to the poliparse (political parsers) group to see if I could rustle up anyone smarter than me who could actually scrape this info and get it all out. After a while, a friend of mine got excited about liberating this data and making it more searchable.
After we launched CityPayments, we moved on to other things, but I kept tabs on what was going on. One thing the City did (in the previous administration) was to list the "10 Most Recent Awarded Contracts". This was a welcome change, and it is still pretty useful. Often the contracts are so new that they're not even available in PDF format yet, so you have to wait for the actual contract to get uploaded. You can see here that the first contract number is not linked yet:
We have a number of improvements we've wanted to make, but haven't had time. Here's the list. If you're in the mood to code on CityPayments, let us know!
REVIEW OF CURRENT PUBLISHED DATA
The FMPS data provided in this first release by the City is described as follows:
All vendor payments made by the City of Chicago from 1996 to present. Payments from 1996 through 2002 have been rolled-up and appear as "2002." Payment information is available as summarized totals for 2003 through 2009. These data are extracted from the City’s Vendor, Contract, and Payment Search. Time Period: 1996 to present. Frequency: Data is updated daily. Related Applications: City of Chicago Vendor, Contract, and Payments Search (http://webapps.cityofchicago.org/VCSearchWeb/org/cityofchicago/vcsearch/controller/payments/begin.do?agencyId=city).
Note that the "Payment information is available as summarized totals for 2003 through 2009" bit was added after Dan Sinker discovered and reported a bug, which the City acknowledged right away. We are the bug fixers we've been waiting for. Sinker also loaded the data dump into a Fusion Table.
Here are the fields published in the data:
Voucher number: This is simply the number of the invoice that is being covered by the payment. This field is often empty, and I never saw it as very important in the whole scheme of things. Also, it is difficult to do further research based on this field, because there's no way to search for it on the source site.
Recently, the City added a new method of initiating a payment:
New in 2010: Direct Voucher payments from January 2010 to the present. Direct Vouchers are used to pay for miscellaneous products and services that are not associated with a signed contract between the City and the Payee. Examples include debt service, utilities, third party payroll expenditures, court and legal settlements, and small payments such as travel reimbursements.
I don't really understand Direct Voucher (DV) payments or why they were added. Former Alderman Eugene Schulter received more than $50,000 in these types of payments in the last year and a half.
Amount: the amount of money paid to the vendor for that particular voucher. Definitely useful when trying to match contracts to awards.
Check date: Self-explanatory.
Department: There are 57 departments listed in the City's Contracts and Awards search on its Web site.
Contract number: This is the key field if you are looking to do further research. With this number, you can search on the City Web site for pretty much everything you want to know about the contract. My advice: READ THE CONTRACTS. Better than fiction. Funny to see all the signatures.
Vendor name: name of the vendor being paid. There are often errors in this field-- names are conflated or misspelled, for instance. I wouldn't rely on this field if you want to find *all* contracts of a type.
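Since the contract number is the key field, a natural first step is to total the payments made against each contract and compare that with the award amount on the City Web site. Here's a minimal Python sketch of that, assuming a CSV export with columns named `AMOUNT` and `CONTRACT NUMBER` (those header names are my guess, so check them against the actual download) and amounts that may include `$` signs and thousands separators. Rows with an empty contract number are skipped.

```python
import csv
from collections import defaultdict
from decimal import Decimal


def load_payments(path):
    """Read a payments CSV export into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_amount(text):
    """Turn a string like "$1,234.56" into an exact Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def totals_by_contract(rows):
    """Sum payment amounts per contract number (assumed column names)."""
    totals = defaultdict(Decimal)
    for row in rows:
        contract = (row.get("CONTRACT NUMBER") or "").strip()
        if not contract:
            # No contract number: nothing to match against an award.
            continue
        totals[contract] += parse_amount(row["AMOUNT"])
    return dict(totals)
```

Usage would look like `totals_by_contract(load_payments("payments.csv"))`, where `payments.csv` is a hypothetical name for whatever you saved the download as. `Decimal` is used instead of `float` so that dollar totals don't pick up rounding noise.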
If you've read all the way down to here, it's time you're rewarded with a cool picture. Here it is:
Peoples Gas Education Pavilion at the Nature Boardwalk at Lincoln Park Zoo. (Note: there is a vendor called "LINCOLN PARK ZOOLOGICAL SCTY" and another called "THE LINCOLN PK ZOOLOGICAL SOC.". Again, don't count on Vendor Name to have unique IDs.)
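Because Vendor Name isn't a unique ID, one rough workaround is to compute a normalized matching key and group the raw names that share it. The sketch below is a heuristic of my own, not anything the City uses: it uppercases, turns punctuation into spaces, drops a leading "THE", and expands a small, hypothetical abbreviation table built from the two zoo spellings above.

```python
import re

# Hypothetical abbreviation table; grow it as you find new variants in the data.
ABBREVIATIONS = {
    "PK": "PARK",
    "SCTY": "SOCIETY",
    "SOC": "SOCIETY",
}


def normalize_vendor(name):
    """Return a rough matching key for a vendor name (not a true unique ID)."""
    # Uppercase, then replace anything that isn't a letter, digit, or space.
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    words = cleaned.split()
    # Drop a leading "THE" so "THE X" and "X" get the same key.
    if words and words[0] == "THE":
        words = words[1:]
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def group_vendor_names(names):
    """Map each normalized key to the set of raw names that produced it."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_vendor(name), set()).add(name)
    return groups
```

With this table, both "LINCOLN PARK ZOOLOGICAL SCTY" and "THE LINCOLN PK ZOOLOGICAL SOC." come out as "LINCOLN PARK ZOOLOGICAL SOCIETY". Treat the groups as suggestions to review by hand: a normalizer like this will still miss some duplicates (typos, reordered words) and can wrongly merge vendors that really are different.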
FUN WITH DATA
These are things you can do with this current data:
- Use it as a jumping-off point for learning about city contracts and vendors. Now that there is a feed of new payments being pumped to this Web page, you can see the City's checkbook as money goes out. That can spark deeper looks into the actual text of the underlying contracts. Good stuff in there. Some of it is goofy. Note: you do not have to be a coder (I Am Not A Coder) to do this-- you just have to copy/paste contract numbers, download PDFs, and read them.
- Make broad year-by-year calculations of spending by Department and whack them against the numbers in the published budgets for each year going back to 2000. Back of the napkin stuff, just for fun. Again, this is knowable stuff based on that which is already published.
- Do some fun things with the Vendor Name field. Rank vendors in order of dollars paid by Department. Run automated Google searches on the company names to turn up boards of directors and Web site URLs, and provide an index of the companies for others to annotate and build a directory.
- See how many of the General Contractors from this list are also city vendors.
- Clean up the Vendor Name field in general, combining obvious duplicates.
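The ranking and year-by-year ideas in the list can be sketched in a few lines of Python. Again, the column names (`DEPARTMENT NAME`, `VENDOR NAME`, `AMOUNT`, `CHECK DATE`) are assumptions about the export, and taking the year from the last four characters of the check date is a guess about its format. Keep in mind that payments before 2010 are rolled up or summarized, so the early years only support coarse comparisons, and vendor rankings are only as good as the Vendor Name field they're grouped on.

```python
from collections import defaultdict
from decimal import Decimal


def parse_amount(text):
    """Turn a string like "$1,234.56" into an exact Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def rank_vendors_by_department(rows, top_n=10):
    """For each department, return up to top_n (vendor, total paid) pairs, largest first."""
    totals = defaultdict(lambda: defaultdict(Decimal))
    for row in rows:
        totals[row["DEPARTMENT NAME"]][row["VENDOR NAME"]] += parse_amount(row["AMOUNT"])
    return {
        dept: sorted(vendors.items(), key=lambda pair: pair[1], reverse=True)[:top_n]
        for dept, vendors in totals.items()
    }


def totals_by_department_and_year(rows):
    """Sum payments per (department, year) for comparison with published budgets."""
    totals = defaultdict(Decimal)
    for row in rows:
        # Assumes CHECK DATE ends in a four-digit year, e.g. "01/05/2011" or a rolled-up "2002".
        year = row["CHECK DATE"].strip()[-4:]
        totals[(row["DEPARTMENT NAME"], year)] += parse_amount(row["AMOUNT"])
    return dict(totals)
```

The per-department, per-year totals are the numbers you'd set next to each year's published budget; the ranking works best after vendor name variants have been merged.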
This Payments stuff is clearly just a first step when it comes to structured data published out of the city's financial information system. I'd like to see more connections among payment data, vendor data, finished work, and so on. Data is almost never interesting by itself. It's the connections that make it interesting.
Chicagoans really need to get a better view of what we're getting for our money. And I don't mean this in an investigative reporter-style way. I mean that we should be able to look at a voucher and be able to see what we got out of it. And it's not all on the City to provide the info.
I want to see if other contractors could do it more cheaply. If a contractor got beat out of a job, they should use this data to prove how they could have done it for less. It's one thing for nutty developers to grab all of this data and make broad connections. It's another thing altogether for nutty business owners to take teensy slices of this data and make teensy judgements about it.
In the same vein, I want contractors who were awarded the work to make claims about quality, or better staff, or a better return on investment than the cheaper option.
I want to see the weekly report that the contractor provided related to the deliverables. Let's move past feet-to-the-fire-ism and move toward free market public relations. ("Yes, we got paid today, and here's what we did to earn it.")
I want to see a picture of what we got for the money, whether it's a bridge or a bucket. Tie the payment system into the Home Depot Point of Sale application. Provide VISA card-style itemized purchases. Why not? It certainly exists somewhere; why not everywhere?
Everything is so disconnected for now-- there is the record of progress on the LaSalle Intermodal at Congress Parkway and Financial Place here, a detailed PDF of the work over here, the 2nd Ward Alderman talking about it over here, and so on. We all need to know we're talking about the same thing. The contract UID is the key, and we all need to find ways to embed it into our lives more easily.
But that's for tomorrow. All hail the city of Chicago, as well as the City of Chicago.
(Bonus link: original research on ICAM, the primogenitor of all Web-based crime mapping applications that started off as a PC-based MapInfo 2.0 application in 1995).