12. OpenFECs Results
Modern, free database (Postgres)
In the cloud (easy backup and replication)
Simplified and optimized data model
Accurate data
Speedy performance
Easy maintainability
Recently, I was at a roundtable discussion for federal executives and high-level private IT contractors. The discussion was about cloud innovation-- how to move to the cloud and replace legacy systems with newer technology.
The question arose: once you have a new system ready to replace the old one, how do you know when youre ready to pull the plug on the old one? Folks around the table even worried that they might *never* be quite ready to make the switch. The new system might never totally replicate the old one. They could even get stuck paying for not one, but two systems! So to answer the question of when you pull the plug.
Maybe you dont. There are other ways. There are ways to bridge the gap between old and new systems. Tonight Id like to talk about one particular way, which is.
which is to strangle them. This is where the story turns to APIs. APIs and Service Oriented Architectures mean that we dont have to have one or the other, new or old, on or off. There is a middle ground that we can tread, with more creative solutions to the problem.
But before we get fully into APIs and SOAs, Id like to take a quick trip to the jungle. This is the strangler fig. It grows over an existing tree in the rainforest, so that its leaves can reach past the canopy to the sunlight at the top, and its roots can stretch down to the ground, and find water at the bottom. The strangler vine follows the shape of the original tree, eventually encasing it almost completely.
In its final stage, the tree inside dies and can become hollowed out completely, leaving only the strangler fig standing.
Why are we talking about vines at a meetup about APIs? Well, in the early 2000s, Martin Fowler traveled to Australia and noticed the strangler vines in action in the rainforest. Being the smart engineer that he is, Fowler realized that these vines could be a metaphor for doing a system rewrite-- without actually doing a system rewrite, and he proposed the StranglerApplication Pattern.
A StranglerApplication is one that defines the interface around a legacy system, wrapping it, and allowing users and systems who talk to that system to not care about what is actually behind the curtain. This can allow the legacy system to be gradually replaced. And, of course, a perfect form for that StranglerApplication to take is that of an API.
So now Im going to tell you about an API that strangled a database, kind of accidentally on purpose. Its probably not the most classic example of a strangler application, but its close enough that its what got me thinking about all of this in the first place.
When I worked at 18F, we were tasked with building the FECs first true campaign finance API. The FEC wanted an API to serve up their campaign finance data in an easier to use and more timely format than the bulk data releases theyd been doing for the last 15-plus years.
First, let me set the stage, of what we had to work with. <click>
FEC had an Oracle Data Warehouse holding their campaign finance records. <click>
Our only access to it was via an elaborate, slow and cumbersome VPN setup. <click>
The tables in the database were complex, columns, hundreds of them, had incredibly long names that were hard to figure out, and it was going to be tough to query the millions of rows of data in the right way to produce clean and simple, and quick results for the API. <click>
So, we tackled each of these problems bit by bit. <click>
We chose the best* open source database, PostgreSQL, and hosted it in the cloud for easy backups and multi-zone redundancy. <click>
For shipping data from Oracle to Postgres, we researched solutions until we learned that there was an Oracle-owned product, called Golden Gate, designed specifically for shipping transaction- and schema-level changes from Oracle to Postgres. This meant that updates were pushed to us. A further benefit of this was that the replication processes could stay in the hands of the folks who knew the data best, the database administrators at the FEC.<click>
And finally, we simplified the data model by writing a series of materialized views in SQL over the raw data. We more or less created a materialized view per API endpoint. These views served to denormalize or flatten the data model, preprocessing some of the joins and aggregates, not to mention also allowing us to rename tables and columns in a simpler, more developer-friendly format.
So heres a view of the database and API architecture. When we built the API, our two highest priorities were data quality and speed. So, as Ive already described somewhat, the way that we architected this was to replicate, then wrap-- essentially build a scaffold-- over the existing data model, so that the underlying data was untouched, but we could build an accurate and performant system on top, that was also quite easy to maintain. So at left we have the Oracle to Postgres replication, and on the right you can see the basic API application stack-- materialized views in Postgres, with SQLAlchemy as the ORM, where we created a model for each materialized view, and made it super easy to query and filter the simplified data model, and on top of everything, we had Flask with the Flask-Restful plugin serving JSON at the API endpoints, up to the users.
So the result was that we built a performant and easy to use API, while leaving the responsibility for the data model and population in the hands of the expert DBAs at the FEC. And in the end, we accomplished something even bigger than all that.
Our API strangled a legacy database. And not just in the technical, architectural sense.
We strangled it culturally, too. FECs database administrators went, in the course of about three months, from not knowing what Postgres was, and worrying rather vocally about its ability to do the job, to loving Postgres, extolling its virtues, and starting to make plans about how they could _shut down_ their Oracle data warehouse and start funnelling transactions directly into our Postgres instance in the cloud, providing near-real-time data updates, which, by the way, are every campaign finance journalists dream. So, there you have it. Technical strangulation, and a huge cultural win, all because of one... little... >