Rearchitecting: Redis to SQLite

Michael Buckbee

23 Sep 2024

Pssst - want to skip right to the chart? click here

Background

We're Wafris, an open-source web application firewall company that, among other frameworks, ships a Rails middleware client.

At launch, the v1 client required a local Redis datastore to be deployed with your app. We're now releasing v2 of our Rails client which uses SQLite as the backing datastore.

This article covers the decision-making that went into our migration from Redis to SQLite, performance considerations, and architectural changes. Continue reading this if you're interested in the decision-making that went into our migration to SQLite from Redis for our clients (the deployed middleware)

TL;DR

SQLite has things it's good and bad at.
Redis has things it's good and bad at.
Traditional RDBMS (Postgres/MySQL) have things they're good and bad at.

These data stores are not drop-in replacements, and if you try to do that, you'll have a bad time. This article walks through the testing and decision-making that went into rearchitecting our v1 Redis-based client to our v2 SQLite-based client.

What forced this change?

Since day one of Wafris, our goal has been to make it as easy as possible for developers to protect their sites.

But our v1 has had mixed results in delivering on that promise. We made what we thought at the time was a solid choice to back the Wafris client with a user-owned (bring your own) Redis datastore.

Partially, this was due to our having come up in the Heroku ecosystem, where Redis is an easy click away to get going and deployed in such a way that it's easy to access remotely. We also looked at successful projects like Sidekiq, which had a similar model.

If we're trying to make things easy on you, we shouldn't casually make you a Redis database admin in the process.

But the ecosystem is much larger than that, and many of our users encountered Redis deployment issues that were difficult to debug and fix.

In other words, if we're trying to make things easy on you, we shouldn't casually make you a Redis database admin in the process.

Further, when we exhibited at RailsWorld 2023, there was a definite "blood in the water" vibe regarding Redis and the assumption that you'd automatically need a Redis server running alongside your Rails application.

What is Speed?

While Redis is "fast" in comparison to traditional RDBMS, it's still a database that you have to manage connections, memory, processes, etc., which introduces more brittleness into the stack (the opposite of what we're trying to achieve).

And more than any of those, if you're in a cloud environment, you're looking at network latency. Network delays are a big deal for us as every inbound HTTP request to your app must be evaluated against the rules stored in Wafris. So while we sweated blood and code to get our v1 client as fast as possible, we'd often wind up in situations where, despite our best efforts, we'd still slow down an application as the network the app was provisioned into was slow.

Monolith-ish Assumptions

While there certainly exist wholly distributed applications out there, and while most Rails apps are "majestic monoliths" (tm? idk, don't sue me), we've found a lot of distributed apps that messed up our assumptions.

Apps that deploy to multiple zones, split functionality into servers with overlapping responsibilities, or are only partially Rails and deployed alongside other languages or frameworks.

Mostly, things aren't that clean in production, introducing even more friction with using Redis.

Forcing a rethink of our architecture

Wafris is a web application firewall. In Rails, it's installed as a middleware. It lets you set rules like "block the IP 1.2.3.4," and then when someone requests your site, you evaluate that request against those rules.

Imagine it's this simplified two-step process:

Compare HTTP request against the rules (if match == 403, else 200)
Report treatment (blocked, allowed, passed)

Abstractly, this is a paired "read" of the rules in the database (step 2) and then a "write" of the report detailing what was done with that request along with its data.

From a logical standpoint, the first "read" half of this process is vastly more important than the latter "write" half:

Reads, aka "requests," need to be handled sequentially
Filtering of requests must work or bad requests could get through
Reads, aka "requests," are time-sensitive as they affect user-perceived site performance.
Writes, aka "reporting," can be done slower, batched, async, etc.

Enter SQLite

Others have written far more eloquently about what SQLite is suitable for in general than I could ever hope to do so myself, so on that topic, I'll point you to the following resources:

Aaron Francis' "High Performance SQLite" course at https://highperformancesqlite.com/
Stephen Margheim's "SQLite on Rails" - the how and why of optimal performance
Oldmoe - https://oldmoe.blog/

What's good about SQLite for Wafris?

As noted above, our major bottleneck is network IO, and Stephen mentioned this line from the SQLite documentation: "SQLite does not compete with client/server databases. SQLite competes with fopen()."

SQLite does not compete with client/server databases. SQLite competes with fopen().

In theory, this should be much faster than a Redis solution solely based on cutting out the network round trip.

So, we decided to benchmark SQLite vs Redis.

Benchmarking SQLite and Redis

Benchmarking is a dark art of deceiving yourself with highly precise numbers.

And benchmarketing datastores is even more fraught. Every single flipping database benchmark I've ever seen has been covered in a layer of asterisks and qualifications, and the comments on HN are full of "if you'd just set this flag when compiling it, you'd get 3% more speed out of reads, and the fact that the people running this didn't do this is proof that they were paid off and that they actively sell shady NFT scams of deranged yacht rock Harambe memes.

undefined

However, we have an advantage here. We're not trying to get some absolute number of speed under pristine conditions and carefully tweaked settings. We are creating a wildly specific and biased benchmark against our own data in a use case that lends itself to some extremes that I would neither expect nor hope that anyone else would have to deal with.

Studiously ignoring optimization tweaks

We're trying to optimize for people throwing Wafris into their app and having it "Just Work™" (ok, this one might actually be trademarked).

We're also not testing against some theoretical benchmark suite (that can be gamed ala NVIDIA).

We're testing the hot path of our own application and our worst query.

Our worst query is a somewhat complicated request for a "lexical decimal" data structure that maps IP ranges (v4 and v6) against categories. The simplest to consider is IP -> Country mapping, where we return the country if an IP address is within a range of two addresses.

It's terrible because these structures are, by necessity, large (millions of rows), and in the case of IPv6, every single entry is huge.

To make this happen, we precompute the range lookups and then wrote them to:

A table in SQLite
A sorted set in Redis

For each inbound HTTP request, we, in the pathological case, have to check the requesting IP against:

Custom Allow ranges
Custom Block ranges
GeoIP ranges
IP Reputation ranges

This is what we mean by "hot path"—this singular query type is so important that we can ignore all the rest of the query types and functions.

So, for our benchmarking scripts, we tested only this one query type, saving us days of work porting more significant (but irrelevant to the hot path) portions of the app to the benchmark.

The Testing Protocol

undefined

Testing was done on my local Macbook Air M2 with a homebrew installed Redis and local SQLite db.

We tested against our existing ranges dataset (1.2 million entries)
Multiple sets of IPs were then run in the same order against both SQLite and Redis.
At each multiple, we ran the test 5 times and took the average of the runs.

Testing Results

undefined

SQLite beat Redis like an overhyped UFC fighter doing a jump off the fence to land a kick punch (for our particular use case in this niche)

SQLite was roughly a 3x speed improvement over a locally deployed Redis instance. YMMV. Note again that this is before any network latency is considered.

From our perspective, this was a fantastic result as even if SQLite were only on par with Redis locally, we'd still win by being able to entirely axe network times.

I'd like to stress again that this is extremely flawed testing (by design) with really naive settings, but it is done in a way that mirrors the flaws of real-world usage we observed.

What's missing from the chart?

Benchmarks exist in a vacuum. What the chart fails to capture is that:

Even if the SQLite performance was significantly worse (like 2x worse) in the benchmark, it would still probably be faster in the "real world" because of network latency, even to a Redis that was in the same data center/region
Even if you had a super robust Redis server (cluster, sharded, etc.), there are still a bunch of limits around it for network bandwidth, connections, etc., and again cross-region latency. SQLite gets us near-infinite horizontal scaling for "free"; more on this below.
The onboarding is so much better with SQLite - I imagine most users won't even know that it's being used; they'll just add the gem to their web app and be up and running.
There are a lot of potential improvements that could be done to get more juice out of Redis. However, we've been unable to consistently persuade users to make even basic configuration changes (like cache eviction policy) to their Redis setups as it's such a pain and they don't want to be Redis admins.

We're focused on:

Making Wafris easy to deploy
Making rule evaluation as fast as possible

Everything else flows from that, so that's what we're rearchitecting towards. NOT "the best database setup," not "the easiest infrastructure for us (Wafris) to manage," etc.

Results are just the start

Great, so now that we've established that SQLite is faster than Redis (caveats, caveats, caveats), everything still isn't great because there are real-world tradeoffs.

One huge tradeoff in the testing above is that we didn't even consider doing any writes.

While it's never really stated, the whole reason for a "proper" database to have connections, connection pools, transactions, and whatever proprietary magic allows Oracle to charge a billion dollars for a database is to manage writes contending with reads for data in your database.

Consider an electric Chinese supercar driven by a charming British man - it's fast, it's fantastic, and it is absolute bollocks at hauling a load of concrete blocks across town.

SQLite is like that NIO EP9 supercar. We need to use it for what it's good at and not force it into a role it's not suited for.

Building a sync architecture

On v1 (Redis), the update loop looked like this:

User updates a rule ("block the IP 1.2.3.4") in Wafris Hub
Wafris Hub updates the rules in your Redis datastore

This obviously won't work with SQLite because we can't "push" a SQLite database to a web server. There are some newer SQLite as a service providers that let you do a version of this, but for a variety of cost, performance, and security considerations, it won't work for us as we'd still need individual users to deploy them, open ports, allow inbound connections, etc.

On v2 (SQLite), the update loop looks like this:

User updates a rule ("block the IP 1.2.3.4") in Wafris Hub
At some interval (time or number of requests), the client checks for updated rules
If rules are updated, the client downloads an entirely new SQLite database

This works well as it removes much of the user's installation and configuration responsibility.

We're seeing a ~3x increase in successful installs of the v2 client.

SQLite Distributed Architecture

Consider a Rails app deployed to a cloud provider (AWS, Heroku, Fly, etc). with autoscaling enabled.

Your requests go from 100 req/s to 10,000 req/s. and your compute (dynos, machines, ec2 instances) start spinning up to handle the load, but does your database?

The answer is almost certainly no because a bottleneck is a bottleneck without the database, and while you could overprovision it by 100x "just in case," realistically, nobody wants to do that.

In practice, this is the number one thing that we see kill Rails apps under heavy load. Rarely is it an actual DDOS attack; more often, it's a credential stuffing attack or a bad bot that is hammering a site, pushing up autoscaling that's then exhausting database connections, which causes the app to fall over.

Flipping this to a system where an SQLite DB is synced down to each compute instance solves this problem excellently by keeping all the calls local to the new compute instance.

But what about writes?

We started this article by discussing splitting the app into a read (rule evaluation) and write (reporting) path and then studiously ignored the write path.

We did rearchitect the write path to do the following:

Async connects to Wafris Hub for reporting
Batch sending of the reports
Completely removing database writes from the clients

Which will work for approximately 0% of everybody else, but we don't care about them.

We care about the 100% of our users who want the Wafris client to be easy to deploy and stupid fast.

It's hard to know if there's a lesson you can learn from reading all of this, but we sincerely appreciate you taking the time.

Conclusion

First: thanks to Aaron Francis, Travis Northcutt, Brian Hogg, Dave Ceddia, Peter Bhat Harkins and Nate Bosscher for their feedback and suggestions on this post.

Second: we're exceedingly happy with our v2 architecture, which uses SQLite. It's already helped many sites weather attacks and stay online.

It's much easier to get going, with less support work for us and less hassle for users, which we think is a win for a safer, more secure internet.

Ready to secure your site?

Create a free Web Application Firewall today

Start blocking traffic in 4 minutes or less