An apropos side quest about securing tokens (This Old Pony #103)


May 6, 2022


This week we’re going to interrupt the plan for the Python PaaS-travaganza to address something a bit more timely, even urgent:
Securing server side auth tokens!
Because you should be worried about the security of your… users’ credentials.[0]
OAuth tokens: what they are and why we’re talking about them
To start from a clean slate, OAuth is a standard for delegating access to another system. It underpins a LOT of integrations that you probably use yourself, in which you’re logged into app A, then you’re redirected to app B where you also have an account, tell app B that you want to provide access to app A, and then app B gives app A a token (kind of like an API key) with some circumscribed set of access rights. Among other things, this avoids the need to provide user-facing API keys that you have to copy and paste between systems.
So if you have an application in which you either need to consume or publish data on behalf of your users, you’ll likely need to use OAuth in some form. It’s what you’d use for integrating with Salesforce, Google (e.g. Analytics), Facebook, GitHub, etc.
Now, if you’re managing regular user auth yourself, you’re never storing passwords directly in your database. People used to do this![1] But it’s wildly insecure because if your database is ever compromised then POOF all those passwords are too. Instead these are stored using one-way hashes, such that the input can be compared to what’s in the DB but it’s painfully difficult to reverse the hash and identify the password.
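As a rough illustration of the one-way idea, here is a minimal sketch using only the Python standard library. The function names and the iteration count are assumptions made for the example, not recommendations:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    # A fresh random salt per password means identical passwords hash differently.
    if salt is None:
        salt = os.urandom(16)
    # The iteration count is illustrative only.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Only the salt and digest are stored; the password itself never is.
salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

Django’s built-in auth handles password hashing for you; the sketch only shows the principle that the stored value can verify a password without revealing it.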
Tokens don’t work that way. They can’t, because you actually need the full value to pass along. But this isn’t a big deal because unlike passwords the tokens are randomized strings; they’re not reused anywhere and there’s a built-in process for revoking them. If someone yoinks your database, you can just make some API calls to revoke every token and nobody’s the wiser.
Except for that time between when the database is compromised and you manage to make the calls to revoke.
Turns out this is actually something you need to worry about![2]
Industry worst practices
Many-ish years ago (back when you could buy a bushel of apples for a penny!) I inquired of some other developers how they safely stored OAuth tokens. I was at the time trying to figure out what strategies I could copy from them. Every single one said they just stored the tokens as-is in the database.
Based on an informal survey of comments on StackExchange questions and GitHub issues, I’m led to believe that this is the norm. 
It probably feels relatively safe. There are two types of tokens in an OAuth flow: the refresh token and the access token. The access token is the one you actually need for each “regular” API call and these are typically very short lived. In order to get a new access token, you present the refresh token. So if you’re overly focused on the short lifespan of the access token you may be led to believe that a security breach is of no consequence. But the refresh token is longer lived, much longer lived, often 30 days or far more.
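To make the refresh step concrete, here is a minimal sketch of the form body that the OAuth 2.0 spec (RFC 6749) describes for trading a refresh token for a new short-lived token (the access token, in OAuth 2.0 terms). The function name and the placeholder values are hypothetical, and providers differ on details such as whether client credentials go in the body or in an HTTP Basic header:

```python
from urllib.parse import urlencode


def build_refresh_request(refresh_token: str, client_id: str, client_secret: str) -> str:
    """Build the form-encoded body for an OAuth 2.0 refresh request."""
    return urlencode({
        "grant_type": "refresh_token",   # fixed value defined by the spec
        "refresh_token": refresh_token,  # the long-lived token from your database
        "client_id": client_id,          # assumption: credentials sent in the body
        "client_secret": client_secret,
    })


# Placeholder values only; this body would be POSTed to the provider's token endpoint.
body = build_refresh_request("stored-refresh-token", "my-client-id", "my-client-secret")
print(body)
```

Nothing in that request depends on the short-lived token at all, which is why the refresh token is the one worth protecting.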
Perhaps it’s the randomness of the tokens, or the fact that they’re not reused, I’m not sure. But something about these has lulled us[3] into complacency.
Solving with cryptography and a 12-factor app
The solution is to encrypt the tokens.
This is, of course, easier said than done, but it’s not actually that hard. Our goal is merely to separate the token data in the database from the usable information by keeping the key outside of the database. There are three rings of data source if we oversimplify:

The application code
The environment
The database

There can be multiple code bases, multiple databases or data stores, and instead of “environment” we could think of other secret stores, but three rings tell the story better.

Each has a purpose independent of security, but each also represents a way of separating critical information so that compromising any one of them does not, on its own, hand over everything an attacker needs.
This is where a 12-factor application deployment[4] comes into play. You can do this with other deployment patterns too, though, as the most important thing is separating the key from the database.
The tokens need to be symmetrically encrypted, i.e. the same key both encrypts and decrypts the data. Again, the goal is to ensure that if someone were to get hold of the database, the refresh tokens would be unusable without the key. To do so, we can turn to the Python cryptography library[5] and its Fernet implementation. This requires a URL-safe base64-encoded 32-byte key, which turns out to also be a convenient size for adding to an environment.
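A minimal sketch of that round trip with Fernet, assuming the cryptography package is installed. The token value is a made-up placeholder, and in a real deployment the key would be generated once and kept outside the database rather than created inline like this:

```python
from cryptography.fernet import Fernet, InvalidToken

# Generate once and store outside the database (e.g. in the environment).
key = Fernet.generate_key()  # 32 random bytes, URL-safe base64 encoded

fernet = Fernet(key)
# Hypothetical token value; Fernet works on bytes, so encode first.
ciphertext = fernet.encrypt("example-refresh-token".encode())

# What would sit in the database is the ciphertext, not the token itself.
assert ciphertext != b"example-refresh-token"
# With the key, the original value comes back.
assert fernet.decrypt(ciphertext).decode() == "example-refresh-token"

# With any other key, decryption raises instead of returning something usable.
try:
    Fernet(Fernet.generate_key()).decrypt(ciphertext)
except InvalidToken:
    print("wrong key: stored value is unusable")
```

The last step is the whole point: a copy of the database without the key yields nothing that can be sent to the provider.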
Implementing in a Django app
Without offering the code to a full blown solution, there are a few keys, if you will, to doing this in a Django app that stores OAuth tokens.

Pull the key from project settings, and from there from the environment
Don’t get or set the refresh token directly on a model field
Instead use getter and setter methods that make use of the key to encrypt and decrypt the DB value

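from cryptography.fernet import Fernet
from django.conf import settings
from django.db import models

class AppIntegration(models.Model):
    # Holds the encrypted refresh token, never the raw value.
    # Fernet output is longer than its input, so size the field accordingly.
    safe_token = models.CharField(max_length=200)

    ...

    def get_refresh_token(self):
        # Fernet works on bytes; the field stores text, so convert both ways.
        f = Fernet(settings.ENCRYPT_KEY)
        return f.decrypt(self.safe_token.encode()).decode()

    def set_refresh_token(self, token):
        f = Fernet(settings.ENCRYPT_KEY)
        self.safe_token = f.encrypt(token.encode()).decode()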

We’re skipping exception handling here, there’s probably a slightly nicer way of implementing this, and I elided everything about generating and/or validating the keys, but at this point those are all details you can fill in. The important thing is that you’ve defused the risk that the tokens stored in your database pose.
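As one possible way to fill in the key-loading detail, here is a hedged sketch that reads the key from the environment and checks that it has the shape Fernet expects (URL-safe base64 that decodes to 32 bytes). The helper name load_fernet_key and the choice of ENCRYPT_KEY as the environment variable name are assumptions for the example:

```python
import base64
import binascii
import os


def load_fernet_key(env_var: str = "ENCRYPT_KEY") -> bytes:
    """Read the key from the environment and sanity-check its shape."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    try:
        decoded = base64.urlsafe_b64decode(raw)
    except (binascii.Error, ValueError) as exc:
        raise RuntimeError(f"{env_var} is not valid base64") from exc
    if len(decoded) != 32:
        raise RuntimeError(f"{env_var} must decode to 32 bytes, got {len(decoded)}")
    # Return the still-encoded key, which is the form Fernet takes.
    return raw.encode()


# In settings.py this might be used as:
# ENCRYPT_KEY = load_fernet_key()
```

Failing at startup when the key is missing or malformed is a design choice: it surfaces a misconfigured environment immediately instead of at the first token refresh.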

The next issue should still kick off the Python PaaS-travaganza, comparing a simple Python deployment on various Python-friendly platform-as-a-service offerings before we trial running an N-tier Django application across each of the same!
Symmetrically yours,
Ben

[0] https://www.youtube.com/watch?v=BDvhZPvkWOE
[1] Some still do and woe is you if you’re one of their users or customers. Probably just banks, health care, and government web sites though.
[2] https://status.heroku.com/incidents/2413
[3] Not me, of course, I wouldn’t be smarmily writing this if that were so.
[4] https://wellfire.co/learn/easier-12-factor-django/
[5] https://cryptography.io/en/latest/
[x] Have feedback about the suggestions or complaints here? Just hit Reply and let ‘em at me.
