1. Securing Our API

In Chapter 9 we added a new endpoint to our API. We have an issue though - anybody can hit the API and broadcast whatever they want to our entire mailing list. It is time to level up our API security toolbox.
Chapter 10 - Part 0
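2. Authentication

We need a way to verify the identity of API callers - we must authenticate them. How? By asking for something they are uniquely positioned to provide. There are three broad categories: something they know (e.g. passwords), something they have (e.g. a smartphone or a U2F key) and something they are (e.g. biometrics).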
Each approach has its weaknesses.

2.1. Drawbacks

2.1.1. Something They Know

Passwords must be long - short ones are vulnerable to brute-force attacks. On average, a person has 100 or more online accounts - they cannot be asked to remember hundreds of long unique passwords by heart.

2.1.2. Something They Have

Smartphones and U2F keys can be lost, locking the user out of their accounts.

2.1.3. Something They Are

Biometrics, unlike passwords, cannot be changed - you cannot "rotate" your fingerprint or change the pattern of your retina's blood vessels.

2.2. Multi-factor Authentication

What should we do then, given that each approach has its own flaws? We can combine them. That is pretty much what multi-factor authentication (MFA) boils down to - it requires the user to provide at least two different types of authentication factors in order to get access.

3. Password-based Authentication

Let's jump from theory to practice: how do we implement authentication? Passwords look like the simplest approach among the three we mentioned.

3.1. Basic Authentication

We can use the 'Basic' Authentication Scheme, a standard defined by the Internet Engineering Task Force (IETF) in RFC 2617 and later updated by RFC 7617. The API must look for the Authorization header in the incoming request, structured as Basic <encoded credentials>, where <encoded credentials> is the base64-encoding of {username}:{password}.

According to the specification, we need to partition our API into protection spaces or realms - resources within the same realm are protected using the same authentication scheme and set of credentials. The API must reject all requests missing the header or using invalid credentials - the response must use the 401 Unauthorized status code and include a WWW-Authenticate header containing a challenge.
Let's implement it! Extracting username and password from the incoming request will be our first milestone. Let's start with an unhappy case - an incoming request without an Authorization header is rejected.
It fails at the first assertion:
We must update our handler to fulfill the new requirements.
To extract the credentials we will need to deal with
the base64 encoding.
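The extraction step can be sketched with the standard library alone. This is a minimal sketch, not the chapter's handler: `Credentials`, `parse_basic_header` and the hand-rolled `base64_decode` are hypothetical names introduced for illustration, and a real project would normally rely on a maintained base64 crate instead of decoding by hand.

```rust
/// Username and password extracted from a 'Basic' Authorization header value.
#[derive(Debug, PartialEq)]
struct Credentials {
    username: String,
    password: String,
}

/// Lenient decoder for the standard base64 alphabet.
/// It stops at the first `=` and does not validate padding length.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buffer: u32 = 0; // bits that have not been emitted yet
    let mut bits: u32 = 0; // how many bits of `buffer` are meaningful
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None, // character outside the alphabet
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    Some(out)
}

/// Parse a header value shaped like `Basic <base64 of username:password>`.
fn parse_basic_header(header_value: &str) -> Result<Credentials, String> {
    let encoded = header_value
        .strip_prefix("Basic ")
        .ok_or_else(|| "The authorization scheme was not 'Basic'.".to_string())?;
    let decoded = base64_decode(encoded)
        .ok_or_else(|| "Failed to base64-decode 'Basic' credentials.".to_string())?;
    let decoded = String::from_utf8(decoded)
        .map_err(|_| "The decoded credential string is not valid UTF-8.".to_string())?;
    // Split on the first ':' only - the password itself may contain colons.
    let (username, password) = decoded
        .split_once(':')
        .ok_or_else(|| "Missing ':' separator between username and password.".to_string())?;
    Ok(Credentials {
        username: username.to_owned(),
        password: password.to_owned(),
    })
}
```

Every step returns an error instead of panicking, so a handler built on top of it can map any failure to a rejected request.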
We can now write down the body of
Take a moment to go through the code, line by line, and fully understand what is happening. There are many operations that could go wrong! We are not done yet - our test is still failing.
Our status code assertion is now happy, the header one not yet:
So far it has been enough to specify which status code to return for each error - now we need something more, a header.
Our authentication test passes!
The test suite is green again.

3.2. Password Verification - Naive Approach

An authentication layer that accepts random credentials is... not ideal. We will
create a new
A first draft for the schema might look like this:
We can then update our handler to query it every time we perform authentication:
It would be a good idea to record who is calling
We now need to update our happy-path tests to specify a username-password pair that is accepted by
which we will then be calling from our
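All our tests are passing now.

3.3. Password Storage

Storing raw user passwords in your database is not a good idea. An attacker with access to your stored data can immediately start impersonating your users - both usernames and passwords are ready to go.

3.3.1. No Need To Store Raw Passwords

Why are we even storing passwords in the first place? We need them to perform an equality check when a user tries to authenticate. If equality is all we care about, we can start devising a more sophisticated strategy: we can transform passwords with a function f before comparing them. All deterministic functions return the same output given the same input, so equal passwords always produce equal outputs. We need to go in the opposite direction: if f(candidate) == f(expected), then candidate == expected. That holds when f is injective - different inputs always map to different outputs. If we had such a function f, we could store f(password) instead of the raw password and compare f(candidate) against it at login.

Does this actually improve our security posture? It depends on f. It is not that difficult to define an injective function - the reverse function, which maps "hello" to "olleh", is one, yet it is trivial to invert. We need a function that is impractical to invert and whose outputs reveal nothing about its inputs. We want a cryptographic hash function. There is a caveat: hash functions are not injective, there is a tiny risk of collisions - if f(x) == f(y), it is very likely, but not certain, that x == y.

3.3.2. Using A Cryptographic Hash

Enough with the theory - let's update our implementation to hash passwords before storing them. There are several cryptographic hash functions out there - MD5, SHA-1, SHA-2, SHA-3, KangarooTwelve, etc. On top of the algorithm, we also need to choose the output size - e.g. SHA3-224 uses the SHA-3 algorithm to produce a fixed-sized output of 224 bits. The Rust Crypto organization provides an implementation of SHA-3, the sha3 crate.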
For clarity, let's rename our
Our project should stop compiling:
Our
Let's update it to work with hashed passwords:
Unfortunately, it will not compile straight away:
Let's spare ourselves another migration by using the second option:
The application code should compile now. The test suite, instead, requires a bit more work.
We need
We can then attach an instance of
To finish, let's delete
The test suite should now compile and run successfully.

3.3.3. Preimage Attack

Is SHA3-256 enough to protect our users' passwords if an attacker gets their hands on our stored password hashes? Let's imagine that the attacker wants to crack a specific password hash in our database. How hard is it? The math is a bit tricky, but a brute-force attack has an exponential time complexity - on the order of 2^n attempts, where n is the hash length in bits.

3.3.4. Naive Dictionary Attack

We are not hashing arbitrary inputs though - we can reduce the search space by making some assumptions on the
original password: how long was it? What symbols were used? We can count the number of password candidates:
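To make the counting concrete, here is a small sketch. The helper names `count_candidates` and `seconds_to_exhaust` are made up for illustration, and any alphabet size, maximum length or hash rate passed to them is a hypothetical input, not the figures used in this chapter.

```rust
/// Number of passwords whose length is between 1 and `max_len`,
/// built from an alphabet of `alphabet_size` symbols.
/// Note: `pow` can overflow for very large inputs; fine for a sketch.
fn count_candidates(alphabet_size: u128, max_len: u32) -> u128 {
    (1..=max_len).map(|len| alphabet_size.pow(len)).sum()
}

/// Worst-case number of seconds needed to hash every candidate
/// at the given hash rate (rounded down).
fn seconds_to_exhaust(candidates: u128, hashes_per_second: u128) -> u128 {
    candidates / hashes_per_second
}
```

For instance, `count_candidates(26, 8)` counts lowercase-only passwords of up to 8 characters; widening the alphabet or the maximum length makes the total grow exponentially.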
It sums up to roughly … Assuming a hash rate of …

3.3.5. Dictionary Attack

Let's go back to what we discussed at the very beginning of this chapter - it is impossible for a person to remember a unique password for hundreds of online services. Furthermore, most passwords are far from being random, even when reused - common words, full names, dates, names of popular sport teams, etc. An attacker can exploit this: in a couple of minutes they can pre-compute the SHA3-256 hash of the 10 million most commonly used passwords. Then they start scanning our database looking for a match. This is known as a dictionary attack - and it's extremely effective. All the cryptographic hash functions we mentioned so far are designed to be fast. We need something much slower, but with the same set of mathematical properties as cryptographic hash functions.

3.3.6. Argon2

The Open Web Application Security Project (OWASP) provides useful guidance on safe password storage - with a whole section on how to choose the correct hashing algorithm:
All these options - Argon2, bcrypt, scrypt, PBKDF2 - are designed to be computationally demanding. Let's replace SHA-3
with Argon2id, as recommended by OWASP. Let's add it to our dependencies:
To hash a password we need to create an
What about
We know enough, at this point, to build one:
It is a re-export from the
3.3.7. Salting

Argon2 is a lot slower than SHA-3, but this is not enough to make a dictionary attack unfeasible. It takes longer to hash the most common 10 million passwords, but not prohibitively long. What if, though, the attacker had to rehash the whole dictionary for every user in our database? That is what salting accomplishes. For each user, we generate a unique random string - the salt. The salt is combined with the user password before hashing, so identical passwords produce different hashes. The salt is stored next to the password hash, in our database. Let's add a column for it.
We can no longer compute the hash before querying the
Unfortunately, this does not compile:
Given that a change is necessary, we can shoot for something better than base64-encoding.

3.3.8. PHC String Format

To authenticate a user, we need reproducibility: we must run the very same hashing routine every single time. If we store a base64-encoded representation of the hash, we are making a strong implicit assumption: all stored values have been computed using the same load parameters. As we discussed a few sections ago, hardware capabilities evolve over time: application developers are expected to keep up by increasing the computational cost of hashing using higher load parameters. To keep authenticating old users we must store, next to each hash, the exact set of load parameters used to compute it. We could go for the naive approach - add three new columns to our table, one per load parameter. What happens if a vulnerability is found in Argon2id and we are forced to migrate away from it? We would also need to record which algorithm produced each hash. It can be done, but it is tedious. The PHC string format provides a standard representation that bundles all of this with the hash. Using the PHC string format, an Argon2id password hash looks like this:
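For illustration, a string in this format has the shape `${algorithm}$v={version}${parameters}${salt}${hash}`. The sketch below splits such a string into its fields using only the standard library. `PhcFields` and `parse_phc` are hypothetical names, and the parser is deliberately simplified: it assumes parameters, salt and hash are all present and that every parameter value is an integer.

```rust
/// The fields of a PHC string, borrowed from the string they were parsed from.
#[derive(Debug, PartialEq)]
struct PhcFields<'a> {
    algorithm: &'a str,
    version: Option<u32>,
    params: Vec<(&'a str, u32)>,
    salt: &'a str,
    hash: &'a str,
}

/// Simplified PHC parser: returns `None` if the string does not match
/// the `$algorithm[$v=version]$params$salt$hash` layout.
fn parse_phc(s: &str) -> Option<PhcFields<'_>> {
    let mut fields = s.strip_prefix('$')?.split('$');
    let algorithm = fields.next()?;
    let second = fields.next()?;
    // The version field is optional and recognisable by its `v=` prefix.
    let (version, params_field) = match second.strip_prefix("v=") {
        Some(v) => (Some(v.parse::<u32>().ok()?), fields.next()?),
        None => (None, second),
    };
    // Parameters are comma-separated `name=value` pairs.
    let params = params_field
        .split(',')
        .map(|pair| {
            let (name, value) = pair.split_once('=')?;
            Some((name, value.parse::<u32>().ok()?))
        })
        .collect::<Option<Vec<_>>>()?;
    let salt = fields.next()?;
    let hash = fields.next()?;
    Some(PhcFields { algorithm, version, params, salt, hash })
}
```

Calling `parse_phc("$argon2id$v=19$m=65536,t=2,p=1$gZiV/M1gPc22ElAH/Jh1Hw$CWOrkoo7oJBQ/iyh7uJ0LO2aLEfrHwTWllSAxT0zRno")` (a sample string, not one taken from this project) yields the algorithm `argon2id`, version 19, the three load parameters, and the base64-encoded salt and hash. Everything needed to repeat the hashing routine travels with the hash itself.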
The
Storing password hashes in PHC string format spares us from having to initialise the
By passing the expected hash via Let's update our implementation:
It compiles successfully.
What about our tests?
We can look at logs to figure out what is wrong:
Let's look at the password generation code for our test user:
We are still using SHA-3!
The test suite should pass now.

3.4. Do Not Block The Async Executor

How long is it taking to verify user credentials when running our integration tests?
We can now look at the logs from one of our integration tests:
Roughly 10ms.
How does it work?
How is
Every time a future is polled, it tries to make progress until it reaches the next yield point, where it hands control back to the executor. Each yield point corresponds to a different state of the future. The executor can then choose to poll the same future again or to prioritise making progress on another task. This is how async runtimes, like tokio, manage to make progress concurrently on multiple tasks.

The underlying assumption is that most async tasks are performing some kind of input-output (IO) work - most of their execution time will be spent waiting on something else to happen (e.g. the operating system notifying us that there is data ready to be read on a socket), therefore we can effectively perform many more tasks concurrently than what we would achieve by dedicating a parallel unit of execution (e.g. one thread per OS core) to each task. This model works great assuming tasks cooperate by frequently yielding control back to the executor. You should always be on the lookout for CPU-intensive workloads that are likely to take longer than 1ms - password hashing is a perfect example. Let's get to work!
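In a tokio application the usual tool for this is `tokio::task::spawn_blocking`. To stay dependency-free, the sketch below uses `std::thread::spawn` as a stand-in: it places a similar `'static` requirement on the closure, so the data has to be moved into the worker thread rather than borrowed. `slow_verify` and `verify_on_worker_thread` are hypothetical names, and the string comparison is only a placeholder for a real password hash verification.

```rust
use std::thread;

/// Hypothetical stand-in for a CPU-intensive check,
/// e.g. verifying a password candidate against a stored hash.
fn slow_verify(expected: &str, candidate: &str) -> bool {
    expected == candidate
}

/// Run the expensive check on a separate thread.
/// The closure must own everything it touches, because the thread
/// may outlive the function that spawned it.
fn verify_on_worker_thread(expected: &str, candidate: &str) -> bool {
    // Clone into owned values so that nothing borrowed crosses the thread boundary.
    let expected = expected.to_owned();
    let candidate = candidate.to_owned();
    let handle = thread::spawn(move || slow_verify(&expected, &candidate));
    handle.join().expect("the worker thread panicked")
}
```

`join` blocks the calling thread, which is acceptable in this synchronous sketch; in async code one would await the handle returned by `spawn_blocking` instead, leaving the executor free to poll other tasks.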
The borrow checker is not happy:
We are
launching a computation on a separate thread - the thread itself might outlive the async task we are spawning it from. To avoid the issue, You might argue - "We are using
It holds a reference to the string it was parsed from. Let's create a separate function,
It compiles!

3.4.1. Tracing Context Is Thread-Local

Let's look again at the logs for the
We are missing all the properties that are inherited from the
root span of the corresponding request - e.g. Let's look at
The current span is the one returned by
"Current span" actually means "active span for the current thread". We can work around the issue by explicitly attaching the current span to the newly spawned thread:
You can verify that it works - we are now getting all the properties we care about.
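The same behaviour can be reproduced with the standard library. This is an analogy, not the tracing API: `CURRENT_CONTEXT`, `set_context`, `current_context` and `spawn_with_context` are hypothetical names. A value stored in a thread-local is invisible to a freshly spawned thread unless it is captured and re-installed explicitly.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    /// Per-thread "current context"; every new thread starts with `None`.
    static CURRENT_CONTEXT: RefCell<Option<String>> = RefCell::new(None);
}

/// Set the context for the calling thread only.
fn set_context(value: &str) {
    CURRENT_CONTEXT.with(|c| *c.borrow_mut() = Some(value.to_owned()));
}

/// Read the context of the calling thread.
fn current_context() -> Option<String> {
    CURRENT_CONTEXT.with(|c| c.borrow().clone())
}

/// Spawn a thread that inherits the caller's context:
/// capture it before spawning, re-install it inside the new thread.
fn spawn_with_context<F, R>(f: F) -> thread::JoinHandle<R>
where
    F: FnOnce() -> R + Send + 'static,
    R: Send + 'static,
{
    let captured = current_context();
    thread::spawn(move || {
        CURRENT_CONTEXT.with(|c| *c.borrow_mut() = captured);
        f()
    })
}
```

After `set_context("request-42")`, a plain `thread::spawn(current_context)` observes `None`, while `spawn_with_context(current_context)` observes `Some("request-42")`. Wrapping the propagation in a helper keeps it from being forgotten at each call site.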
We can now easily reach for it every time we need to offload some CPU-intensive computation to a dedicated threadpool.

3.5. User Enumeration

Let's add a new test case:
The test should pass straight-away. Let's look at the logs!
Roughly 1ms. Let's add another test: this time we pass a valid username with an incorrect password.
This one should pass as well. How long does the request take to fail?
Roughly 10ms - one order of magnitude larger than the failure for an unknown username! If an attacker knows at least one valid username, they can inspect the server response times to confirm if another username exists or not - we are looking at a potential user enumeration vulnerability. Is this an issue? It depends. If you are running a SaaS product, the situation might be more nuanced. Even in our fictional example, user enumeration is not enough, on its own, to escalate our privileges. How do we prevent it?
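There are two strategies: remove the timing difference between an authentication failure due to an invalid password and one due to a non-existent username, or limit the number of failed authentication attempts for a given IP address or username. The second is generally valuable as a protection against brute-force attacks, but it requires holding some state - we will leave it for later. Let's focus on the first one. Right now, we follow this recipe: fetch the stored credentials for the given username; if they do not exist, return immediately with an authentication error; otherwise hash the password candidate and compare it with the stored hash.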
We need to remove that early exit - we should have a fallback expected password (with salt and load parameters) that can be compared to the hash of the password candidate.
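The revised flow can be sketched as follows. `StoredCredentials`, `validate_credentials`, `FALLBACK_HASH` and the `verify` callback are all hypothetical: the callback stands in for the real (expensive) password hash verification, so that the control flow is the only thing on display.

```rust
/// What we would load from the database for a known username.
struct StoredCredentials {
    user_id: u32,
    password_hash: String,
}

/// Placeholder for a precomputed hash, used when the username is unknown.
/// A real one would carry the same salt and load parameters as genuine hashes.
const FALLBACK_HASH: &str = "fallback-hash-placeholder";

/// Returns the user id on success, `None` on any authentication failure.
/// `verify(expected_hash, candidate)` is called exactly once on every path.
fn validate_credentials<V>(
    stored: Option<StoredCredentials>,
    candidate: &str,
    verify: V,
) -> Option<u32>
where
    V: Fn(&str, &str) -> bool,
{
    // No early exit: an unknown user falls back to a dummy hash.
    let (user_id, expected_hash) = match stored {
        Some(credentials) => (Some(credentials.user_id), credentials.password_hash),
        None => (None, FALLBACK_HASH.to_owned()),
    };
    let password_matches = verify(&expected_hash, candidate);
    // Even if the fallback happened to match, `user_id` is `None` and we still fail.
    if password_matches { user_id } else { None }
}
```

Both failure paths now perform the same amount of hashing work, which is where most of the request time is spent.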
There should not be any statistically significant timing difference now.

4. Is it safe?

We went to great lengths to follow all the most common best practices while building our password-based authentication flow.

4.1. Transport Layer Security (TLS)

We are using the 'Basic' Authentication Scheme to pass credentials between the client and the server - username and password are encoded, but not encrypted. We must use TLS to ensure that nobody can eavesdrop on the traffic between client and server.

4.2. Password Reset

What happens if an attacker manages to steal a set of valid user credentials? Right now, there is no way for a user to reset their password. This is definitely a gap we'd need to fill.

4.3. Interaction Types

So far we have been fairly vague about who is calling our API. The type of interaction we need to support is a key decision factor when it comes to authentication. We will look at three categories of callers: other APIs (machine-to-machine), a person using a browser, and another API acting on behalf of a person.

4.4. Machine To Machine

The consumer of your API might be a machine (e.g. another API). To significantly raise our security profile we'd have to throw in something they have (e.g. request signing) or something they are (e.g. IP range restrictions). Mutual TLS (mTLS) is another popular option. Both signing and mTLS rely on public key cryptography - keys must be provisioned, rotated, managed. The overhead is only justified once your system reaches a certain size.

4.4.1. Client Credentials via OAuth2

Another option is using the OAuth2 client credentials flow. We will speak more about OAuth2 later, but let's spend a few words on its tactical pros and cons. APIs no longer have to manage passwords (client secrets, in OAuth2 terminology) - the concern is delegated to a centralised authorization server. There are multiple turn-key implementations of an authorization server out there - both OSS and commercial. You can lean on them instead of rolling your own. The caller authenticates with the authorization server - if successful, the auth server grants them a set of temporary credentials (a JWT access token) which can be used to call our API. JWT validation is not without its risks - the specification is riddled with dangerous edge cases. We will speak more about it later.

4.5. Person Via Browser

What if we are dealing with a person, using a web browser? 'Basic' Authentication requires the client to present their credentials on every single request. We need a way to remember that a user authenticated a few moments ago - i.e. to attach some kind of state to a sequence of requests coming from the same browser. This is accomplished using sessions. A user is asked to authenticate once, via a login form: if successful, the server generates a one-time secret - an authenticated session token. The token is stored in the browser as a secure cookie. This approach is often referred to as session-based authentication.

4.5.1. Federated Identity

With session-based authentication we still have an authentication step to take care of - the login form. Many websites choose to offer their users an additional option: login via a social profile - e.g. "Log in with Google". This removes friction from the sign up flow (no need to create yet another password!), increasing conversion - a desirable outcome. Social logins rely on identity federation - we delegate the authentication step to a third-party identity provider, which in turn shares with us the pieces of information we asked for (e.g. email address, full name and date of birth). A common implementation of identity federation relies on OpenID Connect, an identity layer on top of the OAuth2 standard.

4.6. Machine to machine, on behalf of a person

There is one more scenario: a person authorising a machine (e.g. a third-party service) to perform actions against our API on their behalf. It is important to stress how this differs from the first scenario we reviewed, pure machine-to-machine authentication. 'Basic' authentication would be a very poor fit here: we do not want to share our password with a third-party app. The more parties get to see our password, the more likely it is to be compromised. Furthermore, keeping an audit trail with shared credentials is a nightmare. When something goes wrong, it is impossible to determine who did what: was it actually me? Was it one of the twenty apps I shared credentials with? Who takes responsibility? This is the textbook scenario for OAuth2 - the third-party never gets to see our username and password. They receive an opaque access token from the authentication server which our API knows how to inspect to grant (or deny) access.

5. What Should We Do Next

Browsers are our main target - it's decided. Our authentication strategy needs to evolve accordingly! We will first convert our 'Basic' Authentication flow into a login form with session-based auth. That is going to be the roadmap for the next episode! See you soon!
Zero To Production In Rust is a hands-on introduction to backend development in Rust.