GraphQL Persisted Queries with HTTP Caching [Part 1]
GraphQL is a fast-growing API specification that aims to replace REST APIs. A GraphQL server describes its data capabilities through a type system and resolvers. A client sends a descriptive GraphQL query stating what it wants. The structure of the response then matches the query, providing a predictable result. There are many benefits to GraphQL servers and clients, which I am not going to cover here, as there is plenty of material on the Internet about them.
This four-part blog post series specifically covers GraphQL persisted queries. A persisted query is a slight departure from the standard GraphQL request flow that allows for better performance and security, at the cost of some flexibility. I will cover a bit of the history of persisted queries, along with the problems they solve. We will look at how to implement persisted queries in Rails and Express. As an extension to persisted queries, we will look at how to adapt them to take advantage of HTTP caching.
GraphQL presents a flexible endpoint to which clients can send queries; however, this flexibility comes at a cost. The following three concerns specifically target performance and security:
As a consumer of a GraphQL API, it is possible to construct any query for the server to process. You can hope that consumers are doing their best to create good queries, but in a public API that might not be the case. You might have ill-informed users creating very expensive queries, or even a bad actor trying to time out or cripple your server by sending deeply cyclical queries.
There are several ways to mitigate these issues, as further outlined in Max Stoiber’s article Securing Your GraphQL API from Malicious Queries. In particular:
- Depth Limiting: Rejecting queries which are too deeply nested
- Amount Limiting: Rejecting queries which ask for too much information (i.e., via pagination arguments)
- Query Cost Analysis: Rejecting queries which are too expensive (by assigning complexity values to fields)
- Query Whitelisting: Rejecting queries that are not whitelisted
I would also like to add Time Limiting, which rejects queries that take too long to resolve. Query Whitelisting is only applicable to private APIs, but otherwise these are all good approaches for preventing malicious or expensive queries from hitting your API. In keeping with the topic of this series, we’ll focus on Query Whitelisting (otherwise known as Persisted Queries).
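For contrast with whitelisting, here is a rough sketch of the first mitigation in the list, depth limiting. It approximates depth by counting how far the curly braces in a query nest. Everything in it is an assumption made for illustration: `maxBraceDepth`, `isTooDeep`, and the limit of 5 are hypothetical, and a production server would inspect the parsed query AST rather than scan text (input-object arguments, for example, are over-counted here).

```typescript
// Rough sketch: estimate nesting depth by counting curly braces.
// Hypothetical helper names; a real server should walk the parsed AST instead.
function maxBraceDepth(query: string): number {
  // Drop string literals first, then `#` comments, so braces inside them are ignored.
  const stripped = query
    .replace(/"(?:[^"\\]|\\.)*"/g, '""')
    .replace(/#[^\n]*/g, "");

  let depth = 0;
  let max = 0;
  for (let i = 0; i < stripped.length; i++) {
    const ch = stripped[i];
    if (ch === "{") {
      depth += 1;
      if (depth > max) max = depth;
    } else if (ch === "}") {
      depth -= 1;
    }
  }
  return max;
}

// Assumed limit; tune it to the deepest query your own clients legitimately send.
const MAX_DEPTH = 5;

function isTooDeep(query: string, limit: number = MAX_DEPTH): boolean {
  return maxBraceDepth(query) > limit;
}
```

With this limit, a query nested three levels deep is accepted, while a cyclical query nested seven levels deep is rejected before any resolver runs.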
Facebook has been using persisted queries since 2013 and highly recommends them for production usage. The essence of a persisted query is that the query is stored on the server side and a client references it using some unique identifier. A great primer on persisted queries can be found in Apollo’s blog article on this topic.
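To make that idea concrete, here is a minimal sketch of the server-side lookup. It is not any particular library's implementation: the registry, the `companyConsoles` identifier, and the `lookupPersistedQuery` helper are all hypothetical names chosen for illustration.

```typescript
// Hypothetical server-side registry: identifier -> full query text.
// In practice this would likely be generated from the queries the clients ship with.
const persistedQueries: Record<string, string> = {
  companyConsoles: "query { company { name consoles { name releaseYear } } }",
};

type Lookup =
  | { ok: true; query: string }
  | { ok: false; error: string };

// Resolve the identifier a client sent. Anything that is not registered is
// rejected, which is what gives persisted queries their whitelisting property.
function lookupPersistedQuery(store: Record<string, string>, id: string): Lookup {
  // Own-property check so inherited keys such as "toString" never match.
  const query = Object.prototype.hasOwnProperty.call(store, id) ? store[id] : undefined;
  if (query === undefined) {
    return { ok: false, error: "Unknown persisted query: " + id };
  }
  return { ok: true, query };
}
```

The client only ever sends `companyConsoles`; the server swaps it for the stored query text before executing it, and an arbitrary query string sent in its place simply fails the lookup.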
For the sake of completeness, I want to demonstrate a scenario where persisted queries shine.
A client sends the following query to the server:
query {
  company {
    name
    consoles {
      name
      releaseYear
    }
  }
}
No problems so far! Now a bad actor sends the following query:
query {
  company {
    consoles {
      company {
        consoles {
          company {
            consoles {
              name
              # ... continues nesting till happy with the damage
            }
          }
        }
      }
    }
  }
}
The server evaluating this query can experience performance or stability issues due to the deep nesting and complex nature of the query. Going forward, we will make some assumptions about our API:
- We control both the server and the clients (i.e., web/mobile clients)
- We don’t expose a public API (it is accessible, but it isn’t promoted for external usage)
- The data being returned from the queries is not personalized
In our specific case, we can use persisted queries to remedy the issue of malicious users sending bad queries to our API. In addition, we will also gain some performance benefits (i.e., reducing the request’s network size).
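The reduction in request size can be sketched by comparing the two request bodies. The `{ query: ... }` body is the usual JSON shape for a GraphQL request over HTTP; the persisted form with an `id` field, the `companyConsoles` value, and the `requestBodySize` helper are assumptions for illustration only.

```typescript
// Size of a JSON request body in characters. The bodies below are pure ASCII,
// so this equals the byte count on the wire (before headers and compression).
function requestBodySize(body: object): number {
  return JSON.stringify(body).length;
}

// Conventional request: the full query text travels with every call.
const fullBodySize = requestBodySize({
  query: "query { company { name consoles { name releaseYear } } }",
});

// Hypothetical persisted request: only an identifier is sent.
const persistedBodySize = requestBodySize({ id: "companyConsoles" });

// How much smaller the persisted request body is.
const savedCharacters = fullBodySize - persistedBodySize;
```

The saving grows with the size of the query, since the identifier stays short no matter how large the stored query becomes.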
That sounds great, but how can we go about implementing this? As previously mentioned, persisted queries are not part of the official specification. There are many implementations that exist, as well as some tooling for supporting persisted queries. In my experience at the time of writing this, there wasn’t a standard way to implement persisted queries.
I want to stress the following: persisted queries only work as a security measure if you control both the server and the client. In theory, you could use persisted queries on public APIs, but the security gains would not be present. I do want to mention Automatic Persisted Queries, an approach that uses the concept of persisted queries solely for performance gains.
For the sake of brevity, this series will focus on the following platforms:
Part two will cover the following sections:
Part three will cover the following sections:
Part four will cover the following sections:
This topic was presented at GraphQL Toronto in July 2018.