GraphQL API Gateway Patterns

Cross API JOINs Pattern for GraphQL APIs

Wouldn't it be useful if we could join data from multiple GraphQL APIs with a single GraphQL Query?

The Problem

Let's say we have a GraphQL API that uses the Schema Stitching Pattern to combine two APIs into a single unified Graph. We're also using the API Namespacing Pattern to avoid naming conflicts. All good so far.

Now we would like to load data from one API, e.g. the capital of a country, and use the response to JOIN the weather from another API for that particular city.

We don't want to make multiple requests to the GraphQL API, because each round trip would add latency. How can we solve this problem?

The Solution

In this case, I'll start with the GraphQL Query that represents the final solution, and then work backwards to explain the how and why.

# mark the capital field as internal
# this means that the client can ignore it
query ($countryCode: String!, $capital: String! @internal) {
  # load the country from the countries API
  # the countries API is namespaced as "countries"
  country: countries_countries(filter: { code: { eq: $countryCode } }) {
    code
    name
    # resolve the capital field and export it into the variable "capital"
    capital @export(as: "capital")
    # the _join field is a special field that returns the Query type
    weather: _join {
      # load the weather from the weather API
      # the weather API is namespaced as "weather"
      # we now use the capital variable which was previously exported
      weather_getCityByName(name: $capital) {
        weather {
          temperature {
            max
          }
          summary {
            title
            description
          }
        }
      }
    }
  }
}

How does this work and how can you implement it for your own GraphQL API?

The first requirement is that we can rely on our resolvers being executed depth-first. That is, the capital field appears above the _join field, so we must be able to rely on the capital field being resolved before the _join field.
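This ordering guarantee can be illustrated with a minimal sketch. Note that this is not a real GraphQL executor, just an illustration of the behavior we depend on: sibling fields are resolved in selection-set order, depth-first, so by the time _join resolves, capital has already been resolved. All names here mirror the query above but are otherwise illustrative.

```typescript
// Record the order in which an executor would visit the Country fields.
const resolutionOrder: string[] = [];

const countryFieldResolvers: Record<string, () => unknown> = {
  code: () => { resolutionOrder.push("code"); return "DE"; },
  capital: () => { resolutionOrder.push("capital"); return "Berlin"; },
  _join: () => { resolutionOrder.push("_join"); return {}; },
};

// The executor walks the selection set in document order, depth-first.
for (const field of ["code", "capital", "_join"]) {
  countryFieldResolvers[field]();
}

// resolutionOrder is now ["code", "capital", "_join"]:
// when _join runs, the capital value is already available for export.
```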

Implementing the _join field by itself is quite simple. For each Object or Interface type in the schema, we add a _join field that returns the Query type. The resolver for the _join field will return an empty object.
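A sketch of what this looks like, assuming a graphql-tools-style schema definition with an SDL string and a resolvers map (the type and field names mirror the query above; the actual delegation to the upstream APIs is left out):

```typescript
// SDL sketch: every Object type gets a _join field returning the Query type.
const typeDefs = /* GraphQL */ `
  type City { name: String }
  type Country {
    code: String
    name: String
    capital: String
    # the special field: re-enters the root of the graph
    _join: Query
  }
  type Query {
    countries_countries(code: String): [Country]
    weather_getCityByName(name: String!): City
  }
`;

const resolvers = {
  Country: {
    // Returning an empty object is enough: the executor then resolves the
    // nested selection against the Query type as usual.
    _join: () => ({}),
  },
};
```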

Next, we need the @internal directive. We want to make JOINs available in a 100% spec-compliant way. That means that we don't want to add anything to our GraphQL Operation that would make the GraphQL Operation invalid.

We need a place to store the data we want to use for the JOIN: a variable, which the nested field might even require as an argument. At the same time, we don't want to expose this variable to the client.

To achieve our goal, we've added the @internal directive to our GraphQL Schema. During client code generation, we can use this directive to hide the variable. The client will not be allowed to set it, and input validation will still pass, because we "strip" out the variable before validating the input.
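One way to implement the stripping step is a small helper that removes any value a client tries to supply for an @internal variable before the input is validated. This is a sketch under the assumption that the gateway knows, from the parsed operation, which variable names carry the directive; the function and variable names are illustrative.

```typescript
// Remove client-supplied values for variables marked @internal.
function stripInternalVariables(
  clientVariables: Record<string, unknown>,
  internalVariableNames: string[],
): Record<string, unknown> {
  const stripped: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(clientVariables)) {
    // drop anything the client tries to set for an @internal variable
    if (!internalVariableNames.includes(name)) {
      stripped[name] = value;
    }
  }
  return stripped;
}

// The client may only supply $countryCode; $capital is filled in at
// runtime by the @export directive.
const variables = stripInternalVariables(
  { countryCode: "DE", capital: "should be ignored" },
  ["capital"],
);
// variables → { countryCode: "DE" }
```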

The last piece of the puzzle is the @export directive. We need to be able to export the value of the resolved field into a variable. It's important that this works for lists as well, e.g. if we want to make JOINs on a list of countries.

After each field is resolved, we check whether it has the @export directive. If it does, we create a context object for the current "scope" (one per list item, if we're inside a list), unless one already exists. We then set the exported value on the context's variables, so it can be used by another field.
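The scope bookkeeping can be sketched like this. The key idea is that scopes are keyed by the path through list items (e.g. "countries.0", "countries.1"), so exports on a list of countries don't overwrite each other; all names here are illustrative, not a real executor API.

```typescript
// Per-scope storage for values exported via @export.
interface ExportScope {
  variables: Record<string, unknown>;
}

const scopes = new Map<string, ExportScope>();

function onFieldResolved(
  scopeKey: string,
  exportAs: string | undefined, // from @export(as: ...), if present
  value: unknown,
): void {
  if (exportAs === undefined) return;
  // lazily create the context for this list scope
  let scope = scopes.get(scopeKey);
  if (scope === undefined) {
    scope = { variables: {} };
    scopes.set(scopeKey, scope);
  }
  scope.variables[exportAs] = value;
}

// capital resolves for two different list items:
onFieldResolved("countries.0", "capital", "Berlin");
onFieldResolved("countries.1", "capital", "Paris");

// later, the _join resolver for each item reads its own scope:
// scopes.get("countries.0")!.variables.capital → "Berlin"
// scopes.get("countries.1")!.variables.capital → "Paris"
```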

Considerations

While this is a very powerful pattern, it should be used with caution. Exposing such functionality on a public API can easily lead to overloading the server with large numbers of nested requests.

My recommendation is to use this pattern only for internal APIs, or combine it with the Persisted Operations Pattern. The latter will ensure that no API consumer can construct arbitrary JOINs on the fly.

That said, adding such a feature to your GraphQL API can be very powerful. Being able to JOIN data from multiple APIs without having to ask a backend engineer to add a new field can save a lot of time.