GraphQL BatchLoader

GitLab uses the batch-loader Ruby gem to optimize and avoid N+1 SQL queries.

The structure of the GraphQL query tree is what creates opportunities for batching like this: disconnected nodes might need the same data, but cannot know about each other.

When should you use it?

We should try to batch DB requests as much as possible during GraphQL query execution. There is no need to use batch loading during mutations because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not necessarily identical) queries, then consider using the batch-loader.

When implementing a new endpoint we should aim to minimise the number of SQL queries. For stability and scalability we must also ensure that our queries do not suffer from N+1 performance issues.

Implementation

Batch loading is useful when a series of queries for inputs Qα, Qβ, ... Qω can be combined into a single query for Q[α, β, ... ω]. An example of this is lookups by a unique key such as an ID or a username, where we can find two users as cheaply as one, but real-world examples can be more complex.
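To make the difference concrete, the following minimal sketch counts queries against an in-memory table. `FakeUsers`, `find_by_username`, and `where_usernames` are hypothetical stand-ins invented for this illustration; they are not GitLab or ActiveRecord APIs.

```ruby
# Hypothetical in-memory stand-in for a users table that counts "queries".
class FakeUsers
  ROWS = {
    'alice' => { id: 1, username: 'alice' },
    'bob'   => { id: 2, username: 'bob' },
    'carol' => { id: 3, username: 'carol' }
  }.freeze

  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  # One query per key: Qα, Qβ, ...
  def find_by_username(username)
    @query_count += 1
    ROWS[username]
  end

  # One query for the whole key set: Q[α, β, ...]
  def where_usernames(usernames)
    @query_count += 1
    ROWS.values_at(*usernames).compact
  end
end

usernames = %w[alice bob carol]

one_by_one = FakeUsers.new
separate = usernames.map { |name| one_by_one.find_by_username(name) }

batched_table = FakeUsers.new
combined = batched_table.where_usernames(usernames)

puts one_by_one.query_count    # 3
puts batched_table.query_count # 1
puts separate == combined      # true
```

Both approaches return the same rows; only the number of round trips differs, which is the property batch loading relies on.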

Batch loading is not suitable when the result sets have different sort orders, grouping, aggregation, or other non-composable features.
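As an illustration of a non-composable feature, consider a per-key ordering combined with a limit (an example chosen for this sketch). Naively applying the same ordering and limit to the merged key set gives a different answer than running the queries separately. The `POSTS` data and both helper methods below are hypothetical.

```ruby
# Hypothetical rows standing in for a posts table.
POSTS = [
  { author: 'alice', title: 'a1', created_at: 1 },
  { author: 'alice', title: 'a2', created_at: 2 },
  { author: 'bob',   title: 'b1', created_at: 3 },
  { author: 'bob',   title: 'b2', created_at: 4 }
].freeze

# Per-author query: newest post for one author (ORDER BY ... DESC LIMIT 1).
def newest_post_for(author)
  POSTS.select { |post| post[:author] == author }
       .max_by { |post| post[:created_at] }
end

# Naive combination: the same ORDER BY and LIMIT applied to the merged key set.
def newest_post_for_all(authors)
  POSTS.select { |post| authors.include?(post[:author]) }
       .max_by(1) { |post| post[:created_at] }
end

separate = %w[alice bob].map { |author| newest_post_for(author)[:title] }
combined = newest_post_for_all(%w[alice bob]).map { |post| post[:title] }

puts separate.inspect # ["a2", "b2"]
puts combined.inspect # ["b2"]
```

The combined query silently drops the result for `alice`, so queries like this need a different strategy than simple key batching.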

There are two ways to use the batch-loader in your code. For simple ID lookups, use ::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find. For more complex cases, you can use the batch API directly.

For example, to load a User by username, we can add batching as follows:

class UserResolver < BaseResolver
  type UserType, null: true
  argument :username, ::GraphQL::STRING_TYPE, required: true

  def resolve(username:)
    BatchLoader::GraphQL.for(username).batch do |usernames, loader|
      User.by_username(usernames).each do |user|
        loader.call(user.username, user)
      end
    end
  end
end

username is the username we want to query
loader.call is used to map the result back to the input key (here a username)
BatchLoader::GraphQL returns a lazy object (a suspended promise to fetch the data)

Here is an example MR illustrating how to use our BatchLoading mechanism.

How does it work exactly?

Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling #sync), they will be loaded along with all other similar objects in the current batch.

Inside the block we execute a batch query for our items (User). After that, all we have to do is call loader, passing the item that was given to the BatchLoader::GraphQL.for method (username) and the loaded object itself (user):

BatchLoader::GraphQL.for(username).batch do |usernames, loader|
  User.by_username(usernames).each do |user|
    loader.call(user.username, user)
  end
end
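The mechanism can be sketched with a toy loader in plain Ruby. `ToyBatchLoader` is a hypothetical class written for this illustration: it imitates the shape of the `for(...).batch { |keys, loader| ... }` call and the `sync` step, but it is not how the batch-loader gem is implemented. The `USERS` hash stands in for the database.

```ruby
# A toy lazy batch loader (hypothetical, for illustration only).
class ToyBatchLoader
  # Keys collected so far and not yet loaded, plus results already loaded.
  @pending = []
  @loaded = {}

  class << self
    attr_reader :pending, :loaded
  end

  def self.for(key)
    new(key)
  end

  def initialize(key)
    @key = key
  end

  # Remember the key and how to load a batch; nothing is loaded yet.
  def batch(&block)
    @block = block
    self.class.pending << @key
    self
  end

  # Load every pending key in one call to the block, then return our value.
  def sync
    unless self.class.loaded.key?(@key)
      keys = self.class.pending.uniq
      self.class.pending.clear
      loader = ->(key, value) { self.class.loaded[key] = value }
      @block.call(keys, loader)
    end
    self.class.loaded[@key]
  end
end

USERS = { 'alice' => 'Alice A.', 'bob' => 'Bob B.' }.freeze
batch_calls = []

find_user = lambda do |username|
  ToyBatchLoader.for(username).batch do |usernames, loader|
    batch_calls << usernames # record each "query" and the keys it covered
    usernames.each { |name| loader.call(name, USERS[name]) }
  end
end

alice = find_user.call('alice')
bob = find_user.call('bob')

puts alice.sync          # Alice A.
puts bob.sync            # Bob B.
puts batch_calls.inspect # [["alice", "bob"]]
```

The block runs once, receives every key collected so far, and `loader.call` routes each result back to the lazy object that asked for it.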

What does lazy mean?

It is important to avoid syncing batches too early. In the example below we can see how calling sync too early can eliminate opportunities for batching:

x = find_lazy(1)
y = find_lazy(2)

# calling .sync will flush the current batch and will inhibit maximum laziness
x.sync

z = find_lazy(3)

y.sync
z.sync

# => will run 2 queries

x = find_lazy(1)
y = find_lazy(2)
z = find_lazy(3)

x.sync
y.sync
z.sync

# => will run 1 query
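The two orderings above can be reproduced with a small, self-contained sketch that counts queries. `CountingStore`, `LazyRecord`, and `sync_order_demo` are hypothetical names introduced here; the store only simulates a database, and its `find_lazy` is an assumed stand-in for whatever lazy finder your code uses.

```ruby
# Hypothetical store: every call to fetch_many counts as one query.
class CountingStore
  attr_reader :query_count

  def initialize
    @query_count = 0
    @pending = []
    @loaded = {}
  end

  # Returns a lazy handle; the id is only queued, not fetched.
  def find_lazy(id)
    @pending << id
    LazyRecord.new(self, id)
  end

  # Called by LazyRecord#sync: flushes the whole pending batch if needed.
  def resolve(id)
    unless @loaded.key?(id)
      fetch_many(@pending.uniq).each { |key, value| @loaded[key] = value }
      @pending.clear
    end
    @loaded[id]
  end

  private

  def fetch_many(ids)
    @query_count += 1
    ids.to_h { |id| [id, "record-#{id}"] }
  end
end

LazyRecord = Struct.new(:store, :id) do
  def sync
    store.resolve(id)
  end
end

def sync_order_demo(store, early_sync:)
  x = store.find_lazy(1)
  y = store.find_lazy(2)
  x.sync if early_sync # flushes a batch containing only 1 and 2
  z = store.find_lazy(3)
  [x.sync, y.sync, z.sync]
end

early = CountingStore.new
late = CountingStore.new
early_results = sync_order_demo(early, early_sync: true)
late_results = sync_order_demo(late, early_sync: false)

puts early.query_count # 2
puts late.query_count  # 1
```

The results are identical in both cases; syncing early only changes how many queries are needed to produce them.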

Testing

Any GraphQL field that supports BatchLoading should be tested using the batch_sync method available in GraphQLHelpers.

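it 'returns data as a batch' do
  # Resolve several inputs inside one batch_sync block so they share a batch
  results = batch_sync(max_queries: 1) do
    [{ id: 1 }, { id: 2 }].map { |args| resolve_batched(args) }
  end

  expect(results).to eq(expected_results)
end

# Named differently from the GraphQLHelpers `resolve` helper it wraps,
# so that the call below does not recurse into this method.
def resolve_batched(args = {}, context = { current_user: current_user })
  resolve(described_class, obj: obj, args: args, ctx: context)
end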

We can also use QueryRecorder to make sure we are performing only one SQL query per call.

it 'executes only 1 SQL query' do
  query_count = ActiveRecord::QueryRecorder.new { subject }.count

  expect(query_count).to eq(1)
end