New schema customization API in Gatsby

Mikhail Novikov
March 4th, 2019

Today we are releasing a preview of a new core Gatsby API – Schema Customization. It gives Gatsby users much better control over the inferred schema, solving many common issues that people have had with their data sources. In addition to adding the new API, we rewrote big chunks of schema generation code from scratch. This gives us a great long-term foundation that will let us make Gatsby GraphQL better in the future.

I would like to thank our community member Stefan Probst, who not only did lots of initial groundwork on the refactoring, but also helped immensely with the follow-up work there. We are really happy to have such a great community and super grateful to Stefan for all his hard work. I'd also like to thank Pavel Chertorogov, the author of the graphql-compose library that we used, who's been super responsive to our bug reports and feature requests.

As it's a huge feature and big parts of the code are affected, we are releasing it as an alpha preview. You can try it by adding gatsby@schema-customization as a dependency for your Gatsby site.

We would really appreciate your help in surfacing any bugs in this code, so we encourage you to try it and report any issues that you encounter in this pinned issue. If you want to contribute to fixing some of those bugs, open PRs against this branch.

Why was it needed?

The motivation for this change is twofold. Before this feature, Gatsby automatically generated a GraphQL schema for your site based on the data available from your source plugins. While this schema inference is great for getting started, it has also been the cause of many problems.

Automatically generating schemas means that changing your data can change your schema. An updated schema may no longer work with the queries you've written, resulting in errors and confusion. Making schema generation smarter would just pour more oil on an already burning fire. The core issue is not inference itself but the lack of control, so we wanted to give people control over the schema.

Second, we wanted to reevaluate our approach to schemas in general. In the "wild", GraphQL is used very differently than in Gatsby: schemas are rarely generated from the data sources, and the schema itself is often the source of truth. We want to experiment with enabling that approach in Gatsby too. By allowing people to define types and resolvers, we open new opportunities in that direction. We want to see how the community reacts to these changes and whether they evolve into new approaches to defining schemas in Gatsby.

New API

There are two main additions to the API:

  1. A createTypes action that lets you add, extend, or fix types by passing their definitions in GraphQL SDL.
  2. A createResolvers Gatsby Node API that can add or override resolvers on any type or field in the schema. It can also add new fields with their own resolvers.

Why two APIs? The primary purpose of createTypes is to fix the definition of an automatically generated Node type. Often you are perfectly happy with the default resolvers that Gatsby provides, and the only issue is that inference can change when your data changes.

createResolvers, on the other hand, is for adding extra functionality to types. It also allows adding new root fields to the Query type.

createTypes

Let's consider an example with gatsby-source-filesystem, where we are loading data from an authors.json file (turned into nodes by gatsby-transformer-json) that contains a list of authors, each with a name and a birthday.

Gatsby would infer a Node type for these authors, with Date as the type of the birthday field.

However, this can break if we accidentally add an invalid date as the birthday of a new author.

Now there is a type conflict between Date and String values, so birthday will be inferred as a String, possibly breaking our queries.

Luckily, now we can use the createTypes action to force birthday to be a Date.

Gatsby will now know that you want a Date and not override it with a string.

You can specify types for some or all of the fields on a given node type; Gatsby will infer and add any fields you leave out. This behavior can be controlled with the @infer and @dontInfer directives.

createResolvers

This API is similar to setFieldsOnGraphQLNodeType in that it allows you to add new fields and resolvers to types. However, createResolvers runs last, so the entire schema is available to be augmented. It is also possible to extend the Query type to add custom root resolvers, which enables a powerful resolver-based approach to querying your data sources. Because createResolvers is called after third-party schemas (e.g. those added by gatsby-source-graphql) are merged, you can extend those schemas too.

It's also possible to create new root fields, for example one that will return all author names as strings.

Resolvers have access to context.nodeModel: we expose our internal node storage to them, so they can fetch data from it. In addition to the lower-level access functions (getNodeById, getAllNodes), full node querying is available through runQuery.

You can also look at the using-type-definitions example in the Gatsby repository.

Other niceties

Refactoring the schema generation allowed us to fix some related long-standing bugs and issues.

Type Names

Previously, types were generated with names like internal_2 or SomeType_2, which can be extremely confusing. We've normalized all the names, so these additional suffixes are no longer necessary. If you have relied on generated names like these, this branch will break for you. However, we never considered these types to be part of our public API, partly because of this very issue. With this change, type names should now be stable.

Connection nodes field

Querying connections in Gatsby is pretty verbose, since every result has to be reached through edges and node.

When you have many connections, this becomes tedious, especially when destructuring it all in JS. We've added nodes, a common shortcut field that resolves directly to an array of nodes, so you no longer have to write { edges { node } }.
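To illustrate the difference on the JavaScript side, here is a sketch with hypothetical query results for an allAuthorsJson connection, queried once through edges and once through the nodes shortcut:

```javascript
// Hypothetical `data` objects for the same query written two ways.

// query { allAuthorsJson { edges { node { name } } } }
const withEdges = {
  allAuthorsJson: {
    edges: [{ node: { name: "Ada" } }, { node: { name: "Grace" } }],
  },
};

// query { allAuthorsJson { nodes { name } } }
const withNodes = {
  allAuthorsJson: {
    nodes: [{ name: "Ada" }, { name: "Grace" }],
  },
};

// Before: every edge has to be unwrapped to reach its node.
function namesFromEdges(data) {
  return data.allAuthorsJson.edges.map(({ node }) => node.name);
}

// After: the array of nodes can be destructured directly.
function namesFromNodes({ allAuthorsJson: { nodes } }) {
  return nodes.map((node) => node.name);
}
```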

Inference quirks

Some inference quirks used to depend on the order of the data. We've made all inference deterministic:

  1. A mix of date and non-date strings is always inferred as a string
  2. When field names conflict, Node references are always preferred first, and then the canonical name of the field.
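A simplified illustration of the first rule, not Gatsby's actual inference code: a field is typed as Date only if every observed value is a date string, so the result no longer depends on which value is seen first. The strict YYYY-MM-DD check in looksLikeDate is a hypothetical stand-in for Gatsby's broader date detection:

```javascript
// Hypothetical, deliberately strict date check: only YYYY-MM-DD strings
// that name a real calendar day count as dates.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function looksLikeDate(value) {
  if (typeof value !== "string" || !ISO_DATE.test(value)) return false;
  const date = new Date(`${value}T00:00:00Z`);
  // Reject invalid dates and impossible days such as 2019-02-30,
  // which some engines silently roll over into the next month.
  return !Number.isNaN(date.getTime()) && date.toISOString().startsWith(value);
}

// Infer a scalar type for a field from all of its observed string values.
// Because every value is checked, the result is independent of ordering.
function inferStringFieldType(values) {
  return values.length > 0 && values.every(looksLikeDate) ? "Date" : "String";
}
```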

How did we do it?

The biggest issue with building GraphQL schemas with graphql-js is that graphql-js expects all types to be final at the moment the schema is created or the fields of a type are inspected. graphql-js works around this with thunks: zero-argument functions that refer to types in an outer scope and are only evaluated when the fields are actually needed. In hand-written schemas, the referenced types are usually defined in the same file as the type that uses them, but that isn't the case when the schema is generated.
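A sketch of the thunk pattern using plain objects instead of graphql-js, so that it runs on its own. Author and Post are hypothetical types that reference each other, which only works because their fields are wrapped in functions:

```javascript
// Author refers to Post before Post is declared. An eager
// `fields: { posts: { type: [Post] } }` here would throw a ReferenceError,
// because Post is still in its temporal dead zone at this point.
const Author = {
  name: "Author",
  // A thunk: a zero-argument function evaluated only when fields are needed.
  fields: () => ({ posts: { type: [Post] } }),
};

const Post = {
  name: "Post",
  fields: () => ({ author: { type: Author } }),
};

// Called later (e.g. while building the schema), when both constants exist.
function resolveFields(type) {
  return typeof type.fields === "function" ? type.fields() : type.fields;
}
```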

To solve these issues, the Type Registry pattern is widely used. A type registry is an abstraction that holds types and lets other types retrieve them by name.

After all types are collected in the type registry, it can be converted into a normal GraphQL schema. Other common features include generating derived types, such as input objects and filters, from the types held in the registry.
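A minimal sketch of the idea, not graphql-compose's API: types are registered by name in any order, references are resolved only when the registry is built, and a simplified filter input is derived from a registered type. The class, its methods, and the FilterInput naming are hypothetical:

```javascript
const SCALARS = ["String", "Int", "Float", "Boolean", "Date"];

class TypeRegistry {
  constructor() {
    this.types = new Map();
    for (const scalar of SCALARS) {
      this.types.set(scalar, { name: scalar, scalar: true, fieldTypes: {} });
    }
  }

  // Register an object type; fieldTypes maps field name -> type name.
  add(name, fieldTypes) {
    this.types.set(name, { name, scalar: false, fieldTypes });
  }

  get(name) {
    const type = this.types.get(name);
    if (!type) throw new Error(`Unknown type: ${name}`);
    return type;
  }

  // Once everything is collected, replace type names with type objects.
  build() {
    const built = new Map();
    for (const [name, def] of this.types) {
      built.set(name, { name, scalar: def.scalar, fields: {} });
    }
    for (const [name, def] of this.types) {
      for (const [field, typeName] of Object.entries(def.fieldTypes)) {
        this.get(typeName); // fail early on references to missing types
        built.get(name).fields[field] = built.get(typeName);
      }
    }
    return built;
  }

  // Derive a simplified filter input type (one `eq` per scalar field).
  filterInputFor(name) {
    const fields = {};
    for (const [field, typeName] of Object.entries(this.get(name).fieldTypes)) {
      if (this.get(typeName).scalar) fields[field] = { eq: typeName };
    }
    return { name: `${name}FilterInput`, fields };
  }
}
```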

We didn't want to implement a type registry and all the related parts ourselves. Thankfully, there is a library just for that – graphql-compose. We opted to use it and it saved us lots of time. I really recommend this library to anyone, especially if you plan to generate types.

The final schema pipeline that we implemented works like this:

  1. We collect all types that are created with createTypes and add them to the graphql-compose type registry (called the Schema Composer)
  2. We go through all the collected nodes and infer types for them
  3. We merge user-defined types with inferred types and add them to the composer
  4. We add default resolvers for type fields, such as for File and Date fields
  5. setFieldsOnGraphQLNodeType is called and those fields are added to the types
  6. We create derived input objects, such as filter and sort, and then create pagination types such as Connections
  7. Root-level resolvers are created for all node types
  8. Third-party schemas are merged into the Gatsby schema
  9. The createResolvers API is called and resulting resolvers are added to the schema
  10. We generate the schema
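Step 3 is where the createTypes behavior described earlier comes from: explicit fields win, missing fields are filled in by inference, and @dontInfer switches inference off. A toy sketch of that merge, using plain { fieldName: typeName } maps and a hypothetical mergeTypeFields helper rather than Gatsby's real code:

```javascript
// Toy version of step 3: combine a user-defined type with the inferred one.
// `infer: false` plays the role of the @dontInfer directive.
function mergeTypeFields(userFields, inferredFields, { infer = true } = {}) {
  // With inference disabled, only the explicit definition is used.
  if (!infer) return { ...userFields };
  // Otherwise inferred fields fill the gaps, and explicit definitions win
  // on conflicts (so birthday stays a Date even if inference says String).
  return { ...inferredFields, ...userFields };
}
```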

You can see the packages/gatsby/schema/ folder in the schema refactoring PR to learn more about the code.

Further work

These schema changes are a first step. In the future we want to give our users more control over the schema and more access to our internal APIs. Our next step is to add explicit types to the plugins that we maintain. We also want to let those plugins expose their internal APIs through the Model layer, like we did for our root Node API. This way, functionality that is currently only available inside plugins can be reused in your own resolvers.

We are super excited about these changes. As I mentioned, we really encourage you to try them by adding gatsby@schema-customization as a dependency to your Gatsby application, and to send us feedback in this issue. We can't wait to hear what you think of this new core functionality and to see all the great apps it allows you to build.
