Reducing Build Times In Gatsby 4 With Parallel Query Running

Dan Giordano
September 17th, 2021

Every creator working on a Gatsby site is familiar with the Gatsby build process. When you run gatsby build, your project goes through a series of build steps that optimize the resulting site content for deployment speed and for the best performance for your site visitors. Through our analysis of the telemetry we receive from active Gatsby projects, we identified a common bottleneck: query running.

This has led to the next innovation in the Gatsby framework: parallel query running. We’ve rearchitected the Gatsby data layer (not a trivial thing to do!) to allow page queries and static queries to run in parallel, leading to a 40% reduction in build times for some sites! This innovation starts with parallel content queries, but it also positions Gatsby for a number of interesting use cases (imagine what you can do with a portable data layer 🤔).

Just what is “query running” again?

Since our goal is to provide you the fastest way to run the fastest frontend, we continually evaluate the overall process of building, shipping, and iterating on Gatsby sites. Through that analysis, we identified query running as our next target in order to optimize this overall workflow for Gatsby users.

Query running is the portion of the gatsby build process that happens after your site’s content has been sourced from the various content sources configured for your Gatsby site. This step is understandably one of the more expensive portions of the build process because it’s where all of the data is being extracted into the corresponding page data required to efficiently generate the actual website pages that your visitors interact with.

How we got there

The crux of the matter, regarding query running, is that Gatsby had historically used Redux as its internal, in-process data store. That data store is very fast because it lives in memory, but it carries a key limitation that was hindering our ability to substantially optimize the Gatsby build process: it’s only accessible from the current thread/process. This means that the Gatsby build process, and more specifically the query running portion of that process, could not be shared across CPU cores.

The Gatsby framework team evaluated a collection of strategies for optimizing and decoupling the data layer in order to allow cross-CPU, and possibly cross-machine, coordination of content queries. We landed on lmdb-store, the Node.js implementation of LMDB, as the foundation for the architecture update. lmdb-store affords incredibly efficient data access, focused on fast read operations, which makes it well suited to the way Gatsby sites read data during a build.

The Gatsby main process now coordinates content query workers (as shown in the figure below) with the now-shared data store. As a result, you will have n-1 query workers when building your Gatsby site, where n is the total number of CPUs provisioned for your Gatsby Cloud (or other CI/CD host) site.
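To make the n-1 rule concrete, here is a minimal sketch in TypeScript. It is an illustration only, not Gatsby's actual code: the helper names queryWorkerCount and assignQueries are hypothetical, the floor of one worker on a single-CPU host is an assumption, and the round-robin split is simply one easy way to picture how queries could be divided among workers.

```typescript
// Hypothetical helper: how many query workers a host with `cpuCount` CPUs gets.
// The n - 1 rule comes from the article; the floor of 1 is an assumption
// so that a single-CPU host still has one worker.
function queryWorkerCount(cpuCount: number): number {
  return Math.max(1, cpuCount - 1);
}

// Hypothetical helper: deal query IDs out to workers round-robin.
// This is an illustrative strategy, not necessarily how Gatsby schedules queries.
function assignQueries(queryIds: string[], workers: number): string[][] {
  const buckets: string[][] = Array.from({ length: workers }, () => []);
  queryIds.forEach((id, i) => buckets[i % workers].push(id));
  return buckets;
}

// Example: a host provisioned with 4 CPUs gets 3 query workers.
const workerCount = queryWorkerCount(4);
const plan = assignQueries(
  ["/", "/about", "/blog", "/blog/a", "/blog/b"],
  workerCount
);
```

With 4 CPUs, workerCount is 3, and the five example page paths are spread across three buckets. Each worker would then run its share of queries against the shared data store, while the main process keeps one CPU for coordination.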

Getting Started

To get started with a Gatsby build taking advantage of parallel query running, just head on over to our docs and download the latest version of Gatsby, Gatsby 4! Or type the below command into your terminal!
