(Re-) Introducing Gatsby, A Reactive Site Generator

Kyle Mathews
August 12th, 2022
Jay-Z: "allow me to re-introduce myself"

Gatsby Cloud can now publish to our CDN in one second — 100x faster than the exact same Gatsby site building on a standard CI/CD service!

We launched Gatsby v1 in 2017 to help teams build fast, secure, and powerful websites. Since then, we’ve been doubling down on its novel, powerful features — a data layer and reactive builds.

The data layer (exposed via consistent, standardized GraphQL) syncs data from sources like Contentful, WordPress, and Shopify into the Gatsby DB, giving you access to a real-time stream of data changes.

Reactive builds are powered by GraphQL. Each page declares its data dependencies through GraphQL; then, when data changes, Gatsby Cloud ensures invalidated pages are updated.

For the past 12 months, the Gatsby team has been focused on porting this architecture to our specialized Cloud infrastructure and optimizing publish performance.

Today, there’s no faster, simpler, more scalable way to publish content to the web. 

We call it Reactive Site Generation.

The median publish time for content updates for all Gatsby v4 sites on Gatsby Cloud is now just five seconds.

Here’s an example of what it feels like to use Gatsby Cloud with a Content Management System (CMS).

It’s hot reloading, but for your production website.

Site owners need one-second publishes. Whether they’re previewing changes, publishing a typo fix for a news article from a CMS, or pushing real-time pricing and inventory updates to their ecommerce site, it’s critical that the change goes live immediately. This experience is what we’ve shipped.

Reactive Site Generation is a novel approach to solving an old problem: how to efficiently host and update websites, both large and small. This blog post and the ones that follow dive deep into the existing approaches, Static Site Generation (SSG) and Server-Side Rendering (SSR), their downsides, and why we believe a new approach is necessary.

How “reactivity” works

Reactive programming describes software that’s designed to automatically update its outputs: if output A depends on inputs B and C, and B or C changes, A is automatically updated. It’s often contrasted with imperative programming, where you must manually track updates to inputs and schedule rebuilds of the outputs.
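The contrast can be sketched in a few lines of plain JavaScript. This is a toy illustration of the idea, not Gatsby's engine; the store shape and names are my own:

```javascript
// Toy reactive store: outputs subscribe to the inputs they depend on
// and are recomputed automatically on every change.
function createStore(initial) {
  let state = { ...initial };
  const subscribers = [];
  return {
    get: (key) => state[key],
    set(key, value) {
      state[key] = value;
      subscribers.forEach((fn) => fn(state)); // push the change to every output
    },
    subscribe: (fn) => subscribers.push(fn),
  };
}

const store = createStore({ b: 2, c: 3 });
let a;
// Declare the dependency once; from here on, updates to `a` are automatic.
store.subscribe((state) => { a = state.b + state.c; });

store.set("b", 10); // `a` becomes 13 with no manual bookkeeping
```

In the imperative version, every call site that touches `b` or `c` would also have to remember to recompute `a`.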

React.js is a great example of reactive programming. Your inputs, JSX components and data, tell React what you want the output, the DOM, to look like. Then, React makes it so.

React and other similar modern JS component libraries simplify building high-performance websites and applications compared to earlier imperative technologies like jQuery. The reason nobody built this before React is that implementing a reactive engine to track changes and flush them to the DOM is hard: automated behaviors are easy to imagine but often very hard to implement.

Reactivity on the Web

The ideas of reactive programming can be used for automatically updating any output — in Gatsby Cloud’s case, a cache on a CDN.

Modern websites rely heavily on CDN technology to ensure that page load speeds are consistently fast and robust against huge traffic spikes. They’re a fast and scalable global cache for site assets.

But just like with React writing updates to the DOM, the hard part of using a CDN isn’t writing to it the first time, it’s updating it rapidly as your data changes — whether from a CMS or ecommerce backend.

Gatsby Cloud’s implementation of Reactive Site Generation automatically updates your site’s CDN cache — often in under a second. It’s hot reloading, but for your production website. You change anything and your site updates, automatically.

Benchmarking Reactivity

For the benchmark, I picked a use case that most assume impossible for Gatsby to handle — let’s see how RSG does.

The benchmark measures how long it takes to update the inventory level on a product page after it changes. The site is an ecommerce site with 5,000 products.

One of the most important features of any ecommerce site is the ability to display up-to-date inventory so customers can see if something is sold out and avoid adding those out-of-stock products to their cart. This is why it’s critical that updates to the CDN cache happen as quickly as possible.

The inventory levels for the 5,000 products are stored in a ChiselStrike database. I ran the benchmark 100 times. Each run updates the inventory level, then repeatedly downloads the associated web page from the CDN until the new inventory level appears, and records the duration from start to finish.
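Roughly, each run looks like the sketch below. The endpoint wiring and the `In stock:` page marker are assumptions for illustration, not the actual benchmark code:

```javascript
// Check whether the downloaded page reflects the new level yet.
// The "In stock:" marker is an illustrative assumption about the markup.
function pageShowsLevel(html, level) {
  return html.includes(`In stock: ${level}`);
}

// One benchmark run: write the new inventory level to the database, then
// poll the CDN copy of the product page until the change is visible.
async function measurePublishLatency(updateInventory, fetchPage, newLevel) {
  const start = Date.now();
  await updateInventory(newLevel); // write to the backend database
  while (!pageShowsLevel(await fetchPage(), newLevel)) {
    // keep downloading the page from the CDN
  }
  return Date.now() - start;
}
```

`updateInventory` and `fetchPage` are injected so the same measurement loop can run against each of the benchmarked stacks.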

I implemented the benchmark in Gatsby and two other common React meta-frameworks, Next.js and Remix, to test the various techniques for updating a CDN cache. While I only used React meta-frameworks, as these are what Gatsby is often compared to, the same caching techniques and benchmark results are meaningful across other website technologies.

Gatsby Cloud Gets Reactive

For the first benchmark comparison, I’ll compare Gatsby as an RSG on Gatsby Cloud vs. Gatsby as an SSG on Netlify.

On CI/CD services like Netlify, AWS Amplify, Vercel, and many others, Gatsby runs as an SSG — which means that for both code and data updates, the entire build process must be set up and run in a freshly provisioned Linux container.

On the other hand, Gatsby Cloud optimizes for data updates by building sites into an RSG service that can instantly react to data changes. We’ve shipped many improvements to this service over the past year, including a huge performance improvement to deploys on Gatsby Cloud earlier this year.

To benchmark RSG vs SSG, I ran the benchmark with the exact same Gatsby site on Gatsby Cloud and on Netlify.

In the following tables, p stands for percentile so p50 means 50% of runs finished within this time.

                     p50     p75     p99
Gatsby Cloud RSG     1.3s    1.7s    4.1s
Netlify Gatsby SSG   124s    132s    152s

Conclusion: RSG is around 100x faster than SSG at updating the page in the CDN cache.

SSR cache technique #1: maxAge

In 1999, HTTP/1.1 introduced the Cache-Control header, which lets sites tell CDNs to cache assets until the “maxAge” expires.

This mimics the common application caching technique:
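That application-side pattern is, roughly, a TTL cache. A minimal sketch (an illustration, not code from the original post; the injected clock is just for testability):

```javascript
// In-app TTL cache: cached values are served until their age exceeds the
// TTL, then refetched from the backend. This is what max-age does at the CDN.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key, fetchFresh) {
      const hit = entries.get(key);
      if (hit && now() - hit.storedAt < ttlMs) return hit.value; // still fresh
      const value = fetchFresh(); // expired or missing: hit the backend
      entries.set(key, { value, storedAt: now() });
      return value;
    },
  };
}
```

The tradeoff is identical to the CDN case: a short TTL hammers the backend, a long TTL serves stale data.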

maxAge is generally quite simple to implement in a site: you set a Cache-Control header for each asset that tells the CDN how long to cache the item before refreshing or revalidating it. Next.js’ ISR is also an example of this technique.
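Concretely, the technique is a single response header. A sketch of building it — splitting `max-age` (browsers) from `s-maxage` (shared caches like CDNs) is a common refinement, and the values here are illustrative:

```javascript
// Build a Cache-Control value: `max-age` applies everywhere, while
// `s-maxage` overrides it for shared caches (CDNs) only.
function cacheControlFor(cdnSeconds, browserSeconds = 0) {
  return `public, max-age=${browserSeconds}, s-maxage=${cdnSeconds}`;
}
```

A 30-second CDN cache for the product page would then be `cacheControlFor(30)`.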

There’s a lot of nuance to picking the maxAge value, which essentially boils down to a tradeoff between content freshness and the load you put on your backend: a lower maxAge means many more queries to the backend; a higher maxAge means less frequent queries. Let’s say the maxAge is set to 30 seconds, so inventory levels are stale for up to 30 seconds after a change.

                       p50     p75     p99
Gatsby Cloud RSG       1.3s    1.7s    4.1s
Remix/Fastly: maxAge   30s     30s     30s

Conclusion: SSR:maxAge is ~4x faster than SSG at updating the page and ~20x slower than RSG.

SSR cache technique #2: stale-while-revalidate + manual cache revalidation

A much more performant way to update CDN caches was introduced by Fastly in 2014. It combines two ideas. The first is stale-while-revalidate: instead of immediately invalidating a cached asset, the CDN keeps serving the stale copy while it revalidates against the origin in the background. If the origin returns an update, the CDN updates its cache. This protects against backend APIs that are occasionally slow or return errors, because the cache stays valid until the backend returns a proper update.

Stale-while-revalidate is fairly well supported on CDNs and browsers and has gotten quite popular.

Fastly went one step further and paired stale-while-revalidate with their purge API, so a backend can tell Fastly to immediately revalidate a path instead of waiting for the maxAge timeout. This technique for manual revalidation is still uncommon, but support on CDNs is growing.

This CDN technique mimics the following common app caching technique:
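In app terms, it’s a cache you purge on write rather than on a timer. A toy sketch of the pattern:

```javascript
// In-app cache with manual invalidation: cache reads indefinitely, and
// delete (purge) the entry the moment the underlying data changes,
// instead of waiting for a TTL to expire.
function createPurgeableCache() {
  const entries = new Map();
  return {
    get(key, fetchFresh) {
      if (!entries.has(key)) entries.set(key, fetchFresh());
      return entries.get(key);
    },
    purge(key) {
      entries.delete(key); // next read refetches, like a CDN purge API call
    },
  };
}
```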

I implemented this technique for two benchmark sites: one running Remix on Fly.io fronted by the Fastly CDN, and another running Next.js on Vercel.

For both I wrote glue code which:

  1. Listened to changes to inventory levels
  2. Determined which page(s) needed to be invalidated
  3. Called the respective CDN’s API to trigger revalidating the affected assets
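Sketched out, the glue code looks something like this. The webhook payload shape and the `purge` function are assumptions standing in for the CDN-specific APIs:

```javascript
// Map an inventory change to the page paths that display it. A real site
// might also invalidate listing or category pages here.
function pathsForInventoryChange(change) {
  return [`/products/${change.productSlug}`];
}

// Called from the inventory-change webhook; `purge` wraps the CDN's
// revalidation API (Fastly's purge API, Next.js On-Demand ISR, etc.).
async function handleInventoryWebhook(change, purge) {
  for (const path of pathsForInventoryChange(change)) {
    await purge(path);
  }
}
```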

This technique is similar to Reactive Site Generation. RSG is the process of building a data dependency graph and then using the graph against a real-time stream of data changes to decide when a page needs to be invalidated. While Gatsby has this baked into the framework, you can implement the same ideas in other systems.
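The dependency graph itself can be modeled as an inverted index from data node IDs to the pages that read them. This is a toy model; Gatsby derives the real graph from each page’s GraphQL query:

```javascript
// While rendering, each page records which data node IDs it read.
// Inverting that map tells us which pages a changed node invalidates.
function buildDependencyGraph(pageReads) {
  const nodeToPages = new Map();
  for (const [page, nodeIds] of Object.entries(pageReads)) {
    for (const id of nodeIds) {
      if (!nodeToPages.has(id)) nodeToPages.set(id, new Set());
      nodeToPages.get(id).add(page);
    }
  }
  return nodeToPages;
}

// Look up the pages affected by a single data change.
function invalidatedPages(graph, changedNodeId) {
  return [...(graph.get(changedNodeId) ?? [])];
}
```

Run the second function against a real-time stream of data changes and you have the core of a reactive build.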

To set up this pattern with Fastly for the Remix site, I had my products route return the following document headers:
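The headers looked roughly like this. The directive values below are illustrative placeholders, not the exact ones from the benchmark, and in the Remix route module this function is exported as `headers`:

```javascript
// In the Remix route module this would be `export function headers()`.
// Directive values are illustrative.
function headers() {
  return {
    // For the browser: don't cache, so users always hit the CDN copy.
    "Cache-Control": "public, max-age=0, must-revalidate",
    // For Fastly: cache for a long time and serve stale while revalidating;
    // the purge API invalidates entries on demand, so a long max-age is safe.
    "Surrogate-Control": "max-age=3600, stale-while-revalidate=86400",
  };
}
```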

“Cache-Control” is for the browser and “Surrogate-Control” instructs Fastly how to cache assets from the origin (Remix).

For the Next.js site, I used their new On-Demand ISR feature which implements this technique.
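The revalidation endpoint can be sketched as a standard Next.js API route. `res.revalidate` is the On-Demand ISR API (Next.js 12.2+); the secret check and query parameter names here are illustrative assumptions:

```javascript
// Next.js API route (e.g. pages/api/revalidate.js); exported as the
// route's default export. Parameter names are illustrative.
async function handler(req, res) {
  if (req.query.secret !== process.env.REVALIDATE_SECRET) {
    return res.status(401).json({ message: "Invalid token" });
  }
  try {
    // Asks Next.js to regenerate this page and update the cached copy.
    await res.revalidate(`/products/${req.query.slug}`);
    return res.json({ revalidated: true });
  } catch (err) {
    return res.status(500).send("Error revalidating");
  }
}
```

The inventory webhook calls this endpoint with the changed product’s slug.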

The results are as follows:

                             p50     p75     p99
Gatsby Cloud RSG             1.3s    1.7s    4.1s
Remix/Fastly: revalidate     0.4s    0.48s   0.6s
Next.js/Vercel: revalidate   0.97s   4.1s    8.4s

Conclusion: SSR:SWR/Revalidation is ~120x faster than SSG, 30x faster than SSR:maxAge and similar to RSG.

Analysis

The benchmark, again, is measuring how long an ecommerce website shows stale data after the inventory level changes.

A table with all results:

                             p50     p75     p99
Gatsby Cloud RSG             1.3s    1.7s    4.1s
Netlify Gatsby SSG           124s    132s    152s
Remix/Fastly: maxAge         30s     30s     30s
Remix/Fastly: revalidate     0.4s    0.48s   0.6s
Next.js/Vercel: revalidate   0.97s   4.1s    8.4s

And on a chart.

This matches up well with the general wisdom. SSGs are a poor fit for caching data that changes rapidly. Using maxAge with SSR is faster than an SSG but still not that fast. And for the fastest updates, Gatsby Cloud’s RSG and SSR with manual cache revalidation let you update the CDN almost instantly.

What does this all tell us about RSG? Gatsby Cloud’s RSG model is dramatically faster than SSG while beating or matching SSR caching techniques for real-time data updates.

We’ve spent years building towards this milestone and I couldn’t be more pleased with what we’ve accomplished.

Three-part series

In the next blog post, I dive deeper into the data layer and how syncing data locally enables much faster rebuilds than traditional SSGs.

In the final blog post, I take a look at the past, present, and future of scaling Gatsby and show how we’ll soon support sites of millions of pages and beyond.

Our goal is to make sure that Gatsby can easily scale to the largest of sites and give developers the power and flexibility to ship the best websites on the internet.

Despite having worked on Gatsby for 7 years now, in many ways I feel like we’re just getting started. Gatsby has evolved from its humble beginning as the first React meta-framework in 2015 to a robust, fast framework with deep integrations with the most popular CMSs, Serverless Functions, and SSG, SSR, and now RSG support. If you haven’t tried Gatsby in a while, I think you’ll be pleasantly surprised by how much faster things have gotten.

Written by Kyle Mathews

Founder @ GatsbyJS. Likes tech, reading/writing, founding things. Blogs at bricolage.io.

