Prerender proposal

Hi.

I thought I’d share my developer experience implementing an app that prerendered many pages (30,000+) during build and deploy, an experience that was, shall we say … sub-optimal.

Hope that my experience can inform some considerations when designing and implementing prerendering in RedwoodJS – which will be a great and, I think, oft-used feature.

Page Rendering circa 2017

:steam_locomotive: First, we travel back to Fall 2016 and Winter 2017, when I jumped aboard the JAMStack train.

The app I built used Middleman – a static-site generator – since I came from the Ruby world and we had already been using Contentful to store “research data” (company & people profiles, blog posts, “market maps”). Thinking ahead, we added Algolia for search, Auth0 to authenticate (and authorize via roles), and Netlify to build and deploy … and enforce auth.

We were early users of Netlify’s role-based redirects, where a cookie stored the JWT that said whether a user had access to the market-map area, etc.
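For context, a role-based redirect in Netlify’s _redirects file looks roughly like this (the paths and the member role are made up for illustration; the Role= condition checked against the JWT cookie is Netlify’s real mechanism):

# _redirects – gate the market-map area behind a role carried in the JWT cookie
/market-maps/*  /market-maps/:splat  200!  Role=member

# anyone without the role falls through to the next matching rule
/market-maps/*  /login  302!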

Contentful didn’t yet have GraphQL support and its Ruby SDK was still in its early stages, but its Delivery API could access the data I needed to generate pages. Also, Netlify did not yet have Build Plugins to help with prebuild tasks, or the build plugin cache to store file data between builds. I also wasn’t going to check 30,000 pages into GitHub to keep them in sync. Plus, what would do the checking in? A local build? Netlify?

First Approach

:crossed_fingers: My first approach was basically (a little simplified):

  • During the Netlify build
  • use a rake task in the build command to:
  • fetch all the data needed from Contentful (sketched below)
  • store it in YAML files per type (companies, posts, people, etc.) – maybe 3-4k entries
  • fetch article data from a private microservice (25k+ articles)
  • build the app
  • wait for Netlify to push lots and lots of possible changes to the CDN
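The fetch-and-dump step looked roughly like this – sketched here in JavaScript with the Contentful SDK and js-yaml for illustration, since the original was a Ruby rake task; the env vars and content types are placeholders:

// build/fetch-data.js – illustrative sketch, not the original Ruby task
const contentful = require('contentful')
const yaml = require('js-yaml')
const fs = require('fs')

const client = contentful.createClient({
  space: process.env.CONTENTFUL_SPACE_ID,
  accessToken: process.env.CONTENTFUL_DELIVERY_TOKEN,
})

// Dump every entry of one content type to a YAML file, one file per type
async function dumpType(contentType) {
  const items = []
  let skip = 0
  let total = Infinity
  while (skip < total) {
    // The Delivery API caps each response at 1,000 entries, hence the loop
    const page = await client.getEntries({ content_type: contentType, limit: 1000, skip })
    total = page.total
    items.push(...page.items)
    skip += page.items.length
    if (page.items.length === 0) break // safety against an infinite loop
  }
  fs.writeFileSync(`data/${contentType}.yml`, yaml.dump(items.map((i) => i.fields)))
}

async function main() {
  fs.mkdirSync('data', { recursive: true })
  for (const type of ['company', 'post', 'person']) {
    await dumpType(type)
  }
}

main()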

At some point, build times got to be over 3 hours (sometimes hitting 6 hrs if, say, a layout changed and all 40k+ pages changed with it), memory exploded, and pushing to the CDN could fail on timeouts due to the number of files … so we got our own build instance and only built in the morning and evening.

Something had to change.

Second Approach later in 2017

:thinking: Where can I optimize? YAML generation.

With

  • some optimization (gzipping and archiving the YAML to S3),
  • only fetching data from Contentful and articles changed since a “last” date (see the sketch below), and
  • optimizing the page rendering for Middleman,

I got the builds down to 60-90 mins with no memory explosion.

This involved a lot of pre-processing of the pages with front matter instead of loading the massive datasets.

Again this ran as a rake task, but today it could be done with Netlify Build Plugins.
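The “last date” fetch was roughly this idea (JavaScript sketch again; sys.updatedAt[gte] is a real Delivery API query parameter, while lastSyncDate and where it’s stored are assumptions):

// Only fetch entries that changed since the previous build.
// The last-sync timestamp would be persisted between builds (we used S3).
const lastSyncDate = await readLastSyncDate() // hypothetical helper
const changed = await client.getEntries({
  content_type: 'company',
  'sys.updatedAt[gte]': lastSyncDate, // e.g. '2017-11-01T00:00:00Z'
  limit: 1000,
})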

Third Approach early 2018

:balance_scale: Can we scale to more articles?

We were adding articles at a rate of 1-2k per week, so within a few months 30k would become 60k, would become 90k. FYI – there are ~300k now.

  • No longer prerender the article pages
  • Embed a Vue.js app that fetches from the Article API itself (which validated the user’s JWT and access, etc.)
  • So now fewer pages and fewer API calls

and the build went down to < 30 mins.

Fourth Approach early 2019

:boom: Scrap it all and make a React app.

  • One-time full data load
  • Contentful webhooks send changes (CRUD) to a microservice that sends them to GetStream collections
  • The microservice sends Articles to GetStream
  • Other data (charts, structured JSON data) stored on S3 and fetched by Netlify Lambda functions (sketched below)
  • App builds only on feature changes and takes ~2-3 mins :tada:
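The S3-fetching functions followed the standard Netlify Functions shape, roughly like this (exports.handler is the real signature; the bucket name and key scheme are made up):

// netlify/functions/chart-data.js – sketch of the S3 fetch pattern
const AWS = require('aws-sdk')
const s3 = new AWS.S3()

exports.handler = async (event) => {
  // e.g. /.netlify/functions/chart-data?id=some-chart
  const { id } = event.queryStringParameters || {}
  const obj = await s3
    .getObject({ Bucket: 'my-chart-data', Key: `charts/${id}.json` })
    .promise()
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: obj.Body.toString(),
  }
}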

2017 Problems

Here are some of the problems I hit with larger-scale prerendering (again, on not-so-optimized systems, but the concepts hold true as design considerations).

Hope some of this can inform prerendering with RW.

:page_with_curl: Pagination

Whatever fetches the data to be rendered as pages may either have to fetch in batches (100-1000 at a time) or be able to pull the entire dataset at once.

I haven’t tried a RW graphql example with pagination to see what that might look like.
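If it looked anything like a normal Redwood service, offset pagination might be a sketch like this (hypothetical – take/skip are real Prisma arguments, the service and model names are mine):

// api/src/services/todos/todos.js – hypothetical paginated service
import { db } from 'src/lib/db'

export const todosPage = ({ page = 1, perPage = 100 }) => {
  // Prisma's take/skip give offset pagination; a prerender loop could walk
  // page = 1, 2, 3 … until a page comes back short.
  return db.todo.findMany({
    take: perPage,
    skip: (page - 1) * perPage,
  })
}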

:brain: Memory

If the entire dataset is returned, is this a memory concern?

When Middleman had to load all the YAML files, it took several GB of RAM and, as I said, I had to get a dedicated Netlify build box. We were on Enterprise, so everyone there was happy to help. But, no.

Multiple Models in a Page

How would multiple models in one page be handled – assuming the 2nd model is not related to the first?

For example:

// api/src/build/web/prerender.js

import { todos } from 'src/services/todos'

export const todo = todos

I want to show todos and maybe also show a map of each todo’s lat/lon on the page (guessing here). And let’s say that means a MapCell that calls a 3rd-party API with the lat/lon to fetch (city, state, zip, country).

Would a MapCell component still render if passed the lat, lon? Even if the Cell makes an API or GraphQL call?
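For concreteness, a hypothetical MapCell in Redwood’s Cell shape might look like this (QUERY/Loading/Failure/Success are Redwood’s real Cell exports and gql is available in cells; the location query and its fields are assumptions, not a real API):

// web/src/components/MapCell/MapCell.js – hypothetical Cell
export const QUERY = gql`
  query MapQuery($lat: Float!, $lon: Float!) {
    location(lat: $lat, lon: $lon) {
      city
      state
      zip
      country
    }
  }
`

export const Loading = () => <div>Loading map …</div>

export const Failure = ({ error }) => <div>Error: {error.message}</div>

export const Success = ({ location }) => (
  <div>
    {location.city}, {location.state} {location.zip}, {location.country}
  </div>
)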

In 2017 this meant I had to have all the data for all models in memory so the page could access whatever it needed.

:alarm_clock: Timeouts / Connection Resets

During pagination of the Contentful API, I relatively frequently hit connection resets or timeouts due to network issues.

One may need to implement exponential backoff and retries in the pagination calls.
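Something like this generic helper is what I mean (a sketch; the retry count and delays are arbitrary choices):

// Retry an async call with exponential backoff plus a little jitter
async function withRetry(fn, { retries = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= retries) throw err
      // Delays: 500ms, 1s, 2s, 4s, 8s … plus up to 100ms of jitter
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}

// Usage: wrap each paginated fetch
// const page = await withRetry(() => client.getEntries({ skip, limit: 1000 }))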

:whale: Prerender data fetch fails = Incomplete Site?

Until I handled timeouts and retries more gracefully, the build/deploy would not necessarily fail … even worse, I’d end up with only a subset of the data, or stale data, and the site would reflect that.
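One cheap guard is to compare what was fetched against the total the API reports and fail the build loudly on a mismatch (total is a real field on Contentful responses; entries here is a hypothetical accumulator from the pagination loop):

// Abort the build rather than deploy an incomplete site
if (entries.length !== response.total) {
  throw new Error(
    `Fetched ${entries.length} of ${response.total} entries – aborting build`
  )
}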

:money_with_wings: Number/Cost/Limits of API Calls

If the prerender data isn’t cached (say, via a Netlify Build Plugin or elsewhere), then every build will make API calls. If those go to the Prisma-backed database, maybe that’s not a problem.

But if a third-party/external API is being used, there are rate limits and calls per day to consider. This can also have monetary costs.
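Caching the data between builds could look like this as a Netlify Build Plugin (utils.cache.restore/save are the plugin framework’s real cache utilities; the directory and the fetch step are placeholders):

// plugins/cache-prerender-data/index.js – sketch of a Build Plugin
module.exports = {
  async onPreBuild({ utils }) {
    // restore() resolves true when the directory came back from the cache
    const restored = await utils.cache.restore('./prerender-data')
    if (!restored) {
      // Cache miss: fetch everything once here, e.g. fetchAllData() (hypothetical)
    }
  },
  async onPostBuild({ utils }) {
    await utils.cache.save('./prerender-data')
  },
}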

Other Thoughts

  • The term “prerender” already has some name recognition from “prerendering” services for SEO. Will this be confusing? See: https://docs.netlify.com/site-deploys/post-processing/prerendering/

  • There comes a point where prerendering isn’t suited and the pages should be dynamic. Maybe it’s 100 pages, or 1,000, or it depends on how the prerender data is fetched (and from where). The developer needs to be sensible about whether to use it or not. But that’s the nature of things.

  • Would Auth work the same way? Would it be possible to also use Netlify’s auth/role-based redirects to enforce it? Not sure one would want to, but just thinking.

TL;DR

:slight_smile:

So, that’s my experience of “when prerendering goes bad”.

  • Pagination

  • Number/Cost of API calls

  • Build time

  • Memory

  • Connection reset/timeouts

  • Fails, incomplete sites

Confident it won’t go that way in RW.
