DIY Gatsby: Module bundling

This is Part II of a series where we'll look into how kimmo.blog works. Part II focuses on how Rollup, custom glue code, and templates are used to convert source files into pages that work in all browsers.

The motivation has been covered in Writing, or coding.

Module bundler


We need a way to convert .tsx source files into JS bundles. The TypeScript compiler does the job in simple cases, but we want a full-blown module bundler. Why? Because a module bundler can also handle the other tasks required to build a fast production site: tree shaking, bundle splitting, import aliases, process.env replacements, etc.
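To make the last point concrete, process.env replacement means the bundler rewrites environment checks at build time, which together with dead code elimination keeps development-only code out of the production bundle. A hypothetical example:

// Source code
if (process.env.NODE_ENV !== "production") {
  console.log("debug-only logging");
}

// After the bundler replaces process.env.NODE_ENV with "production",
// the condition is statically false and the whole block can be
// dropped from the production bundle.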

I looked into Webpack, Snowpack, Parcel, Rollup, and Babel. They all have slightly different abstraction levels and philosophies, but Rollup was my choice for the task. Its configuration format is simple and the core is very minimal. Rollup allows precise low-level control over the bundling process.

More control usually equals more work, and Rollup is no exception. Most tasks that Webpack and other bundlers do by default need a plugin in Rollup. The explicit configuration and minimal core force you to think about the details, though.

If a dependency uses a Node.js core library such as crypto, do you want to polyfill it?

It sounds tedious, but on the other hand it makes you very aware of what's happening under the hood. The payoff is increased understanding of the build process and most probably a smaller bundle size.
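For instance, answering "yes" to the crypto question could mean explicitly adding a polyfill plugin. Here's a sketch of the general pattern using rollup-plugin-polyfill-node, which is not necessarily part of this blog's actual config:

import nodePolyfills from "rollup-plugin-polyfill-node";

export default {
  // ...
  plugins: [
    // Provide browser polyfills for Node.js core modules such as crypto
    nodePolyfills(),
  ],
};

Answering "no" is equally valid: leave the plugin out and let the build fail loudly if a dependency actually needs the module.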

This is what the simplified rollup.config.js for this blog looks like:

import multiInput from "rollup-plugin-multi-input";
import { nodeResolve } from "@rollup/plugin-node-resolve";
import commonjs from "@rollup/plugin-commonjs";
import json from "@rollup/plugin-json";
import typescript from "@rollup/plugin-typescript";

export default {
  // Use a tmp dir as a workaround. Ideally the output dir
  // would be modified in-place, but that didn't work with
  // Rollup for some reason.
  input: ["output-tmp-rollup/**/*.tsx", "output-tmp-rollup/**/*.ts"],
  output: {
    dir: "output",
    format: "esm", // Use ES6 modules
  },
  plugins: [
    multiInput({
      // Keeps the output directory structure the same as the input
      relative: "output-tmp-rollup/",
    }),
    nodeResolve(), // Allow importing packages under node_modules
    commonjs(), // Required for dependencies using CommonJS
    json(), // Allow import data from './data.json'
    typescript(),
  ],
};

Task runner

Rollup solves many complicated problems for us, but we also need a task runner to combine the different tools. There's no clear line between a module bundler and a task runner, though. Grunt and Gulp used to be the de facto task runners, but in many projects Webpack has taken over that territory too.

For this project, though, I'm going to stick with my favorite approach: CLI tools and bash commands. For example, HTML validation can be done with:

find ./output/ -name '*.html' -exec html-validate {} +

The best feature of this approach is that build commands can be copy-pasted into a terminal and debugged step by step. The commands tend to get very long, but I don't see that as a huge issue in practice. Long commands can be refactored into separate script files if needed, and each one focuses on doing a single thing well. Bash commands usually operate on regular files, so that's the abstraction you'll mostly be dealing with.

It might get tricky if PowerShell or cmd.exe support is required, but there are tools that make it possible. I created concurrently and chokidar-cli to help with the cross-platform setup. There are also other great tools such as npm-run-all, dotenv, and onchange.

This is a stripped-down version of the package.json scripts:

{
  // ... in package.json
  "scripts": {
    "build": "npm run render && npm run rollup",
    "rollup": "rollup -c rollup.config.js",
    "render": "ts-node src/generator/render.tsx",
    // ... many more, such as PostCSS, linter, etc
  }
}
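The same scripts approach extends to a watch mode. Here's a hypothetical sketch (not the actual scripts of this blog) showing how concurrently and chokidar-cli could wire it up in a cross-platform way:

{
  // Hypothetical watch setup
  "scripts": {
    "watch": "concurrently \"npm:watch-render\" \"npm:watch-rollup\"",
    "watch-render": "chokidar \"src/**/*\" -c \"npm run render\"",
    "watch-rollup": "rollup -c rollup.config.js --watch"
  }
}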

Custom code & templates

We've chosen Rollup for module bundling and npm run scripts to tie the tools together. Next, we need custom code that iterates through all React pages and generates their static HTML counterparts. Many details, such as rendering MDX pages, are omitted to stay on the topic of Part II.

Everything starts from src/generator/render.tsx. Rendering has many steps, but these are the relevant ones:

  • Find all entrypoints (i.e. the site pages)
  • Iterate through entrypoints and render the page templates
  • Save metadata for entrypoints in site-data.json

Let's break the steps down.

1. Find entrypoints

The entrypoints for React page components and their metadata are defined in src/pages/_exports.ts. Here's a simplified version of it:

import { getData as getIndexData, default as Index } from "./Index";
import { getData as getPostsData, default as Posts } from "./Posts";

export const pages = [
  {
    // Async function in case a page needs to
    // fetch external data.
    // Each page has to return title, path, description, etc.
    getData: getIndexData,
    Component: Index,
    // Referred to in pageHydrate.tsx:
    // import PageComponent from "src/pages/{{{ fileName }}}";
    fileName: "Index",
  },
  {
    getData: getPostsData,
    Component: Posts,
    fileName: "Posts",
  },
];

The list contains all the React pages and the paths they are served from. A React page means that the source is a regular React component, as opposed to e.g. the .mdx blog post pages.
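For reference, a page module could look roughly like this. The exact metadata shape is an assumption here; the key parts are the getData export and the default component export that _exports.ts imports:

// src/pages/Index.tsx (illustrative sketch)
import React from "react";

// Metadata for the page; the real return value has more fields
export async function getData() {
  return {
    title: "Home",
    path: "/",
    description: "Front page of the blog",
  };
}

export default function Index(props: { pageData: any; siteData: any }) {
  return <h1>{props.pageData.title}</h1>;
}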

2. Iterate all pages

For each page in the entrypoints, do the following operations:

Call renderToString() to render the React page component as HTML.

import ReactDOMServer from "react-dom/server";

const html = ReactDOMServer.renderToString(
  <page.Component pageData={pageData} siteData={siteData} />
);

Render page.html.template with the page component's HTML and other metadata. Again, simplified:

<html>
  <head>
    <title>{{{ title }}}</title>
    <meta name="description" content="{{{ description }}}"/>
    <meta name="keywords" content="{{{ keywords }}}"/>
    <link defer rel="stylesheet" href="{{{ relativePathToRoot }}}styles.css"/>
  </head>
  <body>
    <div id="react-root">{{{ html }}}</div>
    <script type="module" src="{{{ hydrateScriptPath }}}"></script>
  </body>
</html>

Render the page-specific hydrate.tsx using pageHydrate.tsx.template. The templates use a Mustache-like syntax, but with triple braces to avoid conflicts with objectProp={{ key: "val" }} React props.

import React from "react";
import ReactDOM from "react-dom";
import siteData from "site-data.json";

// Will be replaced with e.g. "src/components/pages/Posts"
import PageComponent from "{{{ pageImportPath }}}";

// Will be replaced with e.g. "/posts/"
const pagePath = "{{{ pagePath }}}";

// Not the best data structure..
// good thing I don't have 10k blog posts
const page = siteData.pages.find((page) => page.data.path === pagePath);

window.addEventListener("load", () => {
  ReactDOM.hydrate(
    <PageComponent pageData={page.data} siteData={siteData} />,
    document.getElementById("react-root")
  );
});
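The triple-brace substitution itself doesn't require a real template engine. A minimal sketch of how the replacement could work (not the blog's actual implementation):

// Replace {{{ key }}} placeholders with values from a plain object
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\{\s*(\w+)\s*\}\}\}/g, (match, key) =>
    key in values ? values[key] : match
  );
}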

The hydration script is first saved as a TypeScript file to output-tmp-rollup/<page>/hydrate.tsx. The render script keeps the directory structure correct so that it matches the paths defined in the pages' metadata. The output-tmp-rollup/**/*.tsx files are finally passed to Rollup, which outputs the corresponding output/**/*.js files.

3. Save metadata to site-data.json

As seen in pageHydrate.tsx.template, site-wide data is imported from site-data.json. The data is used, for example, to render a list of all posts. The JSON data is saved in a single file that can be cached and shared by all pages of the site.

import fs from "fs";
import { pages } from "src/pages/_exports";

const siteData = await mapAsync(pages, (page) => page.getData());
await fs.promises.writeFile("site-data.json", JSON.stringify(siteData), {
  encoding: "utf-8",
});
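mapAsync is a small project helper for mapping over an array with an async function. A minimal version could look like this (an assumption, not the actual source):

async function mapAsync<T, U>(
  items: T[],
  fn: (item: T) => Promise<U> | U
): Promise<U[]> {
  return Promise.all(items.map((item) => fn(item)));
}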

Putting everything together

With these building blocks, we are able to convert React pages into static HTML counterparts. To recap, npm run build will:

  1. Render all pages with src/generator/render.tsx
  2. Run Rollup to transpile output-tmp-rollup/**/*.tsx files into output/**/*.js
  3. Execute other build steps (copy static files, PostCSS, etc)

Directory structures

The build flow is simple in theory, but there is a huge amount of detail that ends up making the process complicated.
