Amberley Romo

Developer based in Austin, TX.

9 min read · October 25th 2018

Using unstructured data in Gatsby

When creating web experiences, an inevitable question is, “how do I get my data from point A (the source) to point B (the component)?” This can end up being a deceptively complex question.

Gatsby’s rich data plugin ecosystem lets you build sites with the data you want — from one or many sources. You can pull data from headless CMSs, SaaS services, APIs, databases, your file system & more directly into your components.

Figure: An assortment of possible data sources (CMSs, Markdown, APIs, etc.). Your data could come from anywhere.

Most examples in the Gatsby docs and on the web at large focus on leveraging source plugins to manage your data in Gatsby sites. And rightly so! Gatsby’s data layer is powerful and extremely effective; it solves the “integration problem” of decoupled CMSs — it’s the glue between presentation layer and wherever your data is sourced from.

Figure: Gatsby's GraphQL integration layer is the glue between the presentation layer and where your data lives.

Source plugins “source” data from remote or local locations into Gatsby nodes. Gatsby nodes are the center of Gatsby’s data handling layer.

We’re calling this the “content mesh” — the infrastructure layer for a decoupled website. (Sam Bhagwat introduced and explored this concept in his recent five-part series, The Journey to a Content Mesh).

However, you don’t need to use source plugins (or create Gatsby nodes) to pull data into a Gatsby site! In this post we’ll explore how to use an “unstructured data” approach in Gatsby sites, and some of the pros and cons of doing so.

Note: For our purposes here, “unstructured data” means data “handled outside of Gatsby’s data layer” i.e. using the data directly, and not transforming the data into Gatsby nodes.

An example of creating pages using unstructured data from a remote API

We’ll take a look at a (very serious) example of how this works. In the example, we’ll:

  1. Load data from the PokéAPI’s REST endpoints
  2. Create pages (and nested pages) from this data

That’s it!

Breaking down the example

Note: This walkthrough assumes you have working knowledge of Gatsby fundamentals. If you’re not (yet!) familiar with Gatsby, you may want to take a look at our Quick Start doc first.

1. Use Gatsby’s createPages API.

createPages is a Gatsby Node API. It hooks into a certain point in Gatsby’s bootstrap sequence.

By exporting createPages from our example Gatsby site’s gatsby-node.js file, we’re saying, “at this point in the bootstrapping sequence, run this code”.

gatsby-node.js
exports.createPages = () => {
  // Run this code
}

2. Fetch the data from the PokéAPI.

gatsby-node.js
exports.createPages = async () => {
  const allPokemon = await getPokemonData(["pikachu", "charizard", "squirtle"])
}

Note: getPokemonData is an async function which fetches the relevant desired data for all of our Pokémon.
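
The post doesn't show what getPokemonData looks like inside, so here is a minimal sketch of one way it could be written. It assumes a runtime with a global fetch (Node 18+) and the PokéAPI v2 endpoint `/pokemon/<name>`. The `fetchJson` helper and the optional second parameter are assumptions added for illustration (the latter so the function can be exercised without network access); the example repo may do this differently.

```javascript
// Hypothetical helper: the post does not show its real implementation.
const API_ROOT = "https://pokeapi.co/api/v2"

// Default fetcher; assumes a runtime with a global fetch (Node 18+).
const fetchJson = async url => {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Request failed (${response.status}): ${url}`)
  }
  return response.json()
}

// Fetch every requested Pokémon in parallel. The second argument is an
// assumption added here so the function can be tested offline.
const getPokemonData = (names, getJson = fetchJson) =>
  Promise.all(
    names.map(async name => {
      const pokemon = await getJson(`${API_ROOT}/pokemon/${name}`)
      // The raw response nests each ability as { ability: { name, url } };
      // flatten it so later steps can read ability.name directly.
      const abilities = pokemon.abilities.map(entry => entry.ability)
      return { ...pokemon, abilities }
    })
  )
```

Because the requests run through Promise.all, a single failed request rejects the whole call, which is usually what you want at build time: a broken source should fail the build rather than silently produce missing pages.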

3. Grab the createPage action

When you hook into a Gatsby API (like createPages from step one), you are passed a collection of actions. In this example, we’re extracting the createPage action using ES6 object destructuring:

gatsby-node.js
exports.createPages = async ({ actions: { createPage } }) => {
  const allPokemon = await getPokemonData(["pikachu", "charizard", "squirtle"])
}

4. Create a page that lists all Pokémon.

gatsby-node.js
exports.createPages = async ({ actions: { createPage } }) => {
  const allPokemon = await getPokemonData(["pikachu", "charizard", "squirtle"])

  // Create a page that lists all Pokémon.
  createPage({
    path: `/`,
    component: require.resolve("./src/templates/all-pokemon.js"),
    context: { allPokemon },
  })
}

The createPage action is passed an object containing:

  • path: This is the relative URL at which you’d like your new page to be available.
  • component: This is the absolute path to the React component you’ve defined for this page.
  • context: Context data for this page. Available either as props to the component (this.props.pageContext) or as GraphQL arguments.

In our example, we’re accessing the context as props to the component. This allows us to completely circumvent Gatsby’s data layer; it’s just props.

src/templates/all-pokemon.js
export default ({ pageContext: { allPokemon } }) => (
    {...}
        {allPokemon.map(pokemon => (
            <li
                key={pokemon.id}
                style={{
                    textAlign: 'center',
                    listStyle: 'none',
                    display: 'inline-block'
                }}
            >
                <Link to={`/pokemon/${pokemon.name}`}>
                    <img src={pokemon.sprites.front_default} alt={pokemon.name} />
                    <p>{pokemon.name}</p>
                </Link>
            </li>
        ))}
    {...}
);

5. Create a page for each Pokémon.

gatsby-node.js
exports.createPages = async ({ actions: { createPage } }) => {
  const allPokemon = await getPokemonData(["pikachu", "charizard", "squirtle"])

  // Create a page that lists all Pokémon.
  createPage({
    path: `/`,
    component: require.resolve("./src/templates/all-pokemon.js"),
    context: { allPokemon },
  })

  // Create a page for each Pokémon.
  allPokemon.forEach(pokemon => {
    createPage({
      path: `/pokemon/${pokemon.name}/`,
      component: require.resolve("./src/templates/pokemon.js"),
      context: { pokemon },
    })
  })
}

6. Create a page for each ability of each Pokémon.

gatsby-node.js
exports.createPages = async ({ actions: { createPage } }) => {
  const allPokemon = await getPokemonData(["pikachu", "charizard", "squirtle"])

  // Create a page that lists all Pokémon.
  createPage({
    path: `/`,
    component: require.resolve("./src/templates/all-pokemon.js"),
    context: { allPokemon },
  })

  // Create a page for each Pokémon.
  allPokemon.forEach(pokemon => {
    createPage({
      path: `/pokemon/${pokemon.name}/`,
      component: require.resolve("./src/templates/pokemon.js"),
      context: { pokemon },
    })

    // Create a page for each ability of the current Pokémon.
    pokemon.abilities.forEach(ability => {
      createPage({
        path: `/pokemon/${pokemon.name}/ability/${ability.name}/`,
        component: require.resolve("./src/templates/ability.js"),
        context: { pokemon, ability },
      })
    })
  })
}

For each type of page, we are invoking the createPage action, and supplying it with our desired path, React component, and data (as context).
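
To make the resulting site structure concrete, the same loops can be sketched as a small pure function that only collects the paths. The `listPagePaths` helper and the sample data are hypothetical (they are not part of the example repo); the path patterns are copied from the createPage calls above.

```javascript
// Hypothetical helper, not part of the example repo: returns the paths
// that the createPages code above would register.
const listPagePaths = allPokemon => {
  const paths = ["/"]
  allPokemon.forEach(pokemon => {
    paths.push(`/pokemon/${pokemon.name}/`)
    pokemon.abilities.forEach(ability => {
      paths.push(`/pokemon/${pokemon.name}/ability/${ability.name}/`)
    })
  })
  return paths
}

// Made-up sample data with the same shape the walkthrough relies on.
const samplePaths = listPagePaths([
  { name: "pikachu", abilities: [{ name: "static" }, { name: "lightning-rod" }] },
])

console.log(samplePaths)
// [
//   '/',
//   '/pokemon/pikachu/',
//   '/pokemon/pikachu/ability/static/',
//   '/pokemon/pikachu/ability/lightning-rod/'
// ]
```

One list page, one page per Pokémon, and one page per ability of each Pokémon: the page count grows with the data, but the code that creates the pages stays the same.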

View the full source code of this example at Jason Lengstorf’s “gatsby-with-unstructured-data” repo. Also check out the “using-gatsby-data-layer” branch of that repo, to compare a refactor that uses Gatsby’s data layer in the same example.

The pros of using unstructured data

  • When prototyping, or when new to Gatsby, this approach may feel more familiar, comfortable, and faster
  • There’s no intermediate step: you fetch some data, then build pages with it

The tradeoffs of forgoing Gatsby’s data layer

Using Gatsby’s data layer provides the following benefits:

  • Enables you to declaratively specify what data a page component needs, alongside the page component
  • Eliminates frontend data boilerplate — no need to worry about requesting & waiting for data. Just ask for the data you need with a GraphQL query and it’ll show up when you need it
  • Pushes frontend complexity into queries — many data transformations can be done at build-time within your GraphQL queries (e.g. Markdown -> HTML, images -> responsive images, etc)
  • Gives you GraphQL, a query language well suited to the often complex/nested data dependencies of modern applications
  • Improves performance by removing data bloat — GraphQL enables you to select only the data you need, not whatever an API returns
  • Enables you to take advantage of hot reloading when developing. For example, in this post’s example “Pokémon” site, if you wanted to add a “see other pokémon” section to the pokémon detail view, you would need to change your gatsby-node.js to pass all pokémon to the page, and restart the dev server. In contrast, when using queries, you can add a query and it will hot reload.
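
Outside the data layer, trimming that data bloat becomes manual work. As a rough sketch: the list template above only reads `id`, `name`, and `sprites.front_default`, so you could strip everything else before handing the data to createPage. The `toListItem` helper is hypothetical and not part of the example repo.

```javascript
// Hypothetical helper: keep only the fields the list template reads.
const toListItem = pokemon => ({
  id: pokemon.id,
  name: pokemon.name,
  sprites: { front_default: pokemon.sprites.front_default },
})

// Sketch of how it could be used in gatsby-node.js:
// createPage({
//   path: `/`,
//   component: require.resolve("./src/templates/all-pokemon.js"),
//   context: { allPokemon: allPokemon.map(toListItem) },
// })
```

This works, but every template that needs a different slice of the data needs its own version of this helper, which is the selection work a GraphQL query would otherwise express for you.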

Learn more about GraphQL in Gatsby.

Working outside of the data layer also means forgoing the optimizations provided by transformer plugins, like:

  • gatsby-image (speedy optimized images),
  • gatsby-transformer-sharp (provides queryable fields for processing your images in a variety of ways including resizing, cropping, and creating responsive images),
  • … the whole Gatsby ecosystem of official and community-created transformer plugins.

Another difficulty added when working with unstructured data is that your data fetching code becomes increasingly hairy when you source directly from multiple locations.
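
To give a feel for where that hairiness comes from, here is a hedged sketch of loading from several sources by hand. Everything in it is hypothetical (the `loadAll` name and the `name`, `load`, and `normalize` fields are made up for illustration): each source needs its own loader, its own error handling, and its own normalization step before the results can be combined.

```javascript
// Hypothetical sketch: every source brings its own loader and its own
// normalization step, and failures have to be reported per source.
const loadAll = async sources => {
  const results = await Promise.allSettled(
    sources.map(source => source.load())
  )
  const merged = {}
  results.forEach((result, index) => {
    const { name, normalize } = sources[index]
    if (result.status === "rejected") {
      throw new Error(`Source "${name}" failed: ${result.reason}`)
    }
    merged[name] = normalize(result.value)
  })
  return merged
}
```

With source plugins, this per-source plumbing lives in the plugin, and your pages query one consistent data layer instead.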

Thanks

  • Thank you to Tanner Linsley of react-static, who helped us realize that directly querying APIs and passing them into pages is a great way to build smaller sites, and came up with the term “unstructured data”.