Creating a Source Plugin
Source plugins are essentially out-of-the-box integrations between Gatsby and various third-party systems.
These systems can be CMSs like Contentful or WordPress, other cloud services like Lever and Strava, or your local filesystem: anything that has an API. Currently, Gatsby has over 300 source plugins.
Once a source plugin brings data into Gatsby’s system, it can be transformed further with transformer plugins.
At a high level, a source plugin:
- Ensures local data is synced with its source and is 100% accurate.
- Creates nodes with accurate media types, human-readable types, and accurate contentDigests.
- Links nodes and creates relationships between them.
- Lets Gatsby know when nodes are finished sourcing so it can move on to processing them.
A source plugin is a regular npm package. It has a package.json file with optional dependencies as well as a gatsby-node.js file where you implement Gatsby’s Node APIs. Read more about Files Gatsby Looks for in a Plugin.
gatsby-node.js should look something like:
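A minimal sketch of such a file, assuming a hypothetical in-memory record (`myData`) in place of a real API response and an illustrative type name (`MyNodeType`). It shows one node being built with the fields Gatsby expects and handed to `createNode`:

```javascript
// gatsby-node.js
// Sketch only: `myData` and the `MyNodeType` type name are hypothetical.
function sourceNodes({ actions, createNodeId, createContentDigest }) {
  const { createNode } = actions

  // Stand-in for a record fetched from a remote system.
  const myData = { key: 123, foo: "The foo field of my node", bar: "Baz" }

  createNode({
    // The record's own fields become queryable fields on the node.
    ...myData,
    // Fields Gatsby needs to track the node.
    id: createNodeId(`my-data-${myData.key}`),
    parent: null,
    children: [],
    internal: {
      type: "MyNodeType",
      content: JSON.stringify(myData),
      contentDigest: createContentDigest(myData),
    },
  })
}

exports.sourceNodes = sourceNodes
```

Because everything here is synchronous, the function does not need to return anything. A plugin that fetches data asynchronously has to signal completion.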
Each node created by the filesystem source plugin includes the raw content of the file and its media type.
A media type (also MIME type and content type) is an official way to identify the format of files/content that is transmitted on the internet, e.g. over HTTP or through email. You might be familiar with media types such as text/html, image/jpeg, and application/pdf.
Each source plugin is responsible for setting the media type for the nodes it creates. This way, source and transformer plugins can work together easily.
This is not a required field (if it’s not provided, Gatsby will infer the type from the data that is sent), but it’s the way for source plugins to indicate to transformers that there is “raw” data that can still be further processed. It also allows plugins to remain small and focused. Source plugins don’t have to have opinions on how to transform their data: they can set the media type and push that responsibility to transformer plugins instead.
For example, it’s common for services to allow you to add content in Markdown format. If you pull that Markdown into Gatsby and create a new node, what then? How would a user of your source plugin convert that Markdown into HTML they can use in their site? You would create a node for the Markdown content and set its media type to text/markdown, and the various Gatsby Markdown transformer plugins would see your node and transform it into HTML.
This loose coupling between the data source and the transformer plugins allows Gatsby site builders to assemble complex data transformation pipelines with little work on their part or on yours as the source plugin author.
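As a rough sketch of the Markdown case, the helper below creates a node whose raw content is Markdown and whose media type is `text/markdown`. The helper name, the `RemoteDoc` type name, and the `{ id, title, body }` record shape are all assumptions made for illustration:

```javascript
// Hypothetical helper: `doc` is assumed to look like { id, title, body },
// where `body` is a Markdown string returned by the remote service.
function createMarkdownNode(doc, { actions, createNodeId, createContentDigest }) {
  actions.createNode({
    id: createNodeId(`remote-doc-${doc.id}`),
    title: doc.title,
    parent: null,
    children: [],
    internal: {
      type: "RemoteDoc",
      // Signals to transformer plugins that the content is raw Markdown.
      mediaType: "text/markdown",
      // The raw text that a transformer plugin can parse later.
      content: doc.body,
      contentDigest: createContentDigest(doc.body),
    },
  })
}
```

The source plugin stops here. Turning the Markdown into HTML is left to whichever transformer plugin the site author installs.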
A community-made npm package can help when writing source plugins. This package provides a set of helper functions to generate Node objects with the required fields. This includes automatically generating fields like node IDs and the contentDigest MD5 hash, keeping your code focused on data gathering.
After your plugin is finished sourcing nodes, it should either return a Promise or use the callback (the third parameter) to report back to Gatsby when sourceNodes is fully executed. If it does neither, Gatsby will continue with the build process before nodes are finished being created. As a result, your nodes might not end up in the generated schema at compilation time, or the process will hang while waiting for an indication that it’s finished.
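One way to satisfy this is to write `sourceNodes` as an async function, since an async function always returns a Promise. The sketch below assumes a hypothetical `fetchRecords` client and an illustrative `RemoteRecord` type name:

```javascript
// `fetchRecords` is a hypothetical stand-in for a real API client.
const fetchRecords = () =>
  Promise.resolve([
    { id: 1, name: "first" },
    { id: 2, name: "second" },
  ])

// The returned Promise resolves only after every node has been created,
// which is how Gatsby knows that sourcing has finished.
async function sourceNodes({ actions, createNodeId, createContentDigest }) {
  const records = await fetchRecords()

  for (const record of records) {
    actions.createNode({
      ...record,
      id: createNodeId(`remote-record-${record.id}`),
      parent: null,
      children: [],
      internal: {
        type: "RemoteRecord",
        contentDigest: createContentDigest(record),
      },
    })
  }
}

exports.sourceNodes = sourceNodes
```

Starting the fetch without awaiting or returning it would let the function finish before any node exists, which is the failure described above.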
Gatsby source plugins not only create nodes, they also create relationships between nodes that are exposed to GraphQL queries.
There are two ways of adding node relationships in Gatsby: (1) transformations (parent-child) or (2) foreign-key based.
An example of a transformation relationship is the gatsby-transformer-remark plugin, which transforms a parent File node’s markdown string into a MarkdownRemark node. The Remark transformer plugin adds its newly created child node as a child of the parent node using the action createParentChildLink. Transformation relationships are used when a new node is completely derived from a single parent node. For example, the MarkdownRemark node is derived from the parent File node and would never exist if the parent File node hadn’t been created.
Because all child nodes are derived from their parent, when a parent node is deleted or changed, Gatsby deletes all of the child nodes (and their child nodes, and so on) with the expectation that they’ll be recreated by transformer plugins. This is done to ensure there are no nodes left over that were derived from older versions of data but shouldn’t exist any longer.
Creating the transformation relationship
To create a parent/child relationship, pass createNode a child node object whose parent key is set to the parent node’s id. After this, call the createParentChildLink action.
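The two steps can be sketched with a small, hypothetical transformer that derives a `WordCount` node from any node whose media type is `text/plain`. The type name, the `words` field, and the assumption that the parent's text is available inline on `node.internal.content` are illustrative and not taken from any real plugin:

```javascript
// Hypothetical transformer: derives a `WordCount` child node from a parent
// node that carries plain text in `internal.content` (an assumption).
function onCreateNode({ node, actions, createNodeId, createContentDigest }) {
  if (node.internal.mediaType !== "text/plain") return

  const { createNode, createParentChildLink } = actions
  const text = node.internal.content || ""
  const words = text.split(/\s+/).filter(Boolean).length

  const child = {
    id: createNodeId(`${node.id} >>> WordCount`),
    words,
    // Step 1: point the child at the node it was derived from.
    parent: node.id,
    children: [],
    internal: {
      type: "WordCount",
      contentDigest: createContentDigest({ words }),
    },
  }

  createNode(child)
  // Step 2: register the child on the parent.
  createParentChildLink({ parent: node, child })
}

exports.onCreateNode = onCreateNode
```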
Both gatsby-transformer-remark and gatsby-transformer-sharp create their child nodes this way.
An example of a foreign-key relationship would be a Post that has an Author.
In this relationship, each object is a distinct entity that exists whether or not the other does, with independent schemas and fields on each entity that reference the other: in this case the Post would have an Author, and the Author might have Posts. The API of a service that allows complex object modelling, for example a CMS, will often allow users to add relationships between entities and expose them through the API.
When an object node is deleted, Gatsby does not delete any referenced entities. When using foreign-key references, it’s a source plugin’s responsibility to clean up any dangling entity references.
Suppose you want to create a relationship between Posts and Authors, and you want to call the field author. Before you pass the Post object and Author object into createNode and create the respective nodes, you need to create a field called author___NODE on the Post object to hold the relationship to Authors. The value of this field should be the node ID of the Author.
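A minimal sketch of this link, assuming hypothetical API records (`author`, `post`) and illustrative `Author` and `Post` type names:

```javascript
// Hypothetical records as they might come back from a remote API.
const author = { id: "a1", name: "Ada" }
const post = { id: "p1", title: "Hello", authorId: "a1" }

function sourceNodes({ actions, createNodeId, createContentDigest }) {
  const { createNode } = actions
  // Compute the Author's node ID once so the Post can reference it.
  const authorNodeId = createNodeId(`author-${author.id}`)

  createNode({
    id: authorNodeId,
    name: author.name,
    parent: null,
    children: [],
    internal: { type: "Author", contentDigest: createContentDigest(author) },
  })

  createNode({
    id: createNodeId(`post-${post.id}`),
    title: post.title,
    // The ___NODE suffix marks this field as a link to another node.
    author___NODE: authorNodeId,
    parent: null,
    children: [],
    internal: { type: "Post", contentDigest: createContentDigest(post) },
  })
}

exports.sourceNodes = sourceNodes
```

The field value is the Gatsby node ID produced by `createNodeId`, not the remote system's own `a1` identifier.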
It’s often convenient for querying to add back references to the schema. For example, you might want to query the Author of a Post, but you might also want to query all the Posts an Author has written.
If you want to call this field posts on the Author object, you would create a field called posts___NODE to hold the relationship to Posts. The value of this field should be an array of Post IDs.
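The back reference could be built roughly as follows. Both helpers and the `{ id, authorId }` post shape are hypothetical. The node ID seed (`post-${id}`) has to match whatever seed the plugin uses when it creates the Post nodes themselves:

```javascript
// Hypothetical helper: maps each author's remote ID to the node IDs of
// that author's posts.
function postIdsByAuthor(posts, createNodeId) {
  const byAuthor = {}
  for (const post of posts) {
    if (!byAuthor[post.authorId]) byAuthor[post.authorId] = []
    byAuthor[post.authorId].push(createNodeId(`post-${post.id}`))
  }
  return byAuthor
}

// Builds an Author node object that links back to all of its posts.
function buildAuthorNode(author, posts, { createNodeId, createContentDigest }) {
  const postIds = postIdsByAuthor(posts, createNodeId)[author.id] || []
  return {
    id: createNodeId(`author-${author.id}`),
    name: author.name,
    // An array of node IDs yields a list field on the Author.
    posts___NODE: postIds,
    parent: null,
    children: [],
    internal: { type: "Author", contentDigest: createContentDigest(author) },
  }
}
```

The returned object would then be passed to `createNode` in the same way as any other node.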
The WordPress source plugin links its nodes this way.
When creating fields linking to an array of nodes, if all of the linked nodes are of the same type, the relationship field that is created will be of this type. If the linked nodes are of different types, the field will turn into a union type of all types that are linked. See the GraphQL documentation on how to query union types.
See Node Link in the API Specification concepts section for more info.
One tip to improve the development experience of using a plugin is to reduce the time it takes to sync between Gatsby and the data source. There are two approaches for doing this:
- Add event-based sync. Some data sources keep event logs and are able to return a list of objects modified since a given time. If you’re building a source plugin, you can store the last time you fetched data using setPluginStatus and then only sync down nodes that have been modified since that time. gatsby-source-contentful is an example of a source plugin that does this.
- Proactively fetch updates. One challenge when developing locally is that a developer might make modifications in a remote data source, like a CMS, and then want to see how it looks in the local environment. Typically they will have to restart the gatsby develop server to see changes. This can be avoided if your source plugin knows to proactively fetch updates from the remote server. For example, gatsby-source-sanity listens to changes to Sanity content when watchMode is enabled and pulls them into the Gatsby develop server.
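The event-based approach from the first bullet might look roughly like the sketch below. It makes several assumptions: the plugin is named `gatsby-source-example`, the remote client is replaced by a hypothetical in-memory `fetchChangedSince`, and the previously saved status is read back from `store.getState().status.plugins`:

```javascript
// Assumption: the name this plugin is published under.
const PLUGIN_NAME = "gatsby-source-example"

// Hypothetical client: returns only records modified after `since`,
// or every record when `since` is undefined (first run).
const RECORDS = [
  { id: 1, title: "Old", updatedAt: 100 },
  { id: 2, title: "New", updatedAt: 200 },
]
const fetchChangedSince = async since =>
  RECORDS.filter(record => since === undefined || record.updatedAt > since)

async function sourceNodes({ actions, store, createNodeId, createContentDigest }) {
  const { createNode, setPluginStatus } = actions

  // Read the timestamp saved by a previous run, if there was one.
  const plugins = store.getState().status.plugins || {}
  const { lastFetched } = plugins[PLUGIN_NAME] || {}

  // Capture the time before fetching so changes made mid-fetch are not lost.
  const startedAt = Date.now()
  const changed = await fetchChangedSince(lastFetched)

  for (const record of changed) {
    createNode({
      ...record,
      id: createNodeId(`example-${record.id}`),
      parent: null,
      children: [],
      internal: { type: "ExampleRecord", contentDigest: createContentDigest(record) },
    })
  }

  // Persist the timestamp for the next run.
  setPluginStatus({ lastFetched: startedAt })
}

exports.sourceNodes = sourceNodes
```

A real plugin would likely also need to keep unchanged nodes from earlier runs alive, for example with Gatsby's touchNode action, and to delete nodes for records that were removed remotely. Both are left out here to keep the sketch short.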