Query Execution

Query execution is kicked off by bootstrap, which calls runInitialQuerys() in page-query-runner.js. The main files involved in this step are query-watcher.js, page-query-runner.js, query-queue.js, and query-runner.js.

Here’s an overview of how it all relates:

[Diagram: query-watcher.js calls queueQueryForPathname(). componentDataDependencies (redux) and components (redux) feed findIdsWithoutDataDependencies(). CREATE_NODE actions are collected as dirtyActions, which feed findDirtyActions(). All three sources feed runQueriesForPathnames(), which pushes jobs onto the better-queue in query-queue.js. The queue calls graphqlJs(schema, query, context, ...) in query-runner.js, and the Query Result is written to /public/static/d/${dataPath} and recorded in jsonDataPaths (redux).]

Figuring out which queries need to be executed

The first thing this step does is figure out which queries need to be run. You would think this would simply be a matter of running the queries that were enqueued in Extract Queries, but matters are complicated by support for gatsby develop. Below is the logic for figuring out which queries need to be executed (the code is in runQueries()).

Already queued queries

All queries queued after being extracted (from query-watcher.js).

Queries without node dependencies

All queries whose component path isn’t listed in componentDataDependencies. As a recap, in Schema Generation, we showed that all type resolvers record a dependency between the page whose query we’re running and any nodes that were successfully resolved. So, if a component is declared in the components redux namespace (which occurs during Page Creation) but is not contained in componentDataDependencies, then by definition its query has not been run, and we need to run it. Check out Page -> Node Dependencies for more info. The code for this step is in findIdsWithoutDataDependencies.
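As a minimal sketch of the idea, the check amounts to a set difference between the known components and the ids that already have recorded dependencies. The state shapes below are simplified assumptions for illustration, not Gatsby's real store layout, and the function body is a guess at the logic rather than the actual implementation.

```javascript
// Hypothetical, simplified state shapes (assumptions, not Gatsby's real store).
const state = {
  // component path -> component info, filled in during Page Creation
  components: new Map([
    ["/src/templates/blog-post.js", { query: "query BlogPost { ... }" }],
    ["/src/pages/index.js", { query: "query Index { ... }" }],
  ]),
  // ids that already have at least one recorded node dependency
  componentDataDependencies: new Set(["/src/pages/index.js"]),
};

// Return every component id with no recorded dependency,
// i.e. every component whose query has not been run yet.
function findIdsWithoutDataDependencies({ components, componentDataDependencies }) {
  return [...components.keys()].filter(id => !componentDataDependencies.has(id));
}

const idsToRun = findIdsWithoutDataDependencies(state);
console.log(idsToRun); // [ '/src/templates/blog-post.js' ]
```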

Pages that depend on dirty nodes

In gatsby develop mode, every time a node is created or updated (e.g. by editing a markdown file), we add that node to the enqueuedDirtyActions collection. When we execute our queries, we can look up all nodes in this collection and map them to the pages that depend on them (as described above). These pages’ queries must also be executed. This step also handles dirty connections (see Schema Connections). Connections depend on a node’s type, so if a node is dirty, we mark all connection nodes of that type dirty as well. The code for this step is in findDirtyIds. Note: “dirty ids” really refers to dirty paths.
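The mapping from dirty nodes to pages could be sketched roughly as follows. The dependency maps and the findDirtyPaths helper are hypothetical names and shapes chosen for illustration; only the CREATE_NODE action type and the node's internal.type field come from Gatsby itself.

```javascript
// Hypothetical dependency maps (assumed shapes, for illustration only).
const dependencies = {
  // node id -> paths of pages whose queries resolved that node
  nodes: new Map([
    ["node-1", new Set(["/blog/post-1/"])],
    ["node-2", new Set(["/blog/post-2/", "/"])],
  ]),
  // node type -> paths of pages that query a connection of that type
  connections: new Map([["MarkdownRemark", new Set(["/blog/"])]]),
};

// Nodes created or updated since the last run.
const enqueuedDirtyActions = [
  {
    type: "CREATE_NODE",
    payload: { id: "node-2", internal: { type: "MarkdownRemark" } },
  },
];

// Collect every page path that depends on a dirty node, either directly
// or through a connection over the node's type.
function findDirtyPaths(actions, deps) {
  const dirty = new Set();
  for (const { payload: node } of actions) {
    for (const path of deps.nodes.get(node.id) || []) dirty.add(path);
    for (const path of deps.connections.get(node.internal.type) || []) dirty.add(path);
  }
  return [...dirty];
}

const dirtyPaths = findDirtyPaths(enqueuedDirtyActions, dependencies);
console.log(dirtyPaths);
```

Here the page at /blog/post-1/ is left alone because its only dependency, node-1, was not touched.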

Queue Queries for Execution

We now have the list of all pages whose queries need to be executed (linked to their query information). Let’s queue them for execution (for real this time). A call to runQueriesForPathnames kicks off this step. For each page or static query, we create a Query Job that looks something like:

{
  id: // page path, or static query hash
  hash: // only for static queries
  jsonName: // jsonName of static query or page
  query: // raw query text
  componentPath: // path to file where query is declared
  isPage: // true if not static query
  context: {
    path: // if staticQuery, is jsonName of component
    ...page // page object. Not for static queries
    ...page.context // not for static queries
  }
}
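To make the shape above concrete, here is a rough sketch of how such jobs might be assembled. The helpers makePageQueryJob and makeStaticQueryJob, and the input objects passed to them, are hypothetical; only the resulting field names follow the shape shown above.

```javascript
// Hypothetical helper: build a Query Job for a page.
function makePageQueryJob(page, component) {
  return {
    id: page.path,
    jsonName: page.jsonName,
    query: component.query,
    componentPath: component.componentPath,
    isPage: true,
    // The page object and its context are both spread into the query context.
    context: { ...page, ...page.context },
  };
}

// Hypothetical helper: build a Query Job for a static query.
function makeStaticQueryJob(staticQuery) {
  return {
    id: staticQuery.hash,
    hash: staticQuery.hash,
    jsonName: staticQuery.jsonName,
    query: staticQuery.query,
    componentPath: staticQuery.componentPath,
    isPage: false,
    // Static queries have no page, so the path is the jsonName.
    context: { path: staticQuery.jsonName },
  };
}

// Example inputs with made-up values.
const pageJob = makePageQueryJob(
  { path: "/blog/my-post/", jsonName: "blog-my-post", context: { slug: "my-post" } },
  { query: "query BlogPost { ... }", componentPath: "/src/templates/blog-post.js" }
);
const staticJob = makeStaticQueryJob({
  hash: "12345",
  jsonName: "sq-12345",
  query: "query Header { ... }",
  componentPath: "/src/components/header.js",
});
console.log(pageJob.context.slug, staticJob.context.path);
```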

This Query Job contains everything we need to execute the query (and do things like recording dependencies between pages and nodes). So, we push it onto the queue in query-queue.js and then wait for the queue to empty. Let’s see how query-queue works.

Query Queue Execution

query-queue.js creates a better-queue queue, which offers advanced features like parallel execution. This is handy because queries do not depend on each other, so they can safely run in parallel. Every time an item is consumed from the queue, we call query-runner.js, where we finally execute the query!
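The sketch below illustrates the parallel-consumption pattern with a small hand-written stand-in, so that it runs without dependencies. It is not better-queue and does not mirror its API: createQueue, drained, and runQuery are hypothetical names, and the worker is assumed to be an async function.

```javascript
// Stand-in queue (NOT better-queue): runs at most `concurrent` jobs at once.
function createQueue(worker, { concurrent }) {
  const pending = [];
  let active = 0;
  let onDrain = () => {};

  function next() {
    while (active < concurrent && pending.length > 0) {
      const job = pending.shift();
      active++;
      worker(job)
        .catch(error => console.error("job failed", error))
        .finally(() => {
          active--;
          if (pending.length === 0 && active === 0) onDrain();
          else next();
        });
    }
  }

  return {
    push(job) {
      pending.push(job);
      next();
    },
    // Resolves once every pushed job has finished.
    drained() {
      if (active === 0 && pending.length === 0) return Promise.resolve();
      return new Promise(resolve => { onDrain = resolve; });
    },
  };
}

// Pretend query runner that tracks how many jobs overlap.
const executed = [];
let running = 0;
let maxRunning = 0;
async function runQuery(job) {
  running++;
  maxRunning = Math.max(maxRunning, running);
  await new Promise(resolve => setTimeout(resolve, 10)); // simulate work
  executed.push(job.id);
  running--;
}

const queue = createQueue(runQuery, { concurrent: 2 });
["/", "/blog/", "/about/"].forEach(id => queue.push({ id }));
const done = queue.drained().then(() => console.log("executed", executed));
```

Because the jobs are independent, the only coordination needed is the concurrency cap and a way to wait until the queue is empty.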

Query execution involves calling the graphql-js library with 3 pieces of information:

  1. The Gatsby schema that was inferred during Schema Generation.
  2. The raw query text. Obtained from the Query Job.
  3. The Context, also from the Query Job. Has the page’s path amongst other things so that we can record Page -> Node Dependencies.

graphql-js parses the query and executes the top-level query, e.g. allMarkdownRemark( limit: 10 ) or file( relativePath: { eq: "blog/my-blog.md" } ). These invoke the resolvers defined in Schema Connections or GQL Type, both of which use sift to query over all nodes of the type in redux. The result is then passed through the inner part of the GraphQL query, where each type’s resolver is invoked. The vast majority of these are identity functions that just return the field value. Some, however, call a custom plugin field resolver, which in turn might perform side effects such as generating images. This is why the query execution phase of bootstrap often takes the longest.

Finally, a result is returned.

Save Query results to redux and disk

As queries are consumed from the queue and executed, their results are saved to redux and disk for later consumption. This involves converting the result to pure JSON and then saving it to its dataPath, which is relative to public/static/d. The data path includes the jsonName and hash. E.g. for the page /blog/2018-07-17-announcing-gatsby-preview/, the query’s results would be saved to disk as something like:

/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json

For static queries, instead of using the page’s jsonName, we just use a hash of the query.

Now we need to store the association of the page -> the query result in redux so we can recall it later. This is accomplished via the json-data-paths reducer which we invoke by creating a SET_JSON_DATA_PATH action with the page’s jsonName and the saved dataPath.
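A minimal sketch of these two steps, writing the result to disk and recording the path in a reducer, might look like the following. The SET_JSON_DATA_PATH action type comes from the text above, but the payload shape, the saveQueryResult helper, and the example jsonName and dataPath values are assumptions made for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Sketch of a reducer keyed by jsonName (the payload shape is an assumption).
function jsonDataPathsReducer(state = {}, action) {
  switch (action.type) {
    case "SET_JSON_DATA_PATH":
      return { ...state, [action.payload.jsonName]: action.payload.dataPath };
    default:
      return state;
  }
}

// Hypothetical helper: write a result under <publicDir>/static/d and
// return the action that records where it was saved.
function saveQueryResult(publicDir, jsonName, dataPath, result) {
  const file = path.join(publicDir, "static", "d", `${dataPath}.json`);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(result));
  return { type: "SET_JSON_DATA_PATH", payload: { jsonName, dataPath } };
}

// Usage with a temporary directory and made-up names.
const publicDir = fs.mkdtempSync(path.join(os.tmpdir(), "public-"));
const action = saveQueryResult(publicDir, "blog-my-post", "621/blog-my-post-abc123", {
  data: { markdownRemark: { id: "node-1" } },
});
const jsonDataPaths = jsonDataPathsReducer(undefined, action);
console.log(jsonDataPaths);
```

Looking up a page's jsonName in jsonDataPaths then yields the dataPath needed to locate its saved result.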

