gatsby-transformer-mdx-introspection

This plugin statically analyzes the nodes generated by gatsby-plugin-mdx and generates GraphQL nodes that let users find out which MDX-sourced pages contain which components, whether these are manually placed React components or HTML elements generated automatically from Markdown.

It serves use cases that need the full “database” of “what component was placed where.” For example, a site that needs to generate links to the locations of a given component could use this plugin.

It supports the full MDX syntax because it builds the component tree from the compiled intermediate JSX code. Complex attributes are represented as strings, while simple attributes generally get parsed to their values, which has turned out to be sufficient for the typical use case.

Install

npm install --save @commercetools-docs/gatsby-transformer-mdx-introspection

How to use

Add the plugin to the plugins array in your gatsby-config.js:

plugins: [`@commercetools-docs/gatsby-transformer-mdx-introspection`];

Example GraphQL query

To collect all locations of manually placed <ApiType .../> components:

query GetAllApiTypes {
  allComponentInMdx(filter: { component: { eq: "ApiType" } }) {
    nodes {
      component
      attributes {
        name
        value
      }
      mdx {
        file: parent {
          ... on File {
            relativePath
          }
        }
      }
    }
  }
}

Example response:

{
  "data": {
    "allComponentInMdx": {
      "nodes": [
        {
          "component": "ApiType",
          "attributes": [
            {
              "name": "apiKey",
              "value": "test"
            },
            {
              "name": "type",
              "value": "OutOfOrderPropertiesTestType"
            }
          ],
          "mdx": {
            "file": {
              "relativePath": "api/types/index.md"
            }
          }
        },
        ...
      ]
    }
  }
}
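
A response like this can be turned into a lookup table for the "generate links to component locations" use case mentioned earlier. The following is a minimal sketch, not part of the plugin: the helper name indexApiTypeLocations is hypothetical, and the sample data simply repeats the single node shown in the example response.

```javascript
// Hypothetical helper (not part of the plugin): groups the file paths from a
// GetAllApiTypes response by the value of each component's "type" attribute.
function indexApiTypeLocations(response) {
  const locations = {};
  for (const node of response.data.allComponentInMdx.nodes) {
    const typeAttr = node.attributes.find((attr) => attr.name === "type");
    if (!typeAttr) continue; // skip components that have no "type" attribute
    if (!locations[typeAttr.value]) locations[typeAttr.value] = [];
    locations[typeAttr.value].push(node.mdx.file.relativePath);
  }
  return locations;
}

// Sample data copied from the example response (only the one node it shows).
const sampleResponse = {
  data: {
    allComponentInMdx: {
      nodes: [
        {
          component: "ApiType",
          attributes: [
            { name: "apiKey", value: "test" },
            { name: "type", value: "OutOfOrderPropertiesTestType" },
          ],
          mdx: { file: { relativePath: "api/types/index.md" } },
        },
      ],
    },
  },
};

const apiTypeLocations = indexApiTypeLocations(sampleResponse);
```

The resulting object maps each type name to the list of files that place it, which is enough to render "where is this type documented" links.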

Options

tagWhitelist array<string | RegExp>

JSX components that generate Gatsby data nodes in the final output. Other nodes still appear as children in the tree, and their children can generate nodes. To introspect all nodes for debugging purposes, use a wildcard regular expression as the only whitelist term ([/.*/]).

Note: most target use cases need to whitelist specific tags to inspect. This is the recommended approach because it improves performance, but the whitelist can also act as a blacklist by specifying a single regular expression with a negative look-around.
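
As a sketch of the blacklist approach, the pattern below uses a negative lookahead to match every tag name except p and span. The excluded tags are an arbitrary choice for illustration, and how the plugin applies the expression internally is not documented here; the pattern is anchored on both ends so it behaves the same under a partial or a full match.

```javascript
// Matches any tag name except "p" and "span" (hypothetical choice of tags).
// (?!(?:p|span)$) rejects the name only when it is exactly "p" or "span".
const everythingExceptTextTags = /^(?!(?:p|span)$).+$/;

// Value to pass as the tagWhitelist plugin option.
const tagWhitelist = [everythingExceptTextTags];

// Quick check of which tag names the pattern lets through.
const sampleTags = ["p", "span", "pre", "ApiType", "h1"];
const indexedTags = sampleTags.filter((tag) =>
  everythingExceptTextTags.test(tag)
);
```

Note that pre still passes even though it starts with p, because the lookahead only rejects exact matches.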

cleanWhitespace [boolean] (optional, defaults to true)

Whether to collapse/trim whitespace in JSX snippets and string literals.

removeMdxCompilationArtifacts [boolean] (optional, defaults to true)

Whether to remove attributes that are (usually) artifacts of MDX compilation (mdxType and parentName).

shouldIndexNode [(node) ⇒ boolean] (optional, defaults to () ⇒ true)

Predicate function used as a performance escape hatch to filter which MDX files get parsed/indexed. Use it if not all MDX files need to be indexed.
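
Putting the options together, a configuration could look like the sketch below. It uses Gatsby's standard resolve/options plugin entry format. The whitelist entries and the /api/ path filter are illustrative, and the fileAbsolutePath field read inside shouldIndexNode is an assumption about the shape of the node passed to the predicate; check the actual node in your build before relying on it. The rest of the site configuration (including gatsby-plugin-mdx itself) is omitted.

```javascript
// gatsby-config.js (sketch; other plugins and site settings omitted)
const config = {
  plugins: [
    {
      resolve: `@commercetools-docs/gatsby-transformer-mdx-introspection`,
      options: {
        // Only these tags generate data nodes: ApiType plus h1 to h6.
        tagWhitelist: ["ApiType", /^h[1-6]$/],
        // The next two options are shown with their default values.
        cleanWhitespace: true,
        removeMdxCompilationArtifacts: true,
        // ASSUMPTION: the node exposes fileAbsolutePath; nodes without it
        // are skipped rather than causing an error.
        shouldIndexNode: (node) =>
          typeof node.fileAbsolutePath === "string" &&
          node.fileAbsolutePath.includes("/api/"),
      },
    },
  ],
};

module.exports = config;
```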

Advanced Queries

Each node exposes two fields, childrenComponentInMdx and childComponentInMdx, that both support filtering and sorting in addition to querying for deep children (descendants at any level). These fields allow for some advanced use cases:

query GetAllLinksInHeaders {
  allComponentInMdx(
    filter: { component: { in: ["h1", "h2", "h3", "h4", "h5", "h6"] } }
  ) {
    nodes {
      childrenComponentInMdx(
        filter: { component: { in: ["Link", "a"] } }
        deep: true
      ) {
        component
        content
        attributes {
          name
          value
        }
      }
    }
  }
}

This query gets all link elements (both Gatsby links and normal anchor HTML elements) that are descendants of headers. If deep were false, then the query would only get link elements that are direct children of headers.
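
The difference between direct children and deep descendants can be illustrated on a small in-memory tree. This is an illustration of the concept only, not the plugin's implementation: the findChildren function and the children field on the sample objects are hypothetical.

```javascript
// Illustration only: collects children whose component name is in
// `components`; with deep = true it also descends into every child.
function findChildren(node, components, deep) {
  const matches = [];
  for (const child of node.children || []) {
    if (components.includes(child.component)) matches.push(child);
    if (deep) matches.push(...findChildren(child, components, true));
  }
  return matches;
}

// A header containing a plain anchor and a Link nested inside <strong>.
const header = {
  component: "h2",
  children: [
    { component: "a", children: [] },
    { component: "strong", children: [{ component: "Link", children: [] }] },
  ],
};

const directLinks = findChildren(header, ["Link", "a"], false).map(
  (n) => n.component
);
const deepLinks = findChildren(header, ["Link", "a"], true).map(
  (n) => n.component
);
```

Here directLinks contains only the anchor, while deepLinks also picks up the Link nested one level further down.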

Known issues

  • The plugin has to parse the MDX separately (and therefore twice per site build) because gatsby-plugin-mdx lazily evaluates the abstract syntax tree (AST) property on the MDX GraphQL nodes it provides. The AST is therefore available to components using GraphQL, but not to other plugins that read from the GatsbyJS Node Objects in earlier build phases.

    • In addition, the plugin has to parse all of the MDX upon transforming because it generates Gatsby data nodes from the components, so it can’t lazily parse the code like gatsby-plugin-mdx. Caching helps alleviate this problem, however.

Differences between MDX and output

The plugin relies on the compiled JSX created by @mdx-js/mdx from the MDX source code, so the final representation may contain slight differences compared to the original MDX.

  • Inline code blocks like `code` turn into inlineCode elements in the final component tree due to the MDX library
  • The MDX library adds certain attributes to each HTML or React element it parses, namely mdxType and parentName. By default, the plugin removes all attributes that match these names. Because there is no easy way to determine whether such an attribute was present in the original MDX file, attributes with these names that were written by hand are removed as well. This behavior can be turned off by setting removeMdxCompilationArtifacts to false in the plugin options
  • Whitespace may end up different in text nodes than it was in the original MDX. The plugin attempts to clean up the text nodes it finds, but this can sometimes produce undesired output. Both trimming and collapsing can be turned off by setting cleanWhitespace to false in the plugin options
  • Most complex JavaScript expressions are string-serialized in the final output, while simple literals (boolean/number/null/undefined/string) get parsed to their values