Create JSON-LD Structured Data in Jekyll

In this post, I will explain how to create JSON-LD structured data for a Jekyll blog.

Overview

Google uses structured data that it finds on the web to understand the content of the page, as well as to gather information about the web and the world in general. In this post, I will explain how I create JSON-LD structured data for my blog, powered by Jekyll.

Include JSON-LD in Post Generation

To include JSON-LD structured data in a blog post, you first need to find out which HTML template(s) generate that page. In my case, the HTML is generated by:

/_layouts/post.html

So I need to edit post.html and add a new <script> element:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "url": "{{ site.url }}{{ page.url }}",
  "name": {{ page.title | jsonify }},
  "headline": {{ page.title | jsonify }},
  "keywords": {{ page.tags | join: ',' | jsonify }},
  "description": {{ page.excerpt | strip_newlines | strip | jsonify }},
  "articleBody": {{ page.content | strip_html | jsonify }},
  "datePublished": {{ page.date | jsonify }},
  "dateModified": {{ page.last_modified_at | default: page.date | jsonify }},
  "author": {
    "@type": "Person",
    "name": {{ site.author_name | jsonify }},
    "givenName": {{ site.author_first_name | jsonify }},
    "familyName": {{ site.author_last_name | jsonify }},
    "email": {{ site.email | jsonify }}
  },
  "publisher": {
    "@type": "Organization",
    "name": {{ site.title | jsonify }},
    "url": "{{ site.url }}",
    "logo": {
      "@type": "ImageObject",
      "width": 32,
      "height": 32,
      "url": "{{ site.url }}/icon/favicon.ico"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "{{ site.url }}{{ page.url }}"
  },
  "image": {
    "@type": "ImageObject",
    "width": {{ page.img_width | default: site.img_width }},
    "height": {{ page.img_height | default: site.img_height }},
    "url": "{{ site.url }}{{ page.img_url | default: site.img_url }}"
  }
}
</script>

Once you’ve added the code above, the generated JSON-LD structured data snippet should be included in your blog post as follows (simplified):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "url": "https://mincong-h.github.io/2018/08/21/why-you-should-use-auto-value-in-java/",
  "name": "Why You Should Use Auto Value in Java?",
  "headline": "Why You Should Use Auto Value in Java?",
  "keywords": "java,auto-value",
  "description": "Auto Value generates immutable value classes during Java compilation, including equals(), hashCode(), toString(). It lighten your load from writing these boilerplate source code.",
  "datePublished": "2018-08-21 07:22:49 +0000",
  "dateModified": "2018-08-21 07:22:49 +0000",
  "author": {
    "@type": "Person",
    "name": "Mincong Huang",
    "givenName": "Mincong",
    "familyName": "Huang",
    "email": "mincong.h@gmail.com"
  }
}
</script>

Quite easy, right? In the following sections, we’ll see how to choose the right schema and test the generated data. If you have questions about the Jekyll or Liquid expressions, I explain them near the end of this post, in the section “Advanced Configuration”.

Which Schema Should I Use?

I use BlogPosting for my blog posts. Some websites, such as Medium, use NewsArticle. I’m not really sure which is the best choice, but I believe we should use Article or any schema derived from it.

https://schema.org is a useful website for choosing the right schema. Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Each schema can be found at a URL following this pattern:

https://schema.org/${mySchema}
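
To look up every schema used in a snippet, the URL pattern above can be applied to each "@type" value. Here is a minimal Python sketch of that idea. The helper names (collect_types, schema_urls) and the example data are my own assumptions for illustration, not part of Jekyll or Schema.org tooling.

```python
def collect_types(node):
    """Recursively collect every "@type" string found in a JSON-LD structure."""
    found = []
    if isinstance(node, dict):
        type_value = node.get("@type")
        if isinstance(type_value, str):
            found.append(type_value)
        for value in node.values():
            found.extend(collect_types(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_types(item))
    return found


def schema_urls(json_ld):
    """Build the Schema.org documentation URL for each distinct type."""
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique_types = list(dict.fromkeys(collect_types(json_ld)))
    return [f"https://schema.org/{name}" for name in unique_types]


# Hypothetical, trimmed-down version of the snippet shown earlier.
example = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "author": {"@type": "Person", "name": "Mincong Huang"},
    "image": {"@type": "ImageObject"},
}

for url in schema_urls(example):
    print(url)
```

Each printed URL can be opened in a browser to read the properties that the corresponding schema supports.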

https://webmasters.stackexchange.com is another useful website for choosing the right schema. There are many questions and answers about schemas, and about webmaster topics in general. For example, there’s a discussion titled “Using Schema.org for blogging: Article VS BlogPosting”.

Test Your Structured Data

⚠️ DEPRECATION: Structured Data Testing Tool (https://search.google.com/structured-data/testing-tool/) will be replaced by Rich Results Test (https://search.google.com/test/rich-results). – edited on 3 March, 2021.

Google Structured Data Testing Tool is an easy and useful tool for validating your structured data, and in some cases, previewing a feature in Google Search. Try it out:

Google Structured Data Testing Tool

You can provide either a URL or the HTML source code directly. In my opinion, submitting the HTML source code is a good choice, because it lets you validate the preview version (localhost) before pushing the changes to production.

Google Structured Data Testing Tool is also useful for finding out which fields the target schema requires. My technique is to submit an empty schema to the testing tool, with only the schema name filled in, and let Google tell me which fields are missing.
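
As an illustration of the "empty schema" technique, here is a small Python sketch that prints a snippet containing only the context and the schema name, ready to be pasted into the testing tool. The function name minimal_snippet is hypothetical. You can just as well write the few lines by hand.

```python
import json


def minimal_snippet(schema_name):
    """Return a JSON-LD <script> element where only the schema name is filled."""
    data = {"@context": "https://schema.org", "@type": schema_name}
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(minimal_snippet("BlogPosting"))
```

Replace "BlogPosting" with whichever schema you are evaluating.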

Note: the image URL will always fail when you submit a blog post generated on localhost, because Google does not recognize the image URL. This does not matter: the problem goes away once the changes are pushed to production.
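
Before submitting anything to Google, a quick local check can catch the most basic mistake: a snippet that is not valid JSON at all. The following Python sketch extracts every JSON-LD block from an HTML string and tries to parse it. It is only a sanity check under my own assumptions, not a replacement for Google's tools: the class and function names are hypothetical, and the list of "required" keys is an arbitrary example rather than Google's actual requirements.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def check_json_ld(html, required=("@context", "@type", "headline")):
    """Return a list of problems; an empty list means the basic checks passed."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    if not extractor.blocks:
        return ["no JSON-LD block found"]
    problems = []
    for index, raw in enumerate(extractor.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as error:
            problems.append(f"block {index}: invalid JSON ({error.msg})")
            continue
        if not isinstance(data, dict):
            problems.append(f"block {index}: expected a JSON object")
            continue
        for key in required:
            if key not in data:
                problems.append(f"block {index}: missing {key}")
    return problems


# Hypothetical page content; in practice, read a generated file from _site.
sample_page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BlogPosting", "headline": "Hello"}
</script>
</head><body></body></html>
"""
print(check_json_ld(sample_page) or "basic checks passed")
```

A typical failure this catches is a value that ended up without quotes, for example because a Liquid expression was written without jsonify and without surrounding double quotes.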

Advanced Configuration

Now, let’s talk about the Jekyll and Liquid expressions used in JSON-LD. If the above code fits your needs, you can skip this section.

Use “jsonify” to convert data to JSON. You can apply the jsonify filter to a string or an array to create a valid JSON value. Note that the generated output already contains the double quotes ("), so do not add them again yourself.

{{ your.property | jsonify }}

For example, the input message:

Line 1
Line 2

will be converted to the following output (with double quotes):

"Line 1\nLine 2"

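If you want to experiment with this behaviour outside Jekyll, Python's json.dumps does a comparable job for plain strings. The sketch below is only an analogy for what jsonify produces, not Jekyll's implementation, and details such as the escaping of non-ASCII characters may differ.

```python
import json

# A two-line message, like the example above.
message = "Line 1\nLine 2"
encoded = json.dumps(message)

# The encoded value already carries its own double quotes,
# and the line break is escaped as \n.
print(encoded)

# Double quotes inside the text are escaped as well, which is why
# wrapping the result in extra quotes would produce invalid JSON.
print(json.dumps('Say "hi"'))

# Joining tags before encoding mirrors: page.tags | join: ',' | jsonify
tags = ["java", "auto-value"]
print(json.dumps(",".join(tags)))
```
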
Use “default” to provide a fallback value. Liquid’s default filter lets you provide a default value for a property. It’s useful for optional properties such as the image URL or the modification date. They might not be present in your post, in which case the snippet falls back to the default image URL (the image of your blog) or the creation date.

"{{ site.url }}{{ page.img_url | default: site.img_url }}"

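The fallback logic can be sketched in Python as follows. This approximates Liquid's default filter, which uses the fallback when the input is nil, false, or empty. The function names, the example.com URL, and the image paths are hypothetical values chosen for illustration.

```python
def liquid_default(value, fallback):
    """Approximate Liquid's `default`: fall back on None, False, or empty values."""
    if value is None or value is False:
        return fallback
    if isinstance(value, (str, list, tuple, dict)) and len(value) == 0:
        return fallback
    return value


def image_url(site, page):
    """Mirror: site.url followed by page.img_url, defaulting to site.img_url."""
    return site["url"] + liquid_default(page.get("img_url"), site["img_url"])


# Hypothetical site and post settings.
site = {"url": "https://example.com", "img_url": "/assets/logo.png"}
post_with_image = {"img_url": "/assets/post.png"}
post_without_image = {}

print(image_url(site, post_with_image))
print(image_url(site, post_without_image))
```

The first post keeps its own image, while the second one falls back to the site-wide image.
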
Limits of JSON-LD

Even though JSON-LD might be the best format for structured data, not every actor recognizes it. Social networks, like LinkedIn, do not read the schemas when creating their metadata. So a JSON-LD snippet alone is not enough: you still need some HTML meta tags.
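
To give an idea of what such meta tags look like, here is a minimal Python sketch that renders three Open Graph tags for a post. The property names og:title, og:description and og:url come from the Open Graph protocol. The function name and the idea of generating the tags with a script are my own assumptions, since in Jekyll you would normally write them in a layout.

```python
from html import escape


def open_graph_tags(title, description, url):
    """Render basic Open Graph meta tags, escaping the attribute values."""
    properties = {
        "og:title": title,
        "og:description": description,
        "og:url": url,
    }
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}">'
        for name, value in properties.items()
    )


print(open_graph_tags(
    "Why You Should Use Auto Value in Java?",
    "Auto Value generates immutable value classes during Java compilation.",
    "https://mincong-h.github.io/2018/08/21/why-you-should-use-auto-value-in-java/",
))
```

Escaping matters here for the same reason jsonify matters in the JSON-LD snippet: a title containing a double quote would otherwise break the attribute.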

Next Steps

So far, we have added the JSON-LD snippet to the blog post generation. Is it good enough? No, not yet. You still need to:

  1. Test the real blog URL in Google Structured Data Testing Tool to ensure everything works well, including the images.
  2. Follow the structured data discovery on Google Search Console to ensure that Google Search really discovered them. This process might take a few days.
  3. Continuously improve your JSON-LD snippet to ensure it’s relevant to the content and matches users’ expectations (their search queries). For example, you can edit the content, provide more optional fields, or add new schemas.

Conclusion

In this post, we learnt how to create JSON-LD (JSON for Linking Data) for Jekyll, how to choose a schema, how to test the result, the limits of JSON-LD, and the remaining tasks once JSON-LD is embedded. By doing this, there’s a chance that your blog posts will rank better and get more views in the coming months. I hope you enjoyed this article, see you next time!

Update

Sunday 7 March, 2021:

Managing the JSON-LD structured data yourself works, but it has some downsides. After doing so for two and a half years, I decided to stop because:

  • When migrating to a new Jekyll theme, you will have to know how the layouts are organized in that theme and migrate your code.
  • You have to validate the implementation yourself.
  • The implementation can be error-prone, e.g. using the incorrect Liquid filters.
  • It does not take into account other SEO optimization tricks.
  • You may not want to spend time on this topic because you would rather focus on the actual content of the article.

So what is the better alternative? I believe using the Jekyll plugin jekyll-seo-tag is much better. It adds the following meta tags to your site:

  • Page title, with site title or description appended
  • Page description
  • Canonical URL
  • Next and previous URLs on paginated pages
  • JSON-LD Site and post metadata for richer indexing
  • Open Graph title, description, site title, and URL (for Facebook, LinkedIn, etc.)
  • Twitter Summary Card metadata

Perform the installation in 3 steps:

  1. Add the following to your site’s Gemfile:

    gem 'jekyll-seo-tag'
    
  2. Add the following to your site’s _config.yml:

    plugins:
      - jekyll-seo-tag
    

    If you are using a Jekyll version less than 3.5.0, use the gems key instead of plugins.

  3. Add the following right before </head> in your site’s template(s):

    {% seo %}
    

See more details in https://github.com/jekyll/jekyll-seo-tag.

References