Gatsby – Better Last Updated or Modified Dates for Posts

Date Posted: Sep. 11, 2019
Last Updated: Sep. 20, 2019

A common need for documentation or blog posts that cover topics that rapidly change (such as technology) is a “last modified” or “last updated” date and time. It can quickly tell readers how up-to-date your post is and clue them in on how relevant it might be to the information they seek.

A nasty surprise that I got when I first deployed Gatsby was how it handled file modification stamps. I had set up Gatsby to use a common approach for pulling in file timestamps: in GraphQL, I queried file -> modifiedTime or file -> mtime, which is when the markdown file was last modified on disk, and then in my page template, simply used something like Markdown last updated: {(new Date(node.parent.modifiedTime)).toString()}. When building and testing locally, everything worked beautifully. But when I deployed to a different location (my actual production server), instead of showing the true time the file was modified, every page showed the same modification date: the moment the deploy ran, today!
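For reference, that naive setup looks roughly like the sketch below. It assumes the Markdown nodes come from gatsby-source-filesystem via gatsby-transformer-remark, so each markdownRemark node has a File parent. The query is shown as a plain string, and renderLastUpdated is a hypothetical helper name, so treat this as an illustration rather than my exact template code.

```javascript
// Sketch only: the query shape assumes gatsby-source-filesystem plus
// gatsby-transformer-remark; `renderLastUpdated` is a hypothetical helper.
const pageQuery = `
  query ($id: String!) {
    markdownRemark(id: { eq: $id }) {
      html
      parent {
        ... on File {
          modifiedTime
        }
      }
    }
  }
`;

// Mirrors the template expression from the text. It trusts whatever
// modification time the file had on the machine that ran the build,
// which is exactly why it breaks after a fresh clone.
function renderLastUpdated(node) {
  const modified = new Date(node.parent.modifiedTime);
  return `Markdown last updated: ${modified.toString()}`;
}
```

The output string depends on the time zone of the build machine, since Date.prototype.toString() formats in local time.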

I had forgotten a very simple lesson that I had previously learned: git does not record or keep file timestamp metadata. So unless you are syncing files directly from your computer to your server, they are not going to have matching timestamps. In my scenario, my server (Netlify) did a git clone, which set all the modified stamps to the current time, and subsequent deploys after git push or git merge also had incorrect dates. This is not limited to Netlify though, since again, the issue is that git simply does not store this data. In fact, you will lose the correct edit timestamps if you delete your project and re-clone it from a remote source (GitHub).
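To see why the dates collapse to "now", here is a minimal sketch that imitates what a fresh checkout does: only the file content travels, and the file is written anew. It is not git itself, just plain Node fs calls, and the file names and the 2019-08-26 edit date are made up for the demo.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Returns [mtime of the original, mtime of the re-written copy], in ms.
function demoTimestampLoss() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "mtime-demo-"));
  const original = path.join(dir, "post.md");
  const checkedOut = path.join(dir, "post-checked-out.md");

  fs.writeFileSync(original, "# Favorite Snacks\n");
  // Pretend the post was last edited on 2019-08-26 (hypothetical date).
  const editedAt = new Date("2019-08-26T00:00:00Z");
  fs.utimesSync(original, editedAt, editedAt);

  // Stand-in for a checkout: the content is copied, the mtime is not.
  fs.writeFileSync(checkedOut, fs.readFileSync(original));

  const result = [
    fs.statSync(original).mtimeMs,
    fs.statSync(checkedOut).mtimeMs,
  ];
  fs.rmSync(dir, { recursive: true, force: true });
  return result;
}
```

The second value is the time the copy was written, not the time the post was edited, which is the same thing that happens to every file on a build server after a clone.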

Solution A: Storing Dates in Front-Matter or Filenames

So how do we get around this? Many people opt to use Gatsby’s support for “front matter” – these are fields that you place at the top of a source Markdown file, and they contain metadata about the file that can feed into GraphQL without necessarily being rendered visible to the user. This is basically the official recommendation from Gatsby and Netlify (see GitHub issue and Stack Overflow). Here is what that front matter might look like at the top of a markdown file:

---
date: 2019-08-24
last-modified: 2019-08-26
author: 'Liz Lemon'
title: 'Favorite Snacks'
tags:
 - food
 - list
 - favs
---

How to even begin with my favorite snacks! Well, my top 20 are...

People also often store the creation date within the filename itself (like “2019-08-24–my-blog-post.md”), or use that in combination with storing dates in front-matter. If you look at some very popular Gatsby repos on GitHub (such as this, this, or this), both of these solutions are extremely common.

Now, I’m about to say something that might be controversial and rather blunt: I think this approach has a bad “code smell”.

To start with, you’re introducing a non-automated, hand-written process into the system; authors have to write out dates by hand as a text string and remember to keep them updated if they edit a file. Furthermore, how are you parsing this string? If you are converting it into a JS Date, you had better have error checking in case someone mistyped a date or suddenly wrote it in the wrong syntax! And what about timezones? Do you want to store just the date, or do you want the actual hour modified in there too? If so, that is an extra thing the author has to manually type out!
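To make the parsing concern concrete, here is a minimal sketch of the kind of validation a hand-typed date needs. The function name parseFrontMatterDate is hypothetical, and it assumes the only accepted format is YYYY-MM-DD, interpreted as midnight UTC; a real site might need to accept more formats or a time of day.

```javascript
// Hypothetical validator for a hand-typed front-matter date.
// Assumption: only YYYY-MM-DD is accepted, read as midnight UTC.
function parseFrontMatterDate(value) {
  // Some YAML parsers already turn unquoted dates into Date objects.
  if (value instanceof Date) {
    if (Number.isNaN(value.getTime())) {
      throw new Error("Front-matter date is an invalid Date object");
    }
    return value;
  }

  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(String(value).trim());
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got "${value}"`);
  }

  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));

  // Date.UTC silently rolls over impossible dates (Feb 30 becomes Mar 2),
  // so confirm that the parts survived the round trip.
  const roundTrips =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  if (!roundTrips) {
    throw new Error(`"${value}" is not a real calendar date`);
  }
  return date;
}
```

Even this small amount of checking has to exist somewhere, and it still says nothing about the time of day or the author's timezone.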

Solution A – part 2

It should be noted that hybrid CMS approaches (aka a “headless CMS” that supports MD), such as Netlify CMS, do automate the part of generating this front matter and saving it in Markdown. However, this is a bit of a moot point if you accept pull requests from users who are not using a CMS, or hand-edit markdown in addition to using the CMS. Also, this still leaves my second complaint, which is more of a rant:

The second part of my complaint is partly directed at a tendency to overuse Markdown and static site generators (SSGs) to the point of misuse. Storing separate and accurate post publication and last modified stamps is something that is built into pretty much every major CMS that uses a database; it’s just another column in the SQL table. Requiring that authors hand-type timestamps, or using hacked-together code to paste dates as text strings into files, is going backwards, and is a hint that maybe relying 100% on markdown to generate a site is not always the best choice. It irks me a bit that it is so trendy to hate on databases and shove everything into static files, when the truth is that it takes a lot of kludgy fixes just to replicate some simple things that come standard and are more “safe” with a database-backed CMS. /rant

An alternative automated approach: Using git commit timestamps

Without using a headless-CMS layer as an editor for our markdown, there is an approach that should work locally, as well as when deployed, since it will check the timestamps into source control. The downside of this approach is that the initial setup is a little complicated, since it relies on combining several advanced tools: git hooks, git log formatting (the “pretty” format option), bash scripting, and Gatsby’s Node APIs.

The way I’m currently setting this up is to use a JSON file to hold all the timestamps; the filepath relative to the project root is used as the lookup key, and the value is an object that holds both a last modified and created timestamp, in UNIX timestamp format.
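As a rough illustration of that layout, the sketch below shows what the contents of such a file could look like and how a lookup key could be built. The example path, the field names created and modified, and the helper names toStampKey and lookupStamps are assumptions for the sake of the example, not necessarily what my actual file uses.

```javascript
const path = require("path");

// Hypothetical contents of the timestamp JSON file. Keys are file paths
// relative to the project root; values are UNIX timestamps in seconds.
const exampleStamps = {
  "content/posts/favorite-snacks.md": {
    created: 1566604800, // 2019-08-24T00:00:00Z
    modified: 1566777600, // 2019-08-26T00:00:00Z
  },
};

// Build the lookup key: a path relative to the project root, normalized
// to forward slashes so the same key works on every operating system.
function toStampKey(projectRoot, absolutePath) {
  return path.relative(projectRoot, absolutePath).split(path.sep).join("/");
}

// Returns the stored stamps for a file, or null if none were recorded.
function lookupStamps(stamps, projectRoot, absolutePath) {
  return stamps[toStampKey(projectRoot, absolutePath)] || null;
}
```

Keying by relative path matters because the absolute path of the project differs between my machine and the build server.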

The pseudo code / flow looks like this:

  • A git pre-commit hook fires when a dev uses git commit, but before the commit actually finalizes
  • The git hook retrieves the filenames of the files that were changed in the commit, and passes them to a node script, via arguments
  • The node script then does the bulk of the work:
    • A “created” timestamp is generated once per file by looking up the file history with git log
      • A very handy one-liner command to do this is git log --pretty=format:%at --follow -- myFile.txt | tail -n 1
    • The modified stamp is pulled in with stat and mtimeMs, which gives the modification time in milliseconds since the UNIX epoch (divide by 1000 to get seconds, matching what git log %at returns)
    • Timestamps are updated in the JSON file, and the file is added to the current commit before it finalizes
  • After the git hook completes, the updated timestamp JSON file will be lumped in with everything else in the commit
  • To pull the timestamps from the JSON file into Gatsby GraphQL, gatsby-node.js does the heavy lifting:
    • Within exports.onCreateNode, my code checks to see if a timestamp exists for a given file node, and if it does, it uses it with createNodeField to add it to the node
    • Now within exports.createPages and within React page templates, I can pull in the newly created node fields with GraphQL queries.
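The flow above can be sketched in Node roughly as follows. This is a simplified illustration under stated assumptions, not the code I actually run: the function names, the field names gitCreated and gitModified, and the toKey callback are all hypothetical, and the git and stat calls are injectable so the logic can be exercised without a real repository.

```javascript
const { execFileSync } = require("child_process");
const fs = require("fs");

// Default git runner. It throws if git exits with an error.
const runGit = (args) => execFileSync("git", args, { encoding: "utf8" });

// Oldest author timestamp for a file, in seconds. Same idea as the
// `git log ... | tail -n 1` one-liner. Returns null if there is no history.
function getCreatedStamp(file, run = runGit) {
  const out = run(["log", "--pretty=format:%at", "--follow", "--", file]).trim();
  if (!out) return null;
  const lines = out.split("\n");
  return Number(lines[lines.length - 1]);
}

// Modification time from disk, converted from milliseconds to seconds.
function getModifiedStamp(file, stat = fs.statSync) {
  return Math.floor(stat(file).mtimeMs / 1000);
}

// Update the stamp map for the changed files. "created" is set only once.
function updateStamps(stamps, files, deps = {}) {
  const {
    run = runGit,
    stat = fs.statSync,
    now = () => Math.floor(Date.now() / 1000),
  } = deps;
  for (const file of files) {
    const entry = stamps[file] || {};
    if (entry.created == null) {
      // A brand-new file has no history yet, so fall back to "now".
      entry.created = getCreatedStamp(file, run) ?? now();
    }
    entry.modified = getModifiedStamp(file, stat);
    stamps[file] = entry;
  }
  return stamps;
}

// Write the JSON file and add it to the commit that is in progress.
function saveAndStage(jsonPath, stamps, run = runGit) {
  fs.writeFileSync(jsonPath, JSON.stringify(stamps, null, 2) + "\n");
  run(["add", "--", jsonPath]);
}

// gatsby-node.js side: copy stored stamps onto File nodes as fields.
// `toKey` maps an absolute path to the key used in the JSON file.
function makeOnCreateNode(stamps, toKey) {
  return ({ node, actions }) => {
    if (node.internal.type !== "File") return;
    const entry = stamps[toKey(node.absolutePath)];
    if (!entry) return;
    actions.createNodeField({ node, name: "gitCreated", value: entry.created });
    actions.createNodeField({ node, name: "gitModified", value: entry.modified });
  };
}
```

In this sketch, the pre-commit hook would pass the changed filenames to a small script that calls updateStamps and then saveAndStage, and gatsby-node.js would set exports.onCreateNode to the function returned by makeOnCreateNode. The new values would then be queryable under fields on each File node.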

I’m still working on cleaning up this code a bit; once I’m happy with it, I might release it as an NPM package, since I could see it being of a lot of use to many projects, beyond just Gatsby.

The node script which pulls the modification and creation times in via git scraping is now published as an NPM package – check it out here.

So far, it’s working great! Check out how it automatically updated my timestamps in this commit, based on the files updated:

Gatsby - Automated Markdown Timestamps per Git Commit

 
