Shell Heredocs - for Fun and Profit, in Bash and Beyond

Intro

One of the features of shell scripting (e.g. with bash or zsh) that I feel is undervalued is heredocs (aka Here Documents). At their simplest, heredocs are a way to embed and escape a multi-line block of text within shell code:

cat << "EOF"
This text is inside a heredoc!

I can put pretty much anything I want here, including whitespace and special characters:
  -> $
    -> 🎉
EOF

🔗 Docs: GNU Bash Reference - Here Documents

But wait - that’s not all! - bash heredocs can be so much more! And I’m here to show you why (and how).

💡 A lot of these tips about piping and redirection apply to shell strings in general; it is just that heredocs are the easiest to use for escaping and multi-line purposes.

🤩 On a personal level, there is something that really appeals to me about heredocs, both in shell scripting, and generally across all programming languages. There is something that feels really pleasant about the ability to write self-contained modules that can inline blocks of whatever you’d like, as opposed to having to break everything out into a separate file / component. Of course, you can overdo this too…

The Basics of heredocs

First, a quick lesson in heredocs (or a refresher, if you are already familiar). If you want to, you can also skip ahead to the use-cases / examples.

The general syntax for heredocs in shell scripts is:

[COMMAND] << "DELIMITER"
HEREDOC_CONTENTS
DELIMITER

or

[COMMAND] <<- "DELIMITER"
HEREDOC_CONTENTS
DELIMITER

Breaking this down:

The command is optional
- You often see cat being used, if you simply need to send the text to stdout
Double left redirect operators (<<) are always required
The hyphen / minus (-) after the redirect operators is optional; if you include it, leading tab characters (tabs only, not spaces) are stripped from each line of HEREDOC_CONTENTS and from the line holding the closing delimiter
The DELIMITER string can be anything, but the most common convention is to use the string EOF (end-of-file)
The double (or single) quotes around the delimiter are optional; if used, they prevent interpolation (parameter expansion, command substitution, and arithmetic expansion) inside the heredoc
- For example, if you want to print the literal string $PWD without having it evaluated, you would want to quote the delimiter
Finally, the ending delimiter has to match the starting one, should never be quoted (regardless of the starting delimiter), and should never be indented (the one exception is leading tabs when using <<-)
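
To make those rules concrete, here is a small sketch (the variable names are just for illustration) that compares an unquoted delimiter, a quoted delimiter, and <<-. I'm generating the <<- script with printf so that the tab characters are explicit, which assumes bash is on your PATH:

```shell
#!/usr/bin/env bash
name="world"

# Unquoted delimiter: $name is expanded inside the heredoc.
expanded=$(cat << EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is taken literally.
literal=$(cat << "EOF"
Hello, $name
EOF
)

# <<- strips leading TABs from the body and from the closing delimiter line.
# The \t escapes below are real tabs once printf has produced the script.
stripped=$(printf 'cat <<- EOF\n\tindented with a tab\n\tEOF\n' | bash)

echo "$expanded"   # → Hello, world
echo "$literal"    # → Hello, $name
echo "$stripped"   # → indented with a tab
```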

You can redirect and pipe heredocs just like any other input, which makes them very powerful. For example, you can mix interpolation, command substitution, and stdout redirection, all in one go:

cat > system-info.txt << EOF

Directory: $PWD
OS: $(uname -a)
EOF
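
As a rough illustration of the piping side of this (the fruit list and the variable name are made up for the example), a heredoc can feed the first command in a pipeline, and the result can be captured like any other command output:

```shell
#!/usr/bin/env bash
# The heredoc becomes stdin for sort; sort's output is then piped to head.
first_fruit=$(sort << "EOF" | head -n 1
pear
apple
mango
EOF
)

echo "$first_fruit"   # → apple
```

Note that the pipe goes on the same line as the opening delimiter; the heredoc body always starts on the following line.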

And this is just scratching the surface; read on to see some fun and productive ways to use (or abuse) heredocs:

Fun and Productive Use Cases for Heredocs

Using heredocs to escape and store special text

One of the most common uses for heredocs, although not the most exciting (IMHO), is to escape large, messy text strings, optionally capturing the result in a variable.

For example, we might want to produce a string that contains a dollar sign and line breaks, without these causing issues with the shell interpreter / shell expansion. We can do this with a single heredoc, by making sure we quote the leading delimiter:

# Notice that we are quoting EOF to
# prevent expansion / interpretation of contents
receipt=$(cat << "EOF"
Item A: $20
Item B: $40
-----------
Total: $60

Have a great day!
EOF
)
echo "$receipt"
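
If you would rather skip the extra cat process, one alternative I have seen used in bash is the read builtin. This is only a sketch and it is bash-specific (-d '' is not POSIX). Note that read returns a non-zero status when it reaches the end of input, hence the || true, and that the trailing newline is kept, unlike with $(...):

```shell
#!/usr/bin/env bash
# IFS= and -r keep whitespace and backslashes intact;
# -d '' makes read consume everything up to the end of the heredoc.
IFS= read -r -d '' receipt_text << "EOF" || true
Item A: $20
Item B: $40
EOF

printf '%s' "$receipt_text"
```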

Using heredocs to store and execute non-shell code

This is probably my favorite use of heredocs, and the one that boosts my productivity the most.

Imagine you are working on a project with mixed programming languages (Python + JavaScript with Nodejs, for example) and you have a set of little utility functions that you like to run, but they are personal enough to your workflow that you don’t want to commit them to the shared codebase.

One option is to write your own CLI file, or set of CLI files, and .gitignore them. However, with mixed programming languages in the same project, you might end up with something like do_thing.py, do_other_thing.py, do_js_thing.js, and so on. This might be the approach that most people take, but I find it cumbersome.

However, another option is to inline them - as many different languages and utilities as you’d like - in a shell script:

python << "EOF"
from my_lib import do_thing
print("hello from Python!")
do_thing()
EOF
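
One thing to watch for here: with a quoted delimiter, shell variables are not expanded inside the Python code, so values have to get in some other way. A minimal sketch, assuming python3 is installed (greet_from_python is a made-up helper name): passing - as the script name makes Python read the program from stdin, and anything after it shows up in sys.argv.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs inline Python with a shell value passed as an argument.
greet_from_python() {
  # "-" tells python3 to read the program from stdin;
  # the words after it become sys.argv[1:].
  python3 - "$1" << "EOF"
import sys
print(f"hello, {sys.argv[1]}")
EOF
}
```

Calling greet_from_python "Ada" should then print hello, Ada. Environment variables (read with os.environ) are another option if you prefer named values over positional ones.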

On the surface, this might not seem that useful, but trust me, this can come in very handy. Especially when you combine it with task runners, like my favorite (task). You can then execute multiple scripts, in multiple different programming languages, all from a single file / CLI ✨:

tasks:
  # Standard shell scripting
  clean: rm -rf ./logs/*
  test: npm run test

  # Python heredoc
  python_util: |
    python << "EOF"
    from my_lib import do_thing
    print("hello from Python!")
    do_thing()
    EOF

  # More advanced python heredoc, with variable from shell
  # NOTE: `{{.CLI_ARGS}}` is specific to `task` - outside of this file, you
  # would usually just use `$1`
  manage:delete_user_by_last_name: |
    cat << "EOF" | xargs -0 python3 my_backend/manage.py shell --command
    from my_app import models
    print("🗑 Deleting user with last name of {{.CLI_ARGS}}")
    models.User.objects.filter(last_name="{{.CLI_ARGS}}").delete()
    EOF

    echo "✅ User deleted!"

  # NodeJS heredoc
  node_util: |
    node << "EOF"
    const { myNodeUtil } = require('./src/utils.js');
    myNodeUtil();
    EOF

Now I can run task python_util to run my python command, task node_util to run my node script, and so on. Single CLI, single file to manage, and as many utilities as my yak-shaving heart desires.

The downside to this approach is that IDE syntax highlighting of code within the heredocs is generally not very good.

Taking Advantage of More Libraries From Your Terminal

This is basically an extension of the above section on mixing other scripting languages into shell scripts - the idea that you can use this approach to pull in more standard libraries (and 3rd party libraries) into your terminal, more than just the regular UNIX commands you might be used to.

Look, I like standard shell commands, the UNIX philosophy, and I get the importance of being comfortable with shell scripting, but at the same time, I think there is no denying that some of the core utilities might not have the best ergonomics. For example, I would argue that both NodeJS’s and Python’s standard library functions for dealing with regular expressions beat grep and sed by a large margin.

With heredocs, it becomes much easier to pull in various standard library functions from different scripting languages directly into your shell.

For example, if we have a JSON file, test.json, that matches this format:

{
  "a": {
    "b": {
      "c": [1,2,3]
    }
  }
}

This is all it takes to get the length of the nested array with Node:

JSON_STRING="$(cat test.json)" node << "EOF"
console.log(JSON.parse(process.env.JSON_STRING).a.b.c.length);
EOF

Even if you disagree with me on the usability of some of these commands, I think it is worth considering the following:

Scripting standard libraries are often more portable than shell commands. Many developers have at least Python and NodeJS installed, and both of these languages have stable standard libraries that largely operate the same across OSes and are backwards compatible. For comparison, lots of shell commands have identical names across platforms but act completely differently on Mac vs Linux.
- To put it simply, not everyone is going to have jq installed, but everyone with Node has JSON.parse() available.
The knowledge that you build working with a scripting language is more transferable than that of a specific shell command
- Knowing how to use your terminal in general is important, but at the same time memorizing the exact flags to a specific command is going to be less useful than working on your general coding skills
Why not take advantage of a battle-tested standard library, instead of rolling your own?

More advanced example

Here is a more complex, but slightly contrived, example.

Let’s say that I want to read in a JSON file, find any key matching the pattern of user_image_\d+, and then extract out the ID from the key (the digits), and the filename and extension from the string (filename.ext). Sure, I could probably get this done using a combination of cat, grep or sed, and jq, but it is going to be convoluted (capture groups are a pain in both grep and sed) and involve me learning a bunch of syntax that is hyper-specific to jq and will not benefit me in other contexts. For comparison, I can accomplish this easily with NodeJS or Python (this example will be with Node though):

get_user_data() {
	JSON_FILE_PATH=$1 node << "EOF"
const fs = require('fs');
const rawData = JSON.parse(fs.readFileSync(process.env.JSON_FILE_PATH));
const extractedData = [];
Object.keys(rawData).forEach((key) => {
	const userId = /user_image_(\d+)$/.exec(key)?.[1];
	if (!userId) {
		return;
	}
	// Fall back to an empty object so a value without an extension does not throw
	const {fileName, extension} = /^(?<fileName>.+)(?<extension>\.[^.]+)$/.exec(rawData[key])?.groups ?? {};
	extractedData.push({userId, fileName, extension});
});
fs.writeFileSync('extracted.json', JSON.stringify(extractedData));
EOF
}

Using heredocs as real files and virtual files

We’ve already covered piping or redirecting heredocs to commands that are expecting standard input (stdin), but what about commands that only take files?

The route often taken here would be to write the heredoc out to an actual temp file, then pass that file to the command, like so:

# mktemp is more widely available than the Debian-specific tempfile
TEMP_PATH=$(mktemp)
cat > "$TEMP_PATH" << EOF
Hello!
EOF
# Get filesize (-b is a GNU du option)
du -sb "$TEMP_PATH"
rm "$TEMP_PATH"
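
If you do go the temp file route, it is worth making sure the file gets cleaned up even when something fails partway through. Here is one way to sketch that, assuming mktemp is available (count_bytes_via_tempfile is a hypothetical name). The function body runs in a subshell so that the EXIT trap fires as soon as the function finishes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a heredoc to a temp file, measures it, and cleans up.
# The parentheses make the whole body a subshell with its own EXIT trap.
count_bytes_via_tempfile() (
  tmp=$(mktemp) || exit 1
  # Remove the temp file when the subshell exits, on success or failure.
  trap 'rm -f "$tmp"' EXIT
  cat > "$tmp" << EOF
Hello!
EOF
  # Count the bytes in the file ("Hello!" plus the trailing newline).
  wc -c < "$tmp"
)

count_bytes_via_tempfile
```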

However, with my obsession with seeing how much I can inline into my Taskfiles / shell scripts, I also went to the trouble of figuring out how to do this without needing the intermediate temp file.

There are two main workarounds.

The first is process substitution. The shell replaces <(...) with a path (typically something like /dev/fd/63, or a named pipe) that the command can open as if it were a file, and we can feed our heredoc into it:

diff <(cat << EOF
Line 1
EOF
) <(cat << EOF
Line 1
Line 2 - I'm new!
EOF
)

If you are concerned about the readability of the above snippet, you could separate out the heredocs as variables first - I think this makes it more legible, but also adds more lines:

version_a=$(cat << EOF
Line 1
EOF
)
version_b=$(cat << EOF
Line 1
Line 2 - I'm new!
EOF
)
diff <(echo "$version_a") <(echo "$version_b")
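
To see why this works, it can help to look at what the shell actually hands to the command. A small sketch (bash or zsh only; count_lines is a made-up name, and the exact path printed depends on your system): echo shows that <(...) turns into a path, and grep -c '' counts the lines of a heredoc through that path as if it were a regular file.

```shell
#!/usr/bin/env bash
# Prints the path that stands in for the process substitution.
echo <(true)

# grep receives a file path argument, but the contents come from the heredoc.
count_lines() {
  grep -c '' <(cat << EOF
Line 1
Line 2
EOF
  )
}

line_count=$(count_lines)
echo "$line_count"   # → 2
```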

However, there is actually one final trick up our sleeve to pass a heredoc to a command that only accepts file paths: we can pipe the heredoc, and then pass stdin as a file descriptor!

cat << EOF | wc -l /dev/fd/0
print("hello from Python")
print("Line 2")
EOF

# This also works; the heredoc replaces the piped stdin, so the "echo |" part is not strictly needed
echo | wc -l /dev/fd/0 << EOF
print("hello from Python")
print("Line 2")
EOF

I’m using /dev/fd/0 instead of /dev/stdin. Either should realistically work in most OSes, but /dev/fd/0 and /dev/fd/1 (the equivalent of /dev/stdout) might be more portable.

This might fall under the “hey is Josh taking this too far?” or “maybe we should cut off his coffee intake” side of things, as this is taking the “how many things can we inline” to the extreme, perhaps beyond where it should. But hey, I get a kick out of this stuff!

Here Strings

So far I’ve exclusively been talking about heredocs / here documents, but it is worth mentioning that there are also Here Strings. They use a triple redirection operator (<<<), but as they take a single word (usually one quoted string) rather than a delimited block of text, they have more limited utility.
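
For a quick feel of what they look like, here is a minimal sketch (bash or zsh; the variable names are just for illustration). The word after <<< is sent to the command's stdin with a newline appended:

```shell
#!/usr/bin/env bash
# Feed a single string to tr via stdin.
upper=$(tr '[:lower:]' '[:upper:]' <<< "hello heredocs")

# Split a string into fields with read, without needing a pipe.
read -r first rest <<< "alpha beta gamma"

echo "$upper"   # → HELLO HEREDOCS
echo "$first"   # → alpha
echo "$rest"    # → beta gamma
```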

Wrap Up

I hope you found this post useful, or at least entertaining.

If you did, you might also like my bash / shell scripting cheatsheet.

And on that note,

cat << EOF
✨ Happy coding!!! ✨
   Sincerely, $USER
   $(date +%F)
EOF
