Building a BASH template engine /
Earlier this year I migrated my blog away from Jekyll because it was such a large setup for such a small blog to maintain. I wanted something more “portable” without libraries, so I settled to build my blog and it’s generation with BASH.
Everything needed was built out: frontmatter processing, Markdown to HTML generation, post handling, page handling, category support, RSS support, sitemap support, and more.
Originally, I wrote a very basic template handler which essentially just sniffed the first few
characters of a string to know an action to take, example: ${inc:file.html}
would include
a file into the template, ${var}
would replace itself with the applicable variable, and so
on. It began getting more complex where I required some basic if/else statements and looping, which was
something difficult with the basic template handler I originally wrote.
I decided to rebuild it from scratch, taking inspiration from Mustache syntax, and ensuring it was more than just a simple find and replace operation as it was before.
The plan was to do the following:
- Detect the applicable template tags
- For each template tag:
- Parse it character-by-character to determine:
- It’s operation type
- It’s optional ask
- It’s optional contents
- And more
- Parse it character-by-character to determine:
- Once determined, run the applicable operation to produce a result
- Take the result and replace the original template tag with that result
Initial supports I targeted:
-
Variables:
{{var}}
-
If statements:
{{#var}}I exist!{{/var}}
-
Unless statements:
{{^var}}I do not exist!{{/var}}
-
Partial includes:
{{>file.html}}
-
Looping:
{{%foreach var}}{{key}}: {{value}}{{/foreach var}}
-
Custom functions:
{{%money USD}}4500{{/money USD}}
Starting the parser:
# TOKEN_MATCH represents the REGEX for token detection.
# Format of `{{(#|^|>|%)(KEY)}}(CONTENT)({{/(KEY)}})`.
readonly TOKEN_MATCH='\{\{([#^>%@])([^}]+?)\}\}(?:(?R)|.)*?\{\{/\2\}\}|\{\{[\w#^>%@./,-]+\}\}'
# Operations.
readonly OP_UNLESS="unless"
readonly OP_IF="if"
readonly OP_INCLUDE="include"
readonly OP_VARIABLE="variable"
readonly OP_FUNCTION="function"
# Tokens.
readonly TOKEN_OPEN_BRACE="{"
readonly TOKEN_CLOSE_BRACE="}"
readonly TOKEN_CLOSE_SLASH="/"
readonly TOKEN_OP_UNLESS="^"
readonly TOKEN_OP_IF="#"
readonly TOKEN_OP_INCLUDE=">"
readonly TOKEN_OP_FUNCTION="%"
Here we have our REGEX expression (which can only be used with pcregrep
) to match the
template tags. Basic grep
(along with BASH), does not support expressions complex enough
to handle template tag detection, specifically detection of nested template tags. Additionally, we
define our key tokens to look for and the operation mapping to perform.
The REGEX was quite hard to come up with. I’m sure someone with more REGEX knowledge could make the expression more concise.
# ...
parse_tpl() {
local match=""
local input=""
input=$(cat -)
if [[ $input =~ ^templates/.* ]]; then
# Input is a template, get its contents.
input=$(cat "$input")
fi
# ...
Template data is piped into parse_tpl
, where on line 4, input=$(cat -)
we
capture this input. Next, we then check if the input
variable starts with
templates/
, if it does, we know that this is not template data, but a template path, for
which we then get that template’s contents if that is the case. This allows the parsing function to
work with both template data and template files at the same time.
Next, we need to find all template tags in the input
so we can loop them one-by-one to
process.
# ...
# Loop all matches.
while match=$(echo "$input" | pcregrep -M -o "$TOKEN_MATCH" | head -n 1); [ -n "$match" ]; do
local prev_char=""
local next_char=""
local index=0
local block_start_index=0
local in_expr=0
local is_close=0
local is_block=0
local block=""
local expr=""
local op=""
local parsed=""
# ...
This will take the input
variable, pipe it to pcregrep
with our REGEX
expression and grab the first match. It will keep doing this until nothing else can be found in
input
. Variables are then setup to track the state of the current match… items such as
what the previous character was so we can “look behind”, what the next character is so we can “look
ahead”, the block content, the expression, the operation type, the parsed output, and more.
Now that we have loop running for each match, we need to parse each character of the match one-by-one to determine its operation, the current index (column), expression, optional arguments, optional block content, and more.
# ...
while read -r -N1 char; do
next_char=${match:$index+1:1}
# ...
# Record this character, increase index.
prev_char="$char"
((index++))
done < <(echo -n "$match")
# ...
Here we are using read
to read the piped in value of match
one character at a
time. The next character is stored and the end of the loop, the previous character (current) is stored
while increasing the index (column) of where our current read position is in the match.
# ...
case $char in
# ...
esac
# ...
Utilizing a switch, each character will pass through this. We want to detect our tokens and start tracking state.
# ...
# Handle opening brace of expression.
"$TOKEN_OPEN_BRACE")
if [[ "$prev_char" = "$TOKEN_OPEN_BRACE" ]]; then
in_expr=1
if [[ "$next_char" = "$TOKEN_CLOSE_SLASH" ]]; then
# Next character is slash, this expression is a closing one.
is_close=1
fi
fi
;;
# ...
Our first case handle the open brace character ({
). We then check if the previous
character was also an open brace, if so, this tell us we have two open braces right now
({{
) which means were inside a template tag; given this, we set is_expr
to
“true” to let the rest of the code be aware we are inside a template tag. Last, we check if the next
character is a slash (/
), which will tell us were at the closing part of a template tag.
# ...
# Handle close brace of expression.
"$TOKEN_CLOSE_BRACE")
if [[ "$prev_char" = "$TOKEN_CLOSE_BRACE" ]]; then
in_expr=0
if [[ "$op" = "" ]]; then
# No operation found. Variable output.
op="$OP_VARIABLE"
elif [[ $is_block -eq 1 ]]; then
# We need to break out and handle seperately.
block_start_index=$((index+1))
is_block=0
break
fi
fi
;;
# ...
Second, we handle the closing brace (}
). If the previous character was also a closing
brace, we know that we are no longer in a template tag. Next, if no operation was previously detected
(#
, >
, %
, or ^
), then we can assume this is a
variable operation ({{var}}
). If is_block
was previously set, then we know
the rest of the template tag is the block content, so we can record the start position of the block,
then break out to handle the block seperately, ensuring the block contents will not get processed by
this loop.
# ...
# If operation.
"$TOKEN_OP_IF")
if [[ $in_expr -eq 1 ]]; then
expr=""
op="$OP_IF"
is_block=1
fi
;;
# ...
Third, we handle the ifs (#
). We check if we are currently inside a template tag, and if
so, we record the operation as OP_IF
and set the block flag is_block
to
“true” so that the rest of the code is aware this operation has block content, which the previous
closing brace case will help handle later.
# ...
# Unless operation.
"$TOKEN_OP_UNLESS")
if [[ $in_expr -eq 1 ]]; then
expr=""
op="$OP_UNLESS"
is_block=1
fi
;;
# ...
Fourth, we handle unless (^
). Exactly the same as the if handling except the operation is
set to OP_UNLESS
.
# ...
# Include of partial operation.
"$TOKEN_OP_INCLUDE")
if [[ $in_expr -eq 1 ]]; then
expr=""
op="$OP_INCLUDE"
fi
;;
# ...
Fifth, we handle the inclusion of templates (>
). The operation is set to
OP_INCLUDE
.
# ...
# Function operation.
"$TOKEN_OP_FUNCTION")
if [[ $in_expr -eq 1 ]]; then
expr=""
op="$OP_FUNCTION"
is_block=1
fi
;;
# ...
Sixth, we handle function expressions (%
). Exactly the same as the if/unless handling
except the operation is set to OP_FUNCTION
.
# ...
# Handle any other characer.
*)
if [[ $in_expr -eq 1 ]]; then
if [[ $is_close -eq 0 ]]; then
expr="$expr$char"
fi
fi
;;
# ...
Last, we handle all other characters which do not match any other token. If we are inside a template
tag and the template tag is not closed (meaning we are currently in the open expression of the template
tag), then we append the current character, char
, to expr
.
If we have the following: {{%foreach var}}{{key}}: {{value}}{{/foreach var}}
, then it
builds expr
to capture foreach var
for use later.
Now we have processing of a single match completed, we can now pass this off to another switch to
handle processing those operations previously detected utilizing the state we previously filled up,
such as expr
, op
, etc.
# ...
case "$op" in
# Handle include of partial.
"$OP_INCLUDE")
parsed=$(tpl_include "$expr")
input="${input/"$match"/"$parsed"}"
;;
# Handle if.
"$OP_IF")
block=$(echo "$match" | extract_block "$expr" "$block_start_index")
parsed=$(echo "$block" | tpl_if "$expr")
input="${input/"$match"/"$parsed"}"
;;
# Handle unless.
"$OP_UNLESS")
block=$(echo "$match" | extract_block "$expr" "$block_start_index")
parsed=$(echo "$block" | tpl_unless "$expr")
input="${input/"$match"/"$parsed"}"
;;
# Handle variables.
"$OP_VARIABLE")
parsed=$(tpl_variable "$expr")
input="${input/"$match"/"$parsed"}"
;;
# Handle functions.
"$OP_FUNCTION")
block=$(echo "$match" | extract_block "$expr" "$block_start_index")
result=$(echo "$block" | tpl_function "$expr")
input="${input/"$match"/"$result"}"
;;
# Unknown, skip.
*)
;;
esac
# ...
A case is setup to handle each operation. Each operation will pass data to an internal function for
final processing and if it contains block content, then this block content will be piped into that same
internal function. The internal function will run its processing on the template data and return a
result, this result is used to replace the entire template tag with the result by reassigning
input
, which completes the core of the template engine. The next time the match loop runs,
it will find the next template tag to process as this one no longer exists.
Finally, we echo out input
so the parsed template data can be saved somewhere. We can now
process templates with BASH!
tpl_money() {
local block
local amount
block=$(cat -)
amount=$((block))
echo "\$$((amount / 100)) $1" # $45.55 USD
}
<div class="container container--{{#is_post}}post{{#is_archived}}-archive{{/is_archived}}{{/is_post}}">
<p>This is a page!</p>{{/is_page}}
{{^is_post}}<ul>
<li data-index="{{key}}">{{value}}</li>{{/foreach category}}
{{%foreach category}}</ul>
{{>templates/meta.html}}
{^is_archived}{{%money USD}}4555{{/money USD}}{{/is_archived}}</div>
BASH is definately not the best solution for a template engine, however, you can view the entire solution on my blog’s repo and modify for your needs.