I recently changed my comment syntax to allow multi-line comments through indentation; the code below shows an example. If a line comment is followed by more deeply indented lines, the comment continues until the indentation returns to the level of the line where the comment started.
This approach supports nesting, doesn’t require an explicit terminating symbol, and is also useful for parsing string literals.
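The continuation rule can be sketched roughly as follows. This is a minimal Python illustration, not the actual lexer: the `//` comment marker, the function name `comment_spans`, and the restriction to comments that start a line are all assumptions made for the example. Indentation is counted in leading tabs.

```python
def comment_spans(lines, marker="//"):
    """Return (start, end) line-index pairs, end exclusive, one per comment.

    Assumptions for this sketch: indentation is leading tabs, `marker` is a
    hypothetical line-comment marker, and a comment begins its line.
    """
    def indent(line):
        # Indentation level = number of leading tabs.
        return len(line) - len(line.lstrip("\t"))

    spans = []
    i = 0
    while i < len(lines):
        level = indent(lines[i])
        if lines[i][level:].startswith(marker):
            # The comment swallows every following line that is indented
            # deeper than the line it started on, including nested comments.
            end = i + 1
            while end < len(lines) and indent(lines[end]) > level:
                end += 1
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans


source = ["x = 1", "// note", "\tmore", "\t\t// nested", "y = 2"]
print(comment_spans(source))  # [(1, 4)]
```

No terminator is needed because the first line at or below the starting indentation ends the comment. Blank lines count as indentation 0 in this sketch, so they also end it; a real lexer might treat them differently.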
Two parts of the implementation were hard:
- Previously, I translated leading tabs into indent/outdent tokens after the lexer had turned the input string into tokens. Since comments and string literals are parsed by the lexer, I had to move that translation so it happens before lexing. However, I can’t strip white-space until after lexing, so the new indentation-aware components of the lexer have to handle superfluous white-space explicitly. The indentation-aware components of the parser don’t need to, since parsing happens after white-space is stripped.
- It was tricky to make syntax highlighting in Sublime Text handle the new syntax. The problem is that Sublime Text just uses a set of regexes to classify subsequences of the buffer. However, it does support nested regexes, so I ended up writing code to generate the syntax highlighting definition file, using it to nest the indentation-dependent classifications within an explicitly enumerated regex for each indentation level.
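The per-level enumeration in the second point might look something like the sketch below. It is only an illustration in Python: it does not emit a real Sublime Text definition file, and the `//` marker, the `begin`/`end` keys, and the function names are hypothetical. Each indentation level gets its own pair of patterns, and a small driver checks them with Python's `re` module, whose regex flavor may differ from the one Sublime Text uses.

```python
import re


def level_rules(max_depth, marker="//"):
    """Generate one begin/end regex pair per indentation level (in tabs)."""
    rules = []
    for n in range(max_depth):
        rules.append({
            "level": n,
            # A comment marker preceded by exactly n leading tabs.
            "begin": rf"^\t{{{n}}}{re.escape(marker)}",
            # A line that does not start with n + 1 tabs, i.e. a line
            # indented no deeper than the comment's first line.
            "end": rf"^(?!\t{{{n + 1}}})",
        })
    return rules


def comment_lines(lines, rules):
    """Flag each line as inside (True) or outside (False) a comment."""
    inside = None  # rule of the currently open comment, if any
    flagged = []
    for line in lines:
        if inside is not None and re.match(inside["end"], line):
            inside = None
        if inside is None:
            for rule in rules:
                if re.match(rule["begin"], line):
                    inside = rule
                    break
        flagged.append(inside is not None)
    return flagged


rules = level_rules(3)
print(rules[1]["begin"])  # ^\t{1}//
print(comment_lines(["x", "// a", "\tb", "y"], rules))  # [False, True, True, False]
```

Enumerating the levels caps the indentation depth the highlighter understands at `max_depth`, which is presumably acceptable for highlighting since the definition file is generated rather than written by hand.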