CommonNorg Spec (Flat)

Flat Document

PEG expression
flat_document = flat_block*

flat_block = indent? base_block

indent = indent_prefix whitespace (attributes whitespace)?

indent_prefix = unordered_prefix
              / ordered_prefix
              / quote_prefix
              / null_prefix

A Flat Document is a list of Flat Blocks.

A Flat Block is made up of an optional Indent followed by a Base Block.

Indenting Rule

In CommonNorg, every Base Block can be indented.

Instead of using an arbitrary amount of whitespace characters, we indent with repeated indentation prefix characters. An indentation has an indent type and an indent level, which are determined by the kind of prefix character and the number of times it is repeated.

There are four types of indentation as follows:

  • unordered

  • ordered

  • quote

  • null

The meaning and necessity of each type of indentation are discussed in the Structured Document section.
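As a rough illustration of how the type and level fall out of the prefix, here is a minimal Python sketch. The prefix characters in PREFIX_TYPES are assumptions chosen for illustration (this section does not define them), the null prefix is left out because its character is not given here, and parse_indent is a hypothetical helper. Attributes are ignored.

```python
# Hypothetical mapping: the real prefix characters are not defined in this section.
PREFIX_TYPES = {"-": "unordered", "~": "ordered", ">": "quote"}


def parse_indent(line):
    """Return (indent_type, indent_level, rest), or None if the line has no indent."""
    if not line or line[0] not in PREFIX_TYPES:
        return None
    char = line[0]
    # The level is the number of times the prefix character is repeated.
    level = len(line) - len(line.lstrip(char))
    rest = line[level:]
    # The indent rule requires whitespace after the prefix
    # (treating space and tab as whitespace is an assumption).
    if not rest or rest[0] not in " \t":
        return None
    return PREFIX_TYPES[char], level, rest.lstrip(" \t")
```

With these assumed characters, a line such as "-- item" would be read as an unordered indent of level 2 followed by the base block "item".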

Base Block

PEG expression
base_block = blank_heading
           / heading
           / tag
           / horizontal_rule
           / blank_line
           / paragraph

Parsing precedence follows this order, but the blocks are explained below starting with the most common ones.

A base block is one of:

  • paragraph

  • blank line

  • horizontal rule

  • heading

  • blank heading

  • tag

These are all atomic block type nodes in CommonNorg.
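The ordered choice in base_block can be sketched as a list of patterns tried in sequence, with paragraph as the fallback. This is only a line-level approximation under stated assumptions: classify_line is a hypothetical helper, the tag pattern only looks at the opening line's prefix (., # or repeated @) with an assumed identifier character set, and attributes are not handled.

```python
import re

# Patterns are tried in the same order as the alternatives of base_block.
BASE_BLOCK_PATTERNS = [
    ("blank_heading", re.compile(r"\*+")),
    ("heading", re.compile(r"\*+ .+")),
    # Simplified tag check; the identifier character set is an assumption.
    ("tag", re.compile(r"(\.|#|@+)[A-Za-z].*")),
    ("horizontal_rule", re.compile(r"__+")),
    ("blank_line", re.compile(r"")),
]


def classify_line(line):
    """Classify one line (without its trailing newline) as a base block kind."""
    for kind, pattern in BASE_BLOCK_PATTERNS:
        if pattern.fullmatch(line):
            return kind
    # Anything that matches no other base block is a paragraph segment.
    return "paragraph"
```

Because paragraph comes last, a line like "*bold* text" that fails every earlier pattern still ends up as paragraph content rather than an error.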

Paragraph / Paragraph Segment

PEG expression
paragraph = paragraph_segment+

paragraph_segment = !non_paragraph_flat_block (!eol .)+ eol

non_paragraph_flat_block = indent? non_paragraph_base_block
non_paragraph_base_block = blank_heading
                         / heading
                         / tag
                         / horizontal_rule
                         / blank_line

A Paragraph Segment is any line that does not start a non-paragraph flat block, that is, an optionally indented blank heading, heading, tag, horizontal rule, or blank line. A paragraph consists of one or more paragraph segments.

NOTE

We skip inline parsing here because inline markup would make the paragraph definition extremely verbose.

Blank Line

PEG expression
blank_line = eol

Yes, a blank line itself is a block.
This will make sense once we introduce more of the grammar.

Because a blank line is itself a flat block, it can be used to split a sequence of paragraphs. You can think of it as a renamed version of the paragraph break from norg spec v1.
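To make the splitting behaviour concrete, here is a small Python sketch. It assumes the input contains only paragraph segments and blank lines (no headings, tags, or rules), and group_paragraphs is a hypothetical helper name.

```python
def group_paragraphs(lines):
    """Group consecutive non-empty lines into paragraphs.

    Each empty line becomes its own blank_line block and ends the
    paragraph that precedes it.
    """
    blocks = []
    current = []
    for line in lines:
        if line == "":
            if current:
                blocks.append(("paragraph", current))
                current = []
            blocks.append(("blank_line", None))
        else:
            current.append(line)
    if current:
        blocks.append(("paragraph", current))
    return blocks
```

For the input lines "a", "b", "", "c" this produces a two-segment paragraph, a blank line block, and a one-segment paragraph.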

Horizontal Rule

PEG expression
horizontal_rule = "__" "_"* eol

Horizontal Rule hasn't changed much from the v1 spec.

It is a line consisting of two or more _ characters.

Heading / Blank Heading

PEG expression
heading       = "*"+ space (attributes space)? heading_title
blank_heading = "*"+ eol

heading_title = paragraph_segment

A heading's syntax is very similar to the v1 spec:

  • It starts with repeated * prefix characters

  • It can have an optional set of attributes

  • It has a paragraph segment as a title

The level of a heading or blank heading is determined by the number of * prefix characters. (NOTE: this is not part of the indenting rule explained above.)

You can think of a blank heading as a heading without a title.

You might wonder why the hell we need a blank heading in the spec. This will make sense when we introduce sections in the structured document.
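The heading and blank heading rules can be sketched with one regular expression in Python. This is a simplified illustration: parse_heading is a hypothetical helper, "space" is assumed to be a single U+0020, and the optional attributes are not parsed (their syntax is not defined in this section), so they would end up as part of the title.

```python
import re

# One or more '*', optionally followed by a space and a non-empty title.
HEADING_RE = re.compile(r"(\*+)(?: (.+))?")


def parse_heading(line):
    """Return (level, title) for a heading, (level, None) for a blank heading, else None."""
    m = HEADING_RE.fullmatch(line)
    if m is None:
        return None
    # The level is simply the number of '*' prefix characters.
    return len(m.group(1)), m.group(2)
```

A blank heading and a heading of the same level differ only in whether a title is present, which is why both can share one return shape here.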

Difference from norg spec v1

Unlike norg spec v1, in a flat document a heading does not have any child blocks. The heading from v1 will be reintroduced as a section in the structured document later.

Tags

Tags are an extensible component for representing block elements that cannot be expressed with CommonNorg syntax. How tags work is explained in Tag System.

There are 3 types of tags in CommonNorg.

  • infirm tag

  • ranged tag

  • carryover tag

Infirm Tag

PEG expression
infirm_tag = "." identifier (space tag_inline_args)? eol

Infirm tag syntax hasn't really changed from v1. One key difference is that we now use the ; character to separate multiple arguments:

.image /path/to/image.png; This is an alt text
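A minimal Python sketch of splitting such a line into a tag name and its arguments follows. parse_infirm_tag is a hypothetical helper, trimming whitespace around each argument is an assumption, and escaping of ; inside an argument is not handled because it is not described here.

```python
def parse_infirm_tag(line):
    """Parse '.name arg1; arg2' into (name, [args]); return None if not an infirm tag."""
    if not line.startswith(".") or len(line) < 2 or line[1].isspace():
        return None
    # Everything up to the first space is the identifier, the rest is the argument text.
    name, _, arg_text = line[1:].partition(" ")
    # Assumption: whitespace around each ';'-separated argument is not significant.
    args = [arg.strip() for arg in arg_text.split(";")] if arg_text else []
    return name, args
```

Applied to the example line above, this yields the name "image" with the two arguments "/path/to/image.png" and "This is an alt text".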

Ranged Tag

PEG expression
ranged_tag = "@"+ identifier (space tag_inline_args)? eol
             (!ranged_tag_end (!eol .)* eol)*
             ranged_tag_end

ranged_tag_end = "@"+ "end" eol?

A ranged tag is a block tag that can take a multi-line string as an additional argument, alongside inline arguments like those of an infirm tag.

Escaping with repeated prefix characters

Similar to fenced code blocks in GitHub Flavored Markdown, the ranged tag's prefix character @ can be repeated as many times as you want, which makes it possible to nest would-be end modifiers inside:

This is how you write a code block in norg spec v1:

@@example
@code python
print("hello world!")
@end
@@end

This is why CommonNorg doesn't have Standard Ranged Tags. It doesn't need them.

Repeating prefix characters might feel dumb, but it works for every single case.
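The nesting behaviour can be sketched in Python as follows. One assumption is made explicit here: the end line must repeat @ exactly as many times as the opening line, which is what the prose implies but the ranged_tag_end rule does not spell out. parse_ranged_tag is a hypothetical helper and the identifier character set (\w+) is also an assumption.

```python
import re

# '@' repeated, an identifier, then optional inline arguments after one space.
OPEN_RE = re.compile(r"(@+)(\w+)(?: (.*))?")


def parse_ranged_tag(lines, start=0):
    """Parse a ranged tag that opens at lines[start].

    Returns (name, inline_args, content_lines, next_index), or None when the
    line does not open a ranged tag or no matching end line is found.
    """
    m = OPEN_RE.fullmatch(lines[start])
    if m is None or m.group(2) == "end":
        return None
    # Assumption: the closing line uses the same number of '@' as the opening line.
    closer = m.group(1) + "end"
    for i in range(start + 1, len(lines)):
        if lines[i] == closer:
            return m.group(2), m.group(3) or "", lines[start + 1:i], i + 1
    return None
```

For the @@example block above, the inner @code python and @end lines are returned untouched as content, because only @@end closes the outer tag.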

Whitespace indent trimming rule

The whitespace indentation of a ranged tag's content can be trimmed up to the column where the ranged tag started. The column here is calculated in bytes: we don't special-case characters like tab (U+0009) or zero-width characters, and only trim whitespace indentation byte by byte.

If there isn't enough whitespace indentation, that is a syntax error. However, some parsers, such as tree-sitter, cannot produce custom rich errors. In that case, they can parse without trimming and leave that job to Neorg-like applications.
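The trimming rule might look like this in Python. It is a sketch under assumptions: trim_content is a hypothetical helper, space and tab are both treated as one byte of whitespace, and every content line (including empty ones) is held to the rule, which is the literal reading of the text above.

```python
def trim_content(content_lines, column):
    """Strip `column` leading whitespace bytes from each content line.

    `column` is the byte offset at which the ranged tag started.
    Raises ValueError when a line has fewer than `column` leading whitespace bytes.
    """
    trimmed = []
    for line in content_lines:
        raw = line.encode("utf-8")
        prefix = raw[:column]
        # Each space or tab counts as exactly one byte; no tab expansion is done.
        if len(prefix) < column or any(byte not in b" \t" for byte in prefix):
            raise ValueError(f"not enough indentation: {line!r}")
        trimmed.append(raw[column:].decode("utf-8"))
    return trimmed
```

A parser that cannot report such errors could skip this step entirely and let the consuming application call something like trim_content afterwards.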

Carryover Tag

A carryover tag in the flat AST shares basically the same syntax as an infirm tag, with a different prefix character (#). Like v1, a carryover tag is used to consume the next element and generate a different result. But unlike v1, in the context of a flat document, a carryover tag doesn't include the next element. We can only define the next element once we start parsing the structured document.

References