Plaintext is the most important format for human communication. It's incredibly resilient and can be written and read by almost any computer or program, from decades-old devices to a smart toaster.
Markup languages are structured ways of writing plaintext. Crucially, they're meant to be read and written by both humans and computers. They can be manipulated by computer programs, but even without any processing people can understand and edit them. This makes them perfect for communication between heterogeneous people and systems. I can send someone a file written in a markup language regardless of what kind of system they're running or what programs they use.
At their best, markup languages augment the human intellect (à la Douglas Engelbart). Their structure makes it easier for people to think and communicate their thoughts.
The syntax should:
Currently I think the best format for this is S-expressions, the syntax used by Lisps. They're nested lists delimited by parentheses and are very simple to parse.
(This is a list ( containing this nested list) ( and this one))
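To give a feel for how little it takes to parse this, here is a minimal sketch in Python. The names `tokenize` and `parse` are my own for illustration, and the sketch assumes the only syntax is parentheses and whitespace-separated words (no strings, escapes or comments).

```python
def tokenize(text):
    """Split text into "(", ")" and whitespace-separated words."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and return one word or nested list."""
    token = tokens.pop(0)
    if token != "(":
        return token  # a bare word
    items = []
    while tokens and tokens[0] != ")":
        items.append(parse(tokens))
    if not tokens:
        raise SyntaxError("missing closing parenthesis")
    tokens.pop(0)  # discard the ")"
    return items


tree = parse(tokenize("(This is a list ( containing this nested list) ( and this one))"))
print(tree)
```

The result is an ordinary nested Python list, so the two inner lists end up as the last two elements of the outer one. Note that this throws away the original whitespace, which matters later when we want to edit prose in place.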
The structure of the program should be easy to edit, whether that means changing the order of items, or "promoting" or "demoting" sections.
This gets slightly complicated if our language is not indentation-aware. For example, Markdown and Org-mode describe nested sections with a prefix of increasing length ("#" in Markdown, "*" in Org-mode). So a header starting with "**" will be a child of the nearest "*" header above it, and promoting or demoting a header means shortening or lengthening its prefix.
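However, if we're building a system based on parentheses, this does not hold true: nesting is expressed by where the parentheses sit, so promoting or demoting a section means moving it out of or into another list.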
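As a rough sketch of what promoting and demoting could look like on a parenthesised structure, the Python below works on nested lists (standing in for a parsed document). The functions `promote` and `demote`, and the idea of addressing a node by a path of child indexes, are assumptions of mine rather than a settled design.

```python
import copy


def promote(tree, path):
    """Move the node at `path` out of its parent, placing it right after the parent.

    `path` is a list of child indexes from the root, e.g. [2, 1].
    Returns a new tree; the input is left untouched.
    """
    if len(path) < 2:
        raise ValueError("a top-level item has no grandparent to move into")
    tree = copy.deepcopy(tree)
    grandparent = tree
    for index in path[:-2]:
        grandparent = grandparent[index]
    parent_index, child_index = path[-2], path[-1]
    node = grandparent[parent_index].pop(child_index)
    grandparent.insert(parent_index + 1, node)
    return tree


def demote(tree, path):
    """Make the node at `path` the last child of the list just before it."""
    tree = copy.deepcopy(tree)
    parent = tree
    for index in path[:-1]:
        parent = parent[index]
    index = path[-1]
    if index == 0 or not isinstance(parent[index - 1], list):
        raise ValueError("no preceding list to demote into")
    node = parent.pop(index)
    parent[index - 1].append(node)
    return tree


doc = ["doc", ["intro", "hello"], ["body", ["detail", "x"]]]
flatter = promote(doc, [2, 1])   # lift "detail" out of "body"
restored = demote(flatter, [3])  # push it back in
```

Compared with changing the length of a "*" prefix, both operations move a whole subtree at once, which is arguably the behaviour we want when reorganising sections.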
We want texts written in this language to be easy to share and send anywhere. We also want to be able to reference other files.
It should be possible to reference (and maybe even transclude) specific sections of other documents.
This is where things get interesting. A powerful feature would be the ability to write code that generates text to be used in text files, for example to produce reports or different views of a file. This makes each file written in this language analogous to a program that produces a file.
In Lisps, the "quote" operator prevents an expression from being evaluated and returns it as a list instead. For example:
> (+ 4 7)
11
> '(+ 4 7)
(+ 4 7)
In our case, though, the language treats text as the default and code (i.e. function calls) as the special case. We can simply invert the meaning of the quote operator so that it means "execute this block".
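Here is a small Python sketch of that inversion, working on an already-parsed tree of nested lists. Everything in it is hypothetical: I'm assuming a quoted block shows up in the tree as a list whose first element is the word "quote", and `FUNCTIONS` is a made-up table holding a single function.

```python
# Hypothetical table of functions that documents are allowed to call.
FUNCTIONS = {
    "+": lambda *numbers: sum(int(n) for n in numbers),
}


def render(node):
    """Turn a parsed tree into text, executing only the quoted blocks."""
    if isinstance(node, str):
        return node  # plain words are the default: they pass through as text
    if node and node[0] == "quote":
        name, *args = node[1:]
        # Arguments are rendered first so that quoted blocks can nest.
        return str(FUNCTIONS[name](*[render(arg) for arg in args]))
    return " ".join(render(child) for child in node)


doc = ["The", "total", "is", ["quote", "+", "4", "7"]]
print(render(doc))  # The total is 11
```

One obvious weakness of this representation is that an ordinary list of prose starting with the word "quote" would be executed too, so a real design would need a marker that cannot be confused with text.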
We need to parse files into an AST while still keeping track of each node's location in the original file. This is so that we can modify that original file with functions.
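One way this could work is to store character offsets on every node and make edits by splicing the original string, so that anything we don't touch stays byte-for-byte the same. The Python below is a sketch under that assumption; `Node`, `parse_with_spans` and `replace_node` are names invented for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int  # offset of the node's first character in the source
    end: int    # offset one past the node's last character
    text: str = ""  # set for words, empty for lists
    children: list = field(default_factory=list)


TOKEN = re.compile(r"[()]|[^\s()]+")


def parse_with_spans(source):
    """Parse the first expression in `source` into a Node tree with offsets."""
    matches = list(TOKEN.finditer(source))
    node, _ = _parse_at(matches, 0)
    return node


def _parse_at(matches, i):
    """Parse one expression starting at token index i; return (node, next index)."""
    first = matches[i]
    if first.group() != "(":
        return Node(first.start(), first.end(), text=first.group()), i + 1
    children = []
    i += 1
    while i < len(matches) and matches[i].group() != ")":
        child, i = _parse_at(matches, i)
        children.append(child)
    if i == len(matches):
        raise SyntaxError("missing closing parenthesis")
    return Node(first.start(), matches[i].end(), children=children), i + 1


def replace_node(source, node, new_text):
    """Return a copy of `source` with the characters covered by `node` replaced."""
    return source[:node.start] + new_text + source[node.end:]


source = "(title   Draft)\n"
tree = parse_with_spans(source)
edited = replace_node(source, tree.children[1], "Final")
print(repr(edited))  # '(title   Final)\n'
```

The three spaces after "title" survive the edit because the AST is only used to find where to cut, never to regenerate the file. After an edit the stored offsets are stale, so the file would need to be reparsed (or the offsets shifted) before the next change.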
However, this is harder to do for what is ostensibly prose as opposed to code, as people may use whitespace and odd formatting intentionally.
The standard approach for parsers seems to be a token-based system, with an interface for looking at the current token and peeking at upcoming ones.
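A minimal sketch of such an interface in Python might look like the following. The class name `TokenStream` and its method names are my own choices, not taken from any particular parser library.

```python
class TokenStream:
    """Wraps a list of tokens with a cursor and lookahead."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def current(self):
        """Return the token under the cursor, or None at the end."""
        return self.peek(0)

    def peek(self, offset=1):
        """Return the token `offset` places ahead without consuming anything."""
        index = self._pos + offset
        if 0 <= index < len(self._tokens):
            return self._tokens[index]
        return None

    def advance(self):
        """Consume and return the current token, or None at the end."""
        token = self.current()
        if token is not None:
            self._pos += 1
        return token


stream = TokenStream(["(", "+", "4", "7", ")"])
first = stream.advance()   # consumes "("
upcoming = stream.peek()   # looks one past the current token without moving
```

Returning None at the end of input keeps the calling code free of index checks, at the cost of not being able to use None as a token value.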