
Gem models use a very simple textual concrete syntax, described below. I call this a proto-language because no semantics are tied directly to the syntax. Instead, the syntax is meant to be interpreted by processing layers applied to the model. XML has a similar proto-language character: you can write XML and use a tool to check that it is well-formed, but the meaning of the XML depends entirely on how you choose to interpret it.
Gem Syntax Grammar
- Root := Expr*
- Expr := Prefix? (Token | Group | Quote) Suffix?
- Prefix := A valid prefix Symbol with no space between it and the following Expr.
- Suffix := A valid suffix Symbol with no space between it and the preceding Expr.
- Token := Word | Numeric | Operator
- Word := A valid Java identifier.
- Numeric := [0 .. 9]+
- Operator := Symbol
- Group := AngleGroup | CurlyGroup | ParenGroup | SquareGroup
- AngleGroup := < Expr* >
- CurlyGroup := { Expr* }
- ParenGroup := ( Expr* )
- SquareGroup := [ Expr* ]
- Quote := DoubleQuote | SingleQuote
- DoubleQuote := " Fragment* "
- SingleQuote := ' Fragment* '
- Fragment := Text | Escape
- Text := A sequence of characters not containing an escape Symbol or quote terminator (" or ').
- Escape := CharEscape | ExprEscape
- CharEscape := A valid character escape Symbol followed by a character escape sequence.
- ExprEscape := A valid expression escape Symbol followed by a single Expr.
- Symbol := A member of the set of characters on your keyboard which are not letters or digits.
- It's hard to get the look of the grammar correct in Blogger ... sorry about that.
- The characters *, ?, | and + have their usual grammatical meanings:
- * : a list of zero or more of the preceding rule
- ? : an optional single instance of the preceding rule
- | : used between alternatives
- + : a list of 1 or more of the preceding rule
- Characters in large bold Courier like [ and ] are literals.
- The valid characters for Prefix, Suffix and Escape are not fixed here. They can be assigned when initializing the parser. I have been using ` (back-tick) and ~ (tilde) as prefixes and suffixes. Currently ` is the CharEscape initiator (but it really should be \ to fit the C/Java style) and ~ is the ExprEscape initiator. I'm considering adding ^ as a Prefix/Suffix. All of these could change.
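To make the grammar a little more concrete, here is a minimal recursive-descent sketch in Python. It is not the actual Gem parser: the names `GemParser`, `Node` and `parse` are made up for illustration, Quotes are left out, a Word is only approximated as a Java-style identifier, and the rule that a Suffix must be followed by whitespace, a closing delimiter or end of input is my own assumption for resolving cases like `foo~bar`. The prefix/suffix symbols are passed in when the parser is created, as described above.

```python
from dataclasses import dataclass, field

GROUPS = {"<": ">", "{": "}", "(": ")", "[": "]"}   # opener -> closer
CLOSERS = frozenset(GROUPS.values())


@dataclass
class Node:
    kind: str                       # "Word", "Numeric", "Operator" or "Group"
    text: str                       # token text, or the opener of a Group
    children: list = field(default_factory=list)
    prefix: str = ""
    suffix: str = ""


class GemParser:
    """Hypothetical sketch of Root/Expr/Token/Group; Quote is not covered."""

    def __init__(self, src, affixes="`~"):
        self.src = src
        self.pos = 0
        self.affixes = frozenset(affixes)   # symbols usable as Prefix/Suffix

    def peek(self, offset=0):
        i = self.pos + offset
        return self.src[i] if i < len(self.src) else ""

    def parse_exprs(self, closer=""):
        """Expr* up to `closer`; the empty string stands for end of input."""
        nodes = []
        while True:
            while self.peek().isspace():
                self.pos += 1
            ch = self.peek()
            if ch == "" or ch in CLOSERS:
                if ch != closer:
                    found = ch or "end of input"
                    raise SyntaxError(f"unexpected {found!r} at offset {self.pos}")
                return nodes
            nodes.append(self.parse_expr())

    def parse_expr(self):
        """Expr := Prefix? (Token | Group) Suffix?"""
        prefix, nxt = "", self.peek(1)
        # Prefix: an affix symbol with no space before the Expr that follows.
        if (self.peek() in self.affixes and nxt
                and not nxt.isspace() and nxt not in CLOSERS):
            prefix = self.peek()
            self.pos += 1
        node = self.parse_primary()
        node.prefix = prefix
        # Suffix (assumed rule): an affix symbol touching the preceding Expr
        # and followed by whitespace, a closer, or end of input.
        nxt = self.peek(1)
        if self.peek() in self.affixes and (not nxt or nxt.isspace() or nxt in CLOSERS):
            node.suffix = self.peek()
            self.pos += 1
        return node

    def parse_primary(self):
        ch, start = self.peek(), self.pos
        if ch in GROUPS:
            self.pos += 1
            children = self.parse_exprs(GROUPS[ch])
            self.pos += 1                   # consume the closer
            return Node("Group", ch, children)
        if ch in ('"', "'"):
            raise SyntaxError("Quote is not handled in this sketch")
        if "0" <= ch <= "9":                # Numeric := [0 .. 9]+
            while "0" <= self.peek() <= "9":
                self.pos += 1
            return Node("Numeric", self.src[start:self.pos])
        if ch.isalpha() or ch in ("_", "$"):   # rough Java identifier
            while self.peek().isalnum() or self.peek() in ("_", "$"):
                self.pos += 1
            return Node("Word", self.src[start:self.pos])
        self.pos += 1                       # anything else: one-character Operator
        return Node("Operator", ch)


def parse(src, affixes="`~"):
    return GemParser(src, affixes).parse_exprs()
```

Calling `parse("abc [123 {xyz} ]")` yields a Word followed by a Group whose children are a Numeric and a nested Group. An affix symbol that is surrounded by spaces falls through to the last branch and comes back as a plain Operator.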
Examples
- Hello World!
- two Words followed by an Operator
- abc [123 {xyz} ]
- use of Groups and Numerics (and Words)
- ~foo + ~ bar
- the first ~ is a Prefix, the second is an Operator (because of the spacing)
- "Hello ^{foo bar} World"
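- DoubleQuote with an embedded CurlyGroup, reached through an ExprEscape (this assumes ^ has been assigned as the ExprEscape initiator; with the ~ setting described above it would be written "Hello ~{foo bar} World")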
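The Quote rules can be sketched in the same spirit. The Python below splits one quoted string into Text, CharEscape and ExprEscape fragments. It is only an illustration under several assumptions of mine: `split_quote` and `scan_expr` are hypothetical names, a CharEscape is taken to be the initiator plus exactly one character, the Expr after an ExprEscape is limited to a balanced group or an identifier, and only the matching quote character ends the Quote. The quoted example uses ^ in front of the group, so the usage note passes ^ as the ExprEscape initiator.

```python
GROUPS = {"<": ">", "{": "}", "(": ")", "[": "]"}   # opener -> closer


def scan_expr(src, i):
    """Return the end index of a simplified Expr starting at src[i]:
    either a balanced group or a run of identifier characters."""
    if i < len(src) and src[i] in GROUPS:
        stack = [GROUPS[src[i]]]
        i += 1
        while stack:
            if i >= len(src):
                raise SyntaxError("unterminated group in ExprEscape")
            ch = src[i]
            if ch in GROUPS:
                stack.append(GROUPS[ch])    # nested opener
            elif ch == stack[-1]:
                stack.pop()                 # matching closer
            i += 1
        return i
    j = i
    while j < len(src) and (src[j].isalnum() or src[j] in ("_", "$")):
        j += 1
    if j == i:
        raise SyntaxError("ExprEscape must be followed by an Expr")
    return j


def split_quote(src, char_escape="`", expr_escape="~"):
    """Split a Quote into (kind, text) fragments, where kind is
    "Text", "CharEscape" or "ExprEscape"."""
    quote = src[:1]
    if quote not in ('"', "'"):
        raise SyntaxError("a Quote starts with a double or single quote")
    frags, text, i = [], [], 1
    while i < len(src) and src[i] != quote:
        ch = src[i]
        if ch in (char_escape, expr_escape):
            if text:                        # close the pending Text fragment
                frags.append(("Text", "".join(text)))
                text = []
            if ch == char_escape:
                kind, end = "CharEscape", i + 2   # initiator + one character (assumed)
            else:
                kind, end = "ExprEscape", scan_expr(src, i + 1)
            frags.append((kind, src[i:end]))
            i = end
        else:
            text.append(ch)
            i += 1
    if i >= len(src):
        raise SyntaxError("unterminated Quote")
    if text:
        frags.append(("Text", "".join(text)))
    return frags
```

With these assumptions, `split_quote('"Hello ^{foo bar} World"', expr_escape="^")` gives a Text fragment, an ExprEscape fragment holding `^{foo bar}`, and a second Text fragment. A real implementation would hand the escaped part back to the Expr parser instead of keeping it as a string.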
Possible Enhancements
- Add Infix and Outfix, which would work like Prefix and Suffix but sit inside the Group delimiters, like this:
- AngleGroup := < Infix? Expr* Outfix? >
- Some way of indicating that certain groups don't apply within a scope (e.g. so that < and > can be used as Operators).
- The Numeric rule is currently just a string of digits. In the future it could be enhanced to allow more of the Java numeric literal syntax.
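As a rough idea of what an enhanced Numeric rule might accept, here is a Python regular expression covering a small subset of Java numeric literals: hexadecimal and binary integers, simple floating-point numbers, and the plain digit strings the current rule allows. The names `NUMERIC` and `is_numeric` are hypothetical, and the subset is my own choice; underscores, octal handling and hexadecimal floats are deliberately left out.

```python
import re

# A subset of Java numeric literal syntax (illustrative, not exhaustive).
NUMERIC = re.compile(r"""
    0[xX][0-9a-fA-F]+[lL]?                      # hexadecimal integer, e.g. 0xFF
  | 0[bB][01]+[lL]?                             # binary integer, e.g. 0b1010
  | [0-9]+\.[0-9]+([eE][+-]?[0-9]+)?[fFdD]?     # floating point, e.g. 1.5e3
  | [0-9]+[lL]?                                 # digits, as in the current rule
""", re.VERBOSE)


def is_numeric(text):
    """True if the whole of `text` matches one of the forms above."""
    return NUMERIC.fullmatch(text) is not None
```

Anything the current rule accepts, such as `123`, still matches, so a change along these lines would extend the existing syntax without invalidating it.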