Friday, January 18, 2008

The SegmentLayer and UriLayer

Another layer idea I've been playing with is the UriLayer. The basic idea here is to give every Faceted part in the model a hierarchical path-based URI. This will allow for another way of modeling relationships (by embedding URIs in models, similar to how we did it with "spans" in the SpanLayer) and can serve as a basis for inter-model references (or external references to model parts).

Here again I'll assign bits of the Gem syntax to assist. If the first child in a Group or Root element is a Word that has a ~ suffix, then it will be treated as the segment of the URI path for the Group or Root element; otherwise the segment will just be the index number of the Faceted part. OK, that sounds a bit complicated, so here's the text for a model:
foo~ (barX~ a b [baz~] ) (barY~ b)
and here's a look at the model:



So the idea is that foo~, barX~, etc. are Words with the ~ suffix and appear first in their Group, therefore they have been used to name the SegmentFacet of their Group. The SegmentFacets that don't get a name this way simply get a number instead.
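
The naming rule can be sketched in a few lines of Java. This is only an illustration, not the actual Gem code: the class and method names (SegmentNamingSketch, segmentFor, isNamingWord) are made up, and the caller passes in the index because the post does not say whether indexes start at 0 or 1.

```java
// Hypothetical sketch of the segment naming rule; none of these names
// come from the Gem code base.
class SegmentNamingSketch {

    /**
     * Returns the URI segment for a Group (or Root), given the text of its
     * first child and the index of the Group within its parent.
     */
    static String segmentFor(String firstChildText, int indexInParent) {
        if (isNamingWord(firstChildText)) {
            // Drop the trailing '~' and use the Word itself as the segment.
            return firstChildText.substring(0, firstChildText.length() - 1);
        }
        // No naming Word: fall back to the index number.
        return Integer.toString(indexInParent);
    }

    /** True if the text is a Java identifier followed by a single '~'. */
    static boolean isNamingWord(String text) {
        if (text == null || text.length() < 2 || !text.endsWith("~")) {
            return false;
        }
        String word = text.substring(0, text.length() - 1);
        if (!Character.isJavaIdentifierStart(word.charAt(0))) {
            return false;
        }
        for (int i = 1; i < word.length(); i++) {
            if (!Character.isJavaIdentifierPart(word.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, segmentFor("barX~", 1) gives "barX", while a Group whose first child is the plain Word b falls back to its index.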

The UriLayer itself simply gives us a view of the URI assigned to every Faceted part by walking up to the Model part and concatenating each SegmentFacet value on the way:



The images above are showing Figures but not Sites. Let's look at both for the SegmentLayer:



So all Sites have '/' as the SegmentFacet value except for the Model which has '!'. The idea here is that if the Environment is asked to resolve a URI it can split it around the '!' and look at the first part to find a model and then hand the second part to the model's UriLayer to resolve in the context of that model.
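
Here is a minimal sketch of both halves of that idea: building a URI by walking up to the Model, and splitting a full URI around the '!'. The Node class, the method names and the model name myModel are all assumptions made for the example; the real UriLayer works on SegmentFacets rather than on a toy node class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; Node stands in for a Faceted part and its SegmentFacet.
class UriSketch {

    static final class Node {
        final Node parent;     // null for the Model
        final String segment;  // "!" for the Model, "/" for other Sites,
                               // a name or an index for Figures

        Node(Node parent, String segment) {
            this.parent = parent;
            this.segment = segment;
        }
    }

    /** Walks up to the Model, then joins the segments root-first. */
    static String uriOf(Node node) {
        Deque<String> segments = new ArrayDeque<>();
        for (Node n = node; n != null; n = n.parent) {
            segments.push(n.segment); // push puts the Model's segment first
        }
        StringBuilder uri = new StringBuilder();
        for (String segment : segments) {
            uri.append(segment);
        }
        return uri.toString();
    }

    /** Splits a full URI at the first '!' into { modelPart, pathInModel }. */
    static String[] splitAtModel(String uri) {
        int bang = uri.indexOf('!');
        if (bang < 0) {
            throw new IllegalArgumentException("no '!' in URI: " + uri);
        }
        return new String[] { uri.substring(0, bang), uri.substring(bang + 1) };
    }
}
```

For the chain Model, foo, Site, barX, Site, baz this produces "!foo/barX/baz", and splitting "myModel!foo/barX/baz" yields "myModel" for the Environment and "foo/barX/baz" for the model's UriLayer.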

The vague plan...

I've thought about adding an abstract UriQueryLayer where its UriQueryFacet would have a method like handle(String uriQuery). This would allow for resolving a URI (with the UriLayer) and handing the parameters to the resolved part to do whatever it needs (you would subclass UriQueryFacet to provide domain-specific implementations).

With this group of layers you could imagine using a model to power a servlet. Incoming URLs could be directed to a UriQueryFacet for processing. The behavior of this servlet can vary based on the structure of the model (used to encode state) and the implementation of the UriQueryFacet (used to encode behavior).
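
Since none of this exists yet, the following is only a guess at what the shape could be. The String return type, the '?' separator between path and query, and the map that stands in for UriLayer resolution are all assumptions; only the handle(String uriQuery) method comes from the plan above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "vague plan"; not part of Gem.
class UriQuerySketch {

    /** Stand-in for the abstract facet; the return type is an assumption. */
    static abstract class UriQueryFacet {
        abstract String handle(String uriQuery);
    }

    /** A domain-specific subclass whose state lives in the model part. */
    static final class CounterFacet extends UriQueryFacet {
        private int hits = 0;

        @Override
        String handle(String uriQuery) {
            hits++;
            return "hit " + hits + " with query '" + uriQuery + "'";
        }
    }

    // Stand-in for UriLayer resolution: a plain map from path to facet.
    private final Map<String, UriQueryFacet> facetsByPath = new HashMap<>();

    void register(String path, UriQueryFacet facet) {
        facetsByPath.put(path, facet);
    }

    /** Splits "path?query", resolves the path, hands the query to the facet. */
    String dispatch(String uri) {
        int q = uri.indexOf('?');
        String path = q < 0 ? uri : uri.substring(0, q);
        String query = q < 0 ? "" : uri.substring(q + 1);
        UriQueryFacet facet = facetsByPath.get(path);
        if (facet == null) {
            throw new IllegalArgumentException("unresolved path: " + path);
        }
        return facet.handle(query);
    }
}
```

A servlet built this way would call something like dispatch for each incoming URL, so the model structure decides which facet answers and the facet subclass decides what the answer is.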

Wednesday, January 16, 2008

blog filter: Epsilon and Glimmer

Here are two blog posts pulled from Planet Eclipse this week that relate somewhat to my quest for modeling language nirvana (whatever that means).

The first is about a very cool-looking Eclipse GMT subproject called Epsilon. The post is about how Epsilon can transform from HUTN to EMF (HUTN is a spec that outlines a general purpose textual syntax for describing models). See the linked screencast demo. Also mentioned are xText and TCS - two textual modeling / DSL systems in the Eclipse modeling project orbit.

The second post is about Glimmer. (And it makes the bold, but somewhat dubious claim that 'XML is dead'. Don't we wish!). Glimmer looks like a simple declarative syntax for SWT screen layout. I thought this was cool because I've been generating graphviz's DOT syntax from Gem and it would be similarly easy to generate Glimmer.

Sunday, January 13, 2008

The SpanLayer

In this post I'm going to continue two threads from previous posts.

First, I mentioned previously that I think of modeling as object-orientation with first-class relationships between the objects and I talked about the containment relationship which is built into the lattice metamodel. In this post I want to talk about implementing another kind of relationship as a layer called the SpanLayer.

Second, I've described the Gem syntax and noted that it has no pre-assigned semantics. Here I'm going to reach into this toolbox of syntax and begin assigning meaning to it.

The idea of the SpanLayer is to build a hashtable that maps some of the facets to others. We will interpret the ` (back-tick) prefixing a Word as designating the target (or end-point) of a "span" and the ~ (tilde) prefixing a Word as designating the source (or start-point) of a span.

Let's look at an example:



The text for the above model looks like this:
`foo `bar [ a b ~foo (c ~bar ~foo) ]
The SpanLayer shown in the diagram is simply highlighting the sources and targets of the spans by blanking out the text for the facets that aren't related to spanning. This is ok for a start, but it would be better to show spans more like this:



Here the span targets are shown with a different shape and the spans themselves are drawn from source to target. This FancySpanLayer was built on top of the SpanLayer. It uses the raw span information provided by the SpanLayer and renders it in a more sensible way.
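
The hashtable idea behind the SpanLayer can be sketched roughly as follows. This is a simplification, not the real layer: it works on a flat list of token strings instead of facets, it assumes sources and targets are matched by name (as foo and bar are in the example), and if a name is declared as a target twice the last one wins.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; token positions stand in for facets.
class SpanSketch {

    /** Maps the position of each ~source to the position of its `target. */
    static Map<Integer, Integer> spans(List<String> tokens) {
        // First pass: remember where each back-tick target lives.
        Map<String, Integer> targetsByName = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.length() > 1 && t.charAt(0) == '`') {
                targetsByName.put(t.substring(1), i);
            }
        }
        // Second pass: connect each tilde source to its target, if any.
        Map<Integer, Integer> sourceToTarget = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.length() > 1 && t.charAt(0) == '~') {
                Integer target = targetsByName.get(t.substring(1));
                if (target != null) {
                    sourceToTarget.put(i, target);
                }
            }
        }
        return sourceToTarget;
    }
}
```

Feeding it the twelve tokens of the example model (counting the brackets as tokens) connects positions 5 and 9 to the `foo at position 0, and position 8 to the `bar at position 1. Something like the FancySpanLayer would then only need this map to draw its arrows.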

Saturday, January 12, 2008

Overview

The purpose of this blog is to document an experimental software modeling system I've been developing as a hobby project. The name of this system is Gem, which is an acronym for Graphical Executable Modeling.

This post serves as an outline and suggested reading order for the blog.

Gem and this blog are works in progress. I will continue to revise existing posts, add new posts and develop the code base as time permits. Feedback is greatly appreciated. You can send mail to cjdaly at the domain-name of this blog.

Introduction
Sketch of a language workbench?
Simple non-containment references
Demos!

more to come...

Thursday, January 10, 2008

The SerializeLayer and the ParseLayer

Let's look at a "Hello World" model and use it to talk about the two most important layers in Gem. The input text of the model below is simply:
Hello World!



The SerializeLayer has a buffer of the text of the model and each SerializeFacet maps between positions in the buffer and their associated parse nodes. The ParseLayer represents the tree produced from parsing. As you can see from the graph on the right, each ParseFacet knows what kind of node (Word, Operator, Root, etc...) was parsed at the position recorded in the associated SerializeFacet.
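
To make the position mapping concrete, here is a small sketch that pairs a buffer range (the "serialize" side) with a node kind (the "parse" side) for each token. It is not the Gem parser: the names are invented, Groups and Quotes are ignored (brackets simply come out as Operators), and only three node kinds are recognized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buffer-position / parse-node pairing.
class SerializeParseSketch {

    enum Kind { WORD, NUMERIC, OPERATOR }

    static final class Token {
        final int start;  // inclusive offset into the buffer
        final int end;    // exclusive offset into the buffer
        final Kind kind;

        Token(int start, int end, Kind kind) {
            this.start = start;
            this.end = end;
            this.kind = kind;
        }
    }

    /** Scans the buffer and records where each token sits and what it is. */
    static List<Token> scan(String buffer) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < buffer.length()) {
            char c = buffer.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c >= '0' && c <= '9') {
                while (i < buffer.length()
                        && buffer.charAt(i) >= '0' && buffer.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(new Token(start, i, Kind.NUMERIC));
            } else if (Character.isJavaIdentifierStart(c)) {
                while (i < buffer.length()
                        && Character.isJavaIdentifierPart(buffer.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(start, i, Kind.WORD));
            } else {
                i++; // any other single character counts as an Operator here
                tokens.add(new Token(start, i, Kind.OPERATOR));
            }
        }
        return tokens;
    }

    /** Returns the token covering the offset, or null (e.g. on whitespace). */
    static Token tokenAt(List<Token> tokens, int offset) {
        for (Token t : tokens) {
            if (t.start <= offset && offset < t.end) {
                return t;
            }
        }
        return null;
    }
}
```

For the buffer Hello World! this yields a Word at offsets 0 to 5, a Word at 6 to 11 and an Operator at 11 to 12 (end offsets exclusive), so asking for the token at offset 11 returns the Operator.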

The SerializeLayer and ParseLayer have a somewhat incestuous relationship (there is a circular dependency between them) and they could be collapsed into one. For now I am keeping them separate because I think it's a little simpler to deal with and it makes for more interesting pictures.

Speaking of interesting pictures, here is a more elaborate model:



The input text of the above model is:
foo (bar {baz} 'abc ~{123}' xyz )
The difference between Figures and Sites isn't relevant for the SerializeLayer and ParseLayer so notice that the Sites have been omitted here to simplify the graphs.

Monday, January 7, 2008

The Lattice Metamodel

In the course of working on Gem I have identified and developed something I'm calling the Lattice Metamodel. The simplest way to think of it is as an array of trees where every tree has the same shape. I had started with a simple parse tree (as I often do) and was trying to pull semantic information out of the tree in small increments. Over time I could see several trees with data flowing from one to another. After more time I started to think of this as a lattice with many layers where you can navigate both the tree structure of an individual layer and between layers.

I think it's interesting to compare this idea with attribute grammars where information flows up and down in a single tree. There is also a similarity with neural networks where groups of neurons are often organized into layers. I'm not much of a theoretician. I'm more of a practitioner who fiddles around with stuff like this in the hopes that it will make code smaller, simpler and more readable. So for me the key challenge was to think of the lattice as a programming model and make it easy to write code that operates on (and in) the lattice.

Below is a screenshot of the class hierarchy of the lattice metamodel. In case you can't tell, these are Java classes viewed in Eclipse.



Everything is a Part (just as in UML everything is an "Element"). The primary tree of the lattice (that defines the tree shape for all the layers) is rooted at a Model part. The rest of this tree is composed of Figures and Sites. Each Figure may contain zero or more Sites and each Site contains zero or one Figure. Note that a Model is a kind of Site.

Models have a collection of Layers. Figures and Sites are "Faceted" which means they each have a collection of Facets. Each Facet is an intersection between a Layer and a Faceted part. The diagram below shows an example model with two layers.



(The black vertical and diagonal lines show the containment relationships. The blue horizontal lines are simply showing the "sibling" relationship created when two parts have the same parent.)

Imagine overlaying the three trees on top of each other and having a series of vertical connections that tie each Faceted part to the two Facets above it.

The only class not yet mentioned is Environment which is essentially just a collection of Models.
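
The structure described above can be summarized in a stripped-down sketch. The class names follow the post, but everything else is an assumption made for illustration: the fields, the addSite and addLayer helpers, whether Faceted is a class or an interface, and whether Environment is itself a Part. In this sketch a new Layer only gets Facets for parts that already exist when it is added.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the lattice structure; not the real Gem classes.
class LatticeSketch {

    static class Part {}

    static final class Layer extends Part {
        final String name;
        Layer(String name) { this.name = name; }
    }

    /** A Facet is the intersection of one Layer with one Faceted part. */
    static final class Facet extends Part {
        final Layer layer;
        final Faceted owner;
        Facet(Layer layer, Faceted owner) {
            this.layer = layer;
            this.owner = owner;
        }
    }

    static abstract class Faceted extends Part {
        final Map<Layer, Facet> facets = new LinkedHashMap<>();
    }

    static final class Figure extends Faceted {
        final List<Site> sites = new ArrayList<>(); // zero or more Sites

        Site addSite() {
            Site site = new Site();
            sites.add(site);
            return site;
        }
    }

    static class Site extends Faceted {
        Figure figure; // zero or one Figure
    }

    /** A Model is a kind of Site that also owns the Layers. */
    static final class Model extends Site {
        final List<Layer> layers = new ArrayList<>();

        Layer addLayer(String name) {
            Layer layer = new Layer(name);
            layers.add(layer);
            attach(layer, this);
            return layer;
        }

        // Gives every Faceted part reachable from 'site' a Facet on 'layer'.
        private void attach(Layer layer, Site site) {
            site.facets.put(layer, new Facet(layer, site));
            if (site.figure == null) {
                return;
            }
            site.figure.facets.put(layer, new Facet(layer, site.figure));
            for (Site child : site.figure.sites) {
                attach(layer, child);
            }
        }
    }

    /** Essentially just a collection of Models. */
    static final class Environment extends Part {
        final List<Model> models = new ArrayList<>();
    }
}
```

Adding two layers to a model built this way leaves every Figure and Site with exactly two Facets, which is the "same shape in every layer" property of the lattice.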

Extending the Lattice

Each of the Part classes has an associated Extension class, but I'm just going to discuss LayerExtension and FacetExtension here. The role of the Part classes is simply to describe the structure of the lattice. The Extension classes are for introducing interesting behavior into this structure.

If you want to implement a new Layer you subclass LayerExtension and register it with an Eclipse extension-point. There are a few methods you need to implement. One expresses the dependencies that your Layer has on other Layers. Another is sort of a factory method for FacetExtensions - given a Facet you must return a FacetExtension to be associated with the Layer. Beyond this you can add whatever code you want to the LayerExtension to implement the behavior you need.

You can use a single FacetExtension subclass for your layer or derive a more complex class hierarchy. There are currently no abstract methods in FacetExtension that you must implement. As before you can add whatever Java code you need to satisfy the purpose of the layer.

Once a FacetExtension is created for a Facet, that instance will remain on the Facet until the Facet is destroyed (which will happen when the base Figure or Site is destroyed). This behavior could perhaps be relaxed but it's low on my todo list.
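
A rough sketch of what a pair of layers might look like is shown below. The method names getDependencies and createFacetExtension are placeholders (the post does not give the real signatures), the Facet class is an empty stand-in, and the Eclipse extension-point registration is left out entirely.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; method names and signatures are assumptions.
class ExtensionSketch {

    static final class Facet {} // placeholder for the lattice's Facet part

    static abstract class FacetExtension {
        final Facet facet;
        FacetExtension(Facet facet) { this.facet = facet; }
    }

    static abstract class LayerExtension {
        /** The Layers this Layer depends on. */
        abstract List<Class<? extends LayerExtension>> getDependencies();

        /** Factory method: one FacetExtension per Facet of this Layer. */
        abstract FacetExtension createFacetExtension(Facet facet);
    }

    static final class BarFacet extends FacetExtension {
        BarFacet(Facet facet) { super(facet); }
    }

    static final class BarLayer extends LayerExtension {
        @Override
        List<Class<? extends LayerExtension>> getDependencies() {
            return Collections.emptyList();
        }

        @Override
        FacetExtension createFacetExtension(Facet facet) {
            return new BarFacet(facet);
        }
    }

    static final class FooFacet extends FacetExtension {
        int visits = 0; // layer-specific state can live on the extension
        FooFacet(Facet facet) { super(facet); }
    }

    /** A Layer that needs the BarLayer to be present. */
    static final class FooLayer extends LayerExtension {
        @Override
        List<Class<? extends LayerExtension>> getDependencies() {
            return Collections.<Class<? extends LayerExtension>>singletonList(BarLayer.class);
        }

        @Override
        FacetExtension createFacetExtension(Facet facet) {
            return new FooFacet(facet);
        }
    }
}
```

Expressing dependencies as classes would let the model order the layers before asking each one for its FacetExtensions, but that ordering step is not shown here.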

Navigating the Lattice

Faceted, Facet and FacetExtension have methods to support easy navigation through the lattice. These methods are based on the terminology shown in the figure below:



So getSuperior() returns the Part or Extension immediately above (towards the root of the tree) and getSubordinate(int) returns the one below indicated by the index parameter. These methods have the (often annoying) characteristic that they flip-flop you from Figures to Sites and back. Often you want to orient yourself on either Figures or Sites and then navigate around. For this you can use getSup() and getSub(int). There is a getSize() method which returns the number of subordinates. Since getSize() will always return 1 (or 0) for a Site, there is also a getSize(boolean) which allows you to skip through Sites (when using getSub(int)). (There should be methods like getSubs() and getSubordinates() that return an unmodifiable list; I just haven't been bothered enough to implement them yet.)
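
The difference between the two families of methods is easiest to see in code. The sketch below uses a single toy Node class for both Figures and Sites; the method names match the ones above, but the bodies are my reading of the description, in particular the behavior of getSub(int) and getSize(boolean) on a Site.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the navigation methods; not the real Gem classes.
class NavigationSketch {

    static final class Node {
        final String label;
        final boolean isSite; // true for a Site, false for a Figure
        private Node superior;
        private final List<Node> subordinates = new ArrayList<>();

        Node(String label, boolean isSite) {
            this.label = label;
            this.isSite = isSite;
        }

        Node add(Node child) {
            if (child.isSite == isSite) {
                throw new IllegalArgumentException("Figures and Sites must alternate");
            }
            if (isSite && !subordinates.isEmpty()) {
                throw new IllegalStateException("a Site holds at most one Figure");
            }
            child.superior = this;
            subordinates.add(child);
            return child;
        }

        // One step at a time: these flip between Figures and Sites.
        Node getSuperior() { return superior; }
        Node getSubordinate(int index) { return subordinates.get(index); }
        int getSize() { return subordinates.size(); }

        /** Two steps up: the nearest part of the same kind, or null. */
        Node getSup() {
            return superior == null ? null : superior.superior;
        }

        /** Two steps down: stays on Figures (or on Sites). */
        Node getSub(int index) {
            if (isSite) {
                // Site -> its Figure -> that Figure's Site at 'index'
                return subordinates.isEmpty()
                        ? null : subordinates.get(0).getSubordinate(index);
            }
            // Figure -> its Site at 'index' -> that Site's Figure, if any
            Node site = subordinates.get(index);
            return site.subordinates.isEmpty() ? null : site.subordinates.get(0);
        }

        /** With skip set, counts what getSub(int) can reach. */
        int getSize(boolean skip) {
            if (!skip || !isSite) {
                return getSize();
            }
            return subordinates.isEmpty() ? 0 : subordinates.get(0).getSize();
        }
    }
}
```

For a Figure foo with two Sites that hold the Figures a and b, b.getSuperior() is the second Site, while b.getSup() goes straight back to foo and foo.getSub(1) goes straight to b.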

The FacetExtension variants of the methods take a Class parameter. This allows a sort of diagonal movement through the Lattice. Imagine you are implementing FooFacet. You could write getSuperior(BarFacet.class) to get the FacetExtension above and on the BarLayer.

In addition to lateral (and diagonal) movement we need to navigate directly between layers and also back and forth between the Extensions and the Parts. There are methods for all of this. A typical thing is to go from one FacetExtension to another. Again imagining a FooFacet, we could write getFx(BarFacet.class) to get the FacetExtension of the BarLayer associated with the same Faceted part.

Similar to getFx(Class) there are getLx(Class) methods to return a specific LayerExtension. So you could write getLx(BarLayer.class) to get the BarLayer.
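
One plausible way to support the Class-parameter methods is a lookup table keyed by the extension class, as in the sketch below. The Faceted class here is a toy with a single superior, and the install helper and the extensions map are invented for the example; only the getFx(Class) and getSuperior(Class) names come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of class-keyed navigation between layers.
class FxSketch {

    static final class Faceted {
        final Faceted superior; // null at the top of the tree
        final Map<Class<?>, FacetExtension> extensions = new HashMap<>();

        Faceted(Faceted superior) { this.superior = superior; }

        <T extends FacetExtension> T install(T fx) {
            fx.owner = this;
            extensions.put(fx.getClass(), fx);
            return fx;
        }
    }

    static abstract class FacetExtension {
        Faceted owner;

        /** Same Faceted part, different layer. */
        <T extends FacetExtension> T getFx(Class<T> type) {
            return type.cast(owner.extensions.get(type));
        }

        /** Diagonal move: the part above, on the layer given by 'type'. */
        <T extends FacetExtension> T getSuperior(Class<T> type) {
            if (owner.superior == null) {
                return null;
            }
            return type.cast(owner.superior.extensions.get(type));
        }
    }

    static final class FooFacet extends FacetExtension {}
    static final class BarFacet extends FacetExtension {}
}
```

Passing the class as a parameter means the caller gets back a correctly typed extension without writing a cast. A getLx(Class) lookup for LayerExtensions could be built the same way on the Model.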

Figures and Sites

Why do we even need Sites? We could get rid of them and halve the size of this (fairly huge looking) data structure. Perhaps the best answer I have is that my intuition tells me to keep the Sites for now. Here's some of the thinking behind the intuition:

What is modeling? My one sentence definition of software modeling goes something like this:
Given a reasonable definition of "software object", software modeling is about formalizing (or making "first-class") the relationships between objects.
Another definition I like is:
A model is a collection of objects with managed relationships.
If you are an EMF programmer, you should at least partially agree. One of the great things about EMF is that it generates all the hairy code necessary to manage the references and keep everything well connected.

So going back to Figures and Sites, I think of Figures as the "elements" or the primary points of interest and Sites are the first-class representation of the relationship between Figures.

Of course Sites are just a representation of the containment relationship. There are many other kinds of relationships to model. So let me go even farther out on a limb: In the Lattice Metamodel, I'm treating the containment relationship as the fundamental relationship in the modeling system. Other kinds of relationships will be implemented in Layers (and discussed in other posts).

Another bit of (possibly faulty) intuition: I like to think of Sites as "neutral ground" between Figures, because I've run into a lot of tree programming situations where I have data that doesn't seem to belong either to the parent or child node - it needs to be shared by both on equal terms.

So for now the Sites stay.

Related Material

This post has some links to material that's related to Gem and the Lattice Metamodel in some way or another. I'm writing this both to credit the authors that have informed and inspired me and to give a sense of my "mental map" in this area. If you are familiar with many of these links then I'd guess you will find Gem interesting - at least as a curiosity. If you know of other material like this, please comment and help expand my mental map!

(Note: this needs a lot more work. If I were to dig through my notes, I could easily add dozens more links to content that informed my work here in some way or another - I just haven't had time.)

Software Modeling

Domain Specific Languages

Graph Drawing

Self Links - Other projects I've worked on that are somewhat related.

Blogs - with an orientation toward programming language design topics:

Saturday, January 5, 2008

Rants and Raves

The original idea behind Gem was to try to design a programming language that could unify graphics (notation), execution (semantics) and modeling (structure) in the simplest way I can devise.

In the graphical realm I want to explore the idea of textual language syntax that suggests and flows into graphical representations. For example ( ) may suggest a circle or a radial layout and [ ] may suggest a box or a rectangular layout.

In the modeling realm I'm interested in the parallels between metamodels and grammatical languages (e.g. BNF) and how we can apply the insights and methods of the latter subject to the former. I'm also inspired by how EMF with a dozen or so primitive constructs (in Ecore.ecore) can tame the hugeness of UML. I'd like to search for other (meta)metamodels that are similarly simple, general and provide great leverage.

For execution all I can say is that I'll be happy when I have a model debugger and can step through a program model and watch it modify a data model (or itself).

There are some other ideas that feed into this project. They mostly amount to things that aggravate me with existing tools which I'd like to try to improve:
  • Modeling tools tend to have a central diagram space and a separate area for viewing and editing "properties". Getting anything done requires a lot of mousing back-and-forth between these two areas. I'd like to try to get rid of the properties view entirely and do all editing in the diagram space.
  • I think the keyboard has been greatly underutilized in modeling tools. I should be able to type in model fragments as text and have them parsed and redisplayed in whatever diagrammatic notation I'm currently using to view the model.
  • A lot of modeling systems use code generation as a mechanism for tailoring their behavior. This can be good, but I'd like to resist it as long as possible. All other things being equal, I'd prefer to use host language features (such as interface implementation) to allow developers to tailor the behavior.
  • The layout algorithms in graphical modeling tools rarely do what I want. And I can spend a lot of time arranging things "just right" and then be forced to start over when I need to add or remove elements from the diagram. I think there's an assumption that modelers want "freeform drawing". Actually what I want almost always is good automatic layout. I think the layout mechanism should work like a source code formatter: tweak some options (or model attributes) and then invoke the "reformat" action.
Another way I think about Gem (not the current implementation, but the Gem I'd like to get to) is as a visual/graphical programming language "done right". Visual programming languages go all the way back to Logo (maybe farther) and I've played with several of them, but I've never seen one that made me feel as productive as the textual programming languages I use. I think if you polled a bunch of programmers and asked them, "Will you ever prefer a graphical language over a purely text-based one?" very few would say 'yes' and many would say that it just isn't possible to have a graphical language that enhances productivity. But if you had asked these same people 10 years ago, "Will you ever feel more productive in an IDE than with your toolbox of emacs/vi, compiler, debugger, formatter, ... ?" many of them would have said 'impossible', yet they are very happily using Eclipse, Visual Studio, etc. today.

So things do get better, and the impossible becomes the norm. There has been a renaissance in programming language design over the last several years - which I think is great. But the vast majority are textual languages. Let's explore the second dimension. Now is not too early to be thinking about a viable graphical programming language. I don't know if Gem will be that language (it certainly isn't close yet), but that's what I think it wants to be. And even if it fails to become a viable language, it will be a success if it sheds new light on any of the ideas and aggravations discussed above.

The Gem proto-language syntax



Gem models use a very simple textual concrete syntax which is described below. I call this a proto-language because there are no semantics tied directly to this syntax. Instead the syntax is meant to be interpreted by processing layers applied to the model. XML has a similar proto-language characteristic. You can write XML and use a tool to check that it is well-formed, but the meaning of the XML depends entirely on how you choose to interpret it.

Gem Syntax Grammar
  • Root := Expr*
  • Expr := Prefix? (Token | Group | Quote) Suffix?
    • Prefix := A valid prefix Symbol with no space between it and the following Expr.
    • Suffix := A valid suffix Symbol with no space between it and the preceding Expr.
  • Token := Word | Numeric | Operator
    • Word := A valid Java identifier.
    • Numeric := [0 .. 9]+
    • Operator := Symbol
  • Group := AngleGroup | CurlyGroup | ParenGroup | SquareGroup
    • AngleGroup := < Expr* >
    • CurlyGroup := { Expr* }
    • ParenGroup := ( Expr* )
    • SquareGroup := [ Expr* ]
  • Quote := DoubleQuote | SingleQuote
    • DoubleQuote := " Fragment* "
    • SingleQuote := ' Fragment* '
  • Fragment := Text | Escape
  • Text := A sequence of characters not containing an escape Symbol or quote terminator (" or ').
  • Escape := CharEscape | ExprEscape
    • CharEscape := A valid character escape Symbol followed by a character escape sequence.
    • ExprEscape := A valid expression escape Symbol followed by a single Expr.
  • Symbol := A member of the set of characters on your keyboard which are not letters or digits.
Notes:
  • It's hard to get the look of the grammar correct in Blogger ... sorry about that.
  • The characters *, ?, | and + have their usual grammatical meanings:
    • * : a list of zero or more of the preceding rule
    • ? : an optional single instance of the preceding rule
    • | : used between alternatives
    • + : a list of 1 or more of the preceding rule
  • Characters in large bold Courier like [ and ] are literals.
  • The valid characters for Prefix, Suffix and Escape are not fixed here. They can be assigned when initializing the parser. I have been using ` (back-tick) and ~ (tilde) as prefixes and suffixes. Currently ` is the CharEscape initiator (but it really should be \ to fit the C/Java style) and ~ is the ExprEscape initiator. I'm considering adding ^ as a Prefix/Suffix. All of these could change.

Examples
  • Hello World!
    • two Words followed by an Operator
  • abc [123 {xyz} ]
    • use of Groups and Numerics (and Words)
For the following assume that ~ is a valid Prefix and ^ is a valid ExprEscape.
  • ~foo + ~ bar
    • the first ~ is a Prefix, the second is an Operator (because of the spacing)
  • "Hello ^{foo bar} World"
    • DoubleQuote with embedded CurlyGroup
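
To show how the grammar and the spacing rule for Prefix and Suffix fit together, here is a small recursive-descent sketch. It is not the Gem parser. It leaves out Quotes and Escapes, takes the set of Prefix/Suffix characters as a parameter, only treats a symbol as a Prefix when a Word, Numeric or Group follows it directly, and prefers the Suffix reading when a symbol sits between two expressions with no spaces. The printed tree format is invented for the example.

```java
// Hypothetical sketch of a parser for a subset of the Gem syntax.
class GemParserSketch {
    private static final String OPEN = "<{([";
    private static final String CLOSE = ">})]";
    private static final String[] GROUP_NAMES = { "Angle", "Curly", "Paren", "Square" };

    private final String src;
    private final String affixes; // characters accepted as Prefix or Suffix
    private int pos = 0;

    private GemParserSketch(String src, String affixes) {
        this.src = src;
        this.affixes = affixes;
    }

    /** Parses the text and returns a printable tree such as "Root(Word(a))". */
    static String parse(String src, String affixes) {
        return "Root(" + new GemParserSketch(src, affixes).exprs('\0') + ")";
    }

    // Expr* up to the given closing delimiter ('\0' means end of input).
    private String exprs(char close) {
        StringBuilder out = new StringBuilder();
        while (true) {
            while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
            if (pos >= src.length()) {
                if (close != '\0') {
                    throw new IllegalArgumentException("missing '" + close + "'");
                }
                return out.toString();
            }
            if (src.charAt(pos) == close) {
                pos++;
                return out.toString();
            }
            if (out.length() > 0) out.append(' ');
            out.append(expr());
        }
    }

    // Expr := Prefix? (Token | Group) Suffix?
    private String expr() {
        char c = src.charAt(pos);
        char prefix = c;
        boolean prefixed = isAffix(c) && pos + 1 < src.length()
                && startsOperand(src.charAt(pos + 1));
        if (prefixed) c = src.charAt(++pos);

        String node;
        int group = OPEN.indexOf(c);
        if (group >= 0) {
            pos++;
            node = GROUP_NAMES[group] + "(" + exprs(CLOSE.charAt(group)) + ")";
        } else if (isDigit(c)) {
            int start = pos;
            while (pos < src.length() && isDigit(src.charAt(pos))) pos++;
            node = "Numeric(" + src.substring(start, pos) + ")";
        } else if (Character.isJavaIdentifierStart(c)) {
            int start = pos;
            while (pos < src.length() && Character.isJavaIdentifierPart(src.charAt(pos))) pos++;
            node = "Word(" + src.substring(start, pos) + ")";
        } else {
            // Anything else, including a stray closing delimiter, is treated
            // as a one-character Operator with no Suffix in this sketch.
            pos++;
            return "Operator(" + c + ")";
        }
        if (prefixed) node = "Prefix(" + prefix + "," + node + ")";
        if (pos < src.length() && isAffix(src.charAt(pos))) {
            node = "Suffix(" + node + "," + src.charAt(pos++) + ")";
        }
        return node;
    }

    private boolean isAffix(char c) { return affixes.indexOf(c) >= 0; }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean startsOperand(char c) {
        return isDigit(c) || Character.isJavaIdentifierStart(c) || OPEN.indexOf(c) >= 0;
    }
}
```

With ~ and ` as the affix characters, the text ~foo + ~ bar comes out as Root(Prefix(~,Word(foo)) Operator(+) Operator(~) Word(bar)), which matches the first example above: the first ~ is a Prefix and the second is an Operator.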

Possible Enhancements
  • Add Infix and Outfix which would work like Prefix and Suffix but inside the Group delimiters like this:
    • AngleGroup := < Infix? Expr* Outfix? >
  • Some way of indicating that certain groups don't apply within a scope (e.g. so <> can be used as Operators).
  • The Numeric rule is currently just a string of digits. In the future it could be enhanced to allow more of the Java numeric literal syntax.