Wednesday, June 18, 2014

Optionally Typed Parameters


And now, the latest entry in the series about the new Puppet Type System introduces the capability to optionally type the parameters of defines, classes, lambdas and EPP template parameters.

There are also some new abilities and changes to the type system that I will cover in this post.

Optionally Typed Parameters

Writing high quality puppet code involves judicious type checking of given arguments. This is especially important when writing modules that are consumed by others. Anyone who has written a serious module knows that it is a chore not only to deal with all of the parameters in the first place, but that type checking involves one extra call to a standard lib function per parameter. The result is simply a lower signal-to-noise ratio.

From Puppet 3.7 when using the future parser/evaluator (and from 4.0 when the future parser/evaluator becomes the standard), you can now (optionally) type the parameters of defines, classes, EPP template parameters and lambdas. We have even thrown in support for varargs/captures-rest/splatted parameter in lambdas (excess arguments are delivered in an array).

Type Checking Defines and Classes

To opt in to type checking in defines and classes, simply give the type before the parameter declaration:

String $x        # $x must be a String
String[1] $x     # $x must be a String with at least one character
Array[String] $x # $x must be an Array, and all entries must be Strings

(See earlier posts in this series for other types, and the type system in general).

If you do not type a parameter, it defaults to the type Any (renamed in 3.7.0 from Object). And this type accepts any argument including undef.

define sinatra(String $regrets, Integer $amount) {
  notice "$regrets, I had $amount, I did it my way. Do bi do bi doo..."
}

sinatra{ frank:
  regrets => regrets,
  amount  => 2        # not e.g. 'a few'
}

Which results in:

Notice: regrets, I had 2, I did it my way. Do bi do bi doo...

And if the wrong type is given:

sinatra{ frank:
  regrets => regrets,
  amount  => 'a few'
}

The result is:

Error: Expected parameter 'amount' of 'Sinatra[frank]' to have type Integer, got String ...

And while on this topic, here are a couple of details:

  • If you supply a default value, it is also type checked
  • The type expressions can use global variables - e.g. String[$minlength]

Type Checking Lambdas

Lambdas can now also have type checked parameters, and lambdas support the notion of captures-rest (a.k.a. varargs, or splat) by preceding the last parameter with a *. The type checking of lambdas, and the capabilities for passing arguments to lambdas, has been harmonized with the new function API (which I will present in a separate blog post).

Before showing how typed lambda parameters work, I want to tell you about a new function called with that I will use to illustrate the new type checking capabilities.

The 'with' function

Andrew Parker (@zaphod42) wrote a nifty little function called with that is very useful for illustrating (and testing) type checking. It is also very useful in classes where you would like to make some logic (and in particular some variables) local/private to a block of code, to avoid leaking non-API variables from your classes.

The with function is very simple - it just passes any given arguments to the given lambda. Hence its name; you can think of it as "with these variables, do this...".

with(1) | Integer $x | { notice $x }

This calls the lambda with the argument 1, assigns it to $x after having checked for type compliance, and then notices it.

Now if you try this:

with(true) | Integer $x | { notice $x }

You get the error message:

Error while evaluating a Function Call, lambda called with mis-matched arguments
  lambda(Integer x) - arg count {1}
  lambda(Boolean) - arg count {1} at line 1:1 on node ...


As mentioned earlier, you can declare the last parameter with a preceding * to make it capture any excess arguments given in the call. The type that is given is the type of the elements of an Array that is constructed and passed to the lambda's body.

 with(1,2,3) | Integer *$x | { notice $x }

Which results in:

Notice: [1, 2, 3]

There is one special rule for captures rest: If the type is an Array type, it is used as the type of the resulting array. Thus, if you want to accept elements of Array type, you must describe this as an Array of Arrays (or use the Tuple type). By declaring an Array you can constrain the number of excess arguments that the captures-rest parameter accepts.

 with(1,2,3,4,5) | Array[Integer,0,3] *$x | { notice $x }

Will fail with the message:

 Error while evaluating a Function Call, lambda called with mis-matched arguments
   lambda(Integer x{0,3}) - arg count {0,3}
   lambda(Integer, Integer, Integer, Integer, Integer) - arg count {5} at line 1:6 on node ...

A couple of details:

  • The captures-rest does not affect how arguments are given (in the example above, the lambda could have been changed to have 3 individual parameters with a default value and still be called the same way, and it would accept the same given arguments).
  • Captures rest is not supported for classes, defines, or for EPP parameters

Using the assert_type Function

And finally, if the built-in type checking capabilities and the generic error messages that they produce do not work for you, there is an assert_type function that gives you a lot more flexibility.

In its basic form, it performs the same type checking as for typed parameters. The assert_type function returns its second argument (the value) which means it can be used to type check and assign to a resource attribute at the same time:

 # Somewhere, there is this untyped definition (that does not work unless $x is
 # an Integer).
 define my_type($x) { ... }

 # And you want to create an instance of it
 my_type { 'it':
   x => assert_type(Integer, hello)
 }

Which results in:

 Error: assert_type(): Expected type Integer does not match actual: String ...

The flexibility comes in the form of giving a lambda that is called if the assertion would fail (the lambda "takes over"). This can be used to customize the error message, to issue a warning, and possibly return a default sanitized value. Since the lambda takes over, you need to call fail to halt the execution (if that is what you want). The lambda is given two arguments: the expected type, and the actual type (inferred from the second argument given to assert_type).

 assert_type(Integer, hello) |$expected, $actual| {
   fail "The value was a $actual must be an Integer (like 1 or 2 or...)"
 }

Which results in:

 Error: The value was a String must be an Integer (like 1 or 2 or...) 

Type checking EPP

Type checking EPP works the same way as elsewhere; the type is simply stated before the parameter and defaults to Any. EPP parameters do not support captures-rest.

See the "Templating with Embedded Puppet Programming Language" post for more information about EPP.

In this post

In this post I have shown how the new optionally typed parameters feature in Puppet 3.7.0's future parser/evaluator works and how type checking can be simplified in your Puppet logic.

The Type System Series of Blog Posts

You can find the rest of the blog posts about the type system here.

Tuesday, May 6, 2014

Puppet Internals - The Integer Type Ruby API


Here is a follow-up post about the Puppet Type System Ruby API. The Integer type (as you may recall from the earlier posts) has the ability to represent a range of values, and those posts showed how this can be used in the Puppet Language. In this post, I will show you how the Integer range features can be used from Ruby.

Creating an Integer

An Integer type with a range can be created in Ruby a couple of different ways.

# Using the type factory
FACTORY = Puppet::Pops::Types::TypeFactory
range_t = FACTORY.range(100, 200)

# Using the type parser
TYPE_PARSER = Puppet::Pops::Types::TypeParser.new
range_t = TYPE_PARSER.parse('Integer[100,200]')

If you want to be explicit about an infinite (open) range, use the symbol :default in Ruby, and the default keyword in the string representation given to the type parser.

The integer type's class is Puppet::Pops::Types::PIntegerType.

Integer Type API

The Integer type has two attributes, from and to. In Ruby these values are either nil or have a Ruby Integer value. A value of nil means negative infinity in from, and positive infinity in to.

The Integer may also have a to value that is <= from (an inverted range).

The most convenient way to get the range in Numeric form is to call the method range which returns an array with the two values with smallest value first, and where nil values are replaced by the corresponding +/- Infinity value.
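The normalization that range performs can be sketched in plain Ruby (an illustration only, not the actual PIntegerType implementation):

```ruby
# Replace nil bounds with +/- infinity and order the result, smallest first.
def normalized_range(from, to)
  lo = from.nil? ? -Float::INFINITY : from
  hi = to.nil?   ?  Float::INFINITY : to
  lo <= hi ? [lo, hi] : [hi, lo]
end

normalized_range(200, 100)  # => [100, 200] (inverted range is flipped)
normalized_range(nil, 10)   # => [-Float::INFINITY, 10]
```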

A Note About Infinity

Infinity is a special numeric value in Ruby. It is the value produced by operations such as 1.0/0. The great thing about this value is that it can be used in arithmetic, and, naturally, the result of any arithmetic operation involving Infinity is still Infinity. This makes it easy to test if something is in range without having to treat the unbound ends in a special way.
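A stdlib-only sketch of why this is convenient (Ruby's Float::INFINITY plays the role of the Puppet::Pops::Types constants here):

```ruby
inf = Float::INFINITY

# A bound of nil means "open"; substituting +/- infinity lets a single
# comparison expression handle bounded and unbounded ranges alike.
in_range = ->(x, from, to) { x >= (from || -inf) && x <= (to || inf) }

in_range.call(100, nil, 200)   # => true  (open lower bound)
in_range.call(300, nil, 200)   # => false
```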

The constants INFINITY, and NEGATIVE_INFINITY are available in Puppet::Pops::Types should you need them for comparisons.

Range Size

You can get the size of the range by calling size. If one of the to/from attributes is Infinity, the size is Infinity.

Iteration Support

The PIntegerType implements Ruby's Enumerable, which enables you to directly iterate over its range. You can naturally use any of the iterative methods supported by Enumerable.

If one of the to/from attributes is Infinity, nothing is yielded (this to prevent you from iterating until the end of time).

range_t = FACTORY.range(1,3)
range_t.reduce {|memo, x| memo + x }  # => 6

Getting the String Representation

All types in the Puppet Type system can represent themselves in String form in a way that allows them to be parsed back again by the type parser. Simply call to_s to get the String representation.

Using Integer Range in Resources

Resources in the 3x Puppet Catalog cannot directly handle PIntegerType instances. Thus, if you would like to use ranges in a resource (type), you must use the string representation as the values stored in a resource, and then use the type parser to parse and interpret them as Integer values.

You can use the type system without also using the future parser for general parsing and evaluation. The only requirement is that the RGen gem is installed. And if you are going to use this in a Resource, you must also have RGen installed on the agent side. (In Puppet 4.0 the RGen gem is required everywhere).

Monday, May 5, 2014

Puppet Internals - Modeling and Custom Logic


In this post about Modeling with Ecore and RGen I will show how to achieve various common implementation tasks. Don't worry if you glanced over the very technical previous post about the ECore model. If needed, I will repeat some of the information, but you may want to go back to it for details.

Derived Attributes

Sometimes we need to use derived (computed) attributes. Say we want to model a Person and record the birth-date, but we would also like to be able to ask how old a Person is right now. To do this we would have two attributes: birth_date, and a derived attribute age.

class Person < MyModelElement
  has_attr 'birth_date', Integer
  has_attr 'age', Integer, :derived => true
end

(Here I completely skip all aspects of handling date/time formats, time zones etc., and simply use a date/time converted to an Integer).

Since a derived attribute needs to be computed, and thus requires us to implement a method, we must define this method somewhere. All logic for a modeled class should be defined in a module called ClassModule nested inside the class. The definition in this module will be mixed into the runtime class.

A derived attribute is implemented by defining a method with the same name as the attribute plus the suffix '_derived'.

The full definition of the Person class then looks like this:

class Person < MyModelElement
  has_attr 'birth_date', Integer
  has_attr 'age', Integer, :derived => true

  module ClassModule
    def age_derived
      Time.now.year - Time.at(birth_date).year
    end
  end
end

Derived attributes are good for a handful of intrinsic things like this (information that is very closely related / an integral part of the class), but it should not be overused as we in general want our models to be as anemic as possible; operations on models are best implemented outside of the model as functions, the model should really just contain an implementation that maintains its integrity and provides intrinsic information about the objects.

Here is another example from the new Puppet Type System:

class PRegexpType < PScalarType
  has_attr 'pattern', String, :lowerBound => 1
  has_attr 'regexp', Object, :derived => true

  module ClassModule
    def regexp_derived
      @_regexp = Regexp.new(pattern) unless @_regexp && @_regexp.source == pattern
      @_regexp
    end
  end
end

Here, we want to be able to get the real Ruby Regexp instance (the regexp attribute) from the PRegexpType based on the pattern that is stored in string form (pattern). Derived attributes are by default also virtual (not serialized), volatile (they have no storage in memory), and not changeable (there is no setter).

Here is an example of using the PRegexpType.

rt = PRegexpType.new(:pattern => '[a-z]+')
the_regexp = rt.regexp
the_regexp.is_a?(Regexp)      # => true

Going back to the implementation. Remember that all features (attributes and references) that are marked as derived must have a defined method named after the feature with the suffix _derived. Thus, in this example, since the attribute is called 'regexp', we implement the method 'regexp_derived'. Since we do not have any storage and no generated supporting methods to read/write the Regexp, we need to create this storage ourselves. (Note that we do not want to recompile the Regexp on each request unless the pattern has changed.) Thus, we assign the result to the instance variable @_regexp. The leading _ has no special technical semantics, but it is there to say 'hands off, this is private stuff'.
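The caching trick can be shown in isolation. This is a stand-alone sketch (the class name PatternHolder is made up for illustration), not the Puppet source:

```ruby
class PatternHolder
  attr_accessor :pattern

  def initialize(pattern)
    @pattern = pattern
  end

  # Recompile only when the cached Regexp's source no longer matches.
  def regexp
    @_regexp = Regexp.new(pattern) unless @_regexp && @_regexp.source == pattern
    @_regexp
  end
end

holder = PatternHolder.new('[a-z]+')
r1 = holder.regexp
r2 = holder.regexp      # same cached Regexp object
holder.pattern = '[0-9]+'
r3 = holder.regexp      # recompiled after the pattern changed
```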

Adding Arbitrary Methods

You can naturally add arbitrary methods to the ClassModule; they do not have to be derived features. This does however go against the anemic principle. It also means that the methods are not reflected in the model. Such methods are sometimes useful as private implementation methods that are called from methods that represent derived features, or that exist for purely technical Ruby runtime reasons (as you will see in the next example).

Using Modeled Objects as Hash Keys

In order for something to be useful as a hash key, it needs to have a hash value that reflects the significant parts of the object "as a key". Regular Ruby objects use a default that is typically not what we want.

Again, here is the PRegexpType, now also with support for being a hash key.

class PRegexpType < PScalarType
  has_attr 'pattern', String, :lowerBound => 1
  has_attr 'regexp', Object, :derived => true

  module ClassModule
    def regexp_derived
      @_regexp = Regexp.new(pattern) unless @_regexp && @_regexp.source == pattern
      @_regexp
    end

    def hash
      [self.class, pattern].hash
    end

    def ==(o)
      self.class == o.class && pattern == o.pattern
    end
  end
end

This implementation allows us to match PRegexpType instances if they are a) of the same class, and b) have the same source pattern. To support this, we simply create a hash based on the class and pattern in an Array. We also need to implement == since it is required that two objects that have the same hash also compute true on ==.

Can you think of improvements to this implementation?

(We do compute the hash value on every request; we could cache it in an instance variable. We must then however ensure that if the pattern is changed, we do not use a stale hash. To know, we must measure whether it is faster to recompute the hash than to compute whether the pattern has changed - this is an exercise I have yet to do.)
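The same technique works in plain Ruby outside of RGen. One detail worth noting: Ruby Hash lookup uses eql? rather than ==, so a real hash key also needs eql? aligned with ==. A sketch with a made-up KeyedPattern class:

```ruby
class KeyedPattern
  attr_reader :pattern

  def initialize(pattern)
    @pattern = pattern
  end

  # Equality and hash are based on class + pattern, so two instances
  # with the same pattern collide as hash keys.
  def hash
    [self.class, pattern].hash
  end

  def ==(o)
    self.class == o.class && pattern == o.pattern
  end
  alias eql? ==
end

h = { KeyedPattern.new('[a-z]+') => :found }
h[KeyedPattern.new('[a-z]+')]  # => :found
```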

Overriding Setters

Another use case is to handle setting of multiple values from a single given value - and worst case setting them cross-wise. (E.g. in the example with the Person, imagine wanting to set either the birth_date, or computing it from a given age in years - yeah, it would be a dumb thing to do, but I had to come up with a simple example.)

Here is an example from the AST model - again dealing with regular expressions, but now in the form of an instruction to create one.

# A Regular Expression Literal.
class LiteralRegularExpression < LiteralValue
  has_attr 'value', Object, :lowerBound => 1, :transient => true
  has_attr 'pattern', String, :lowerBound => 1

  module ClassModule
    # Go through the gymnastics of making either value or pattern settable
    # with synchronization to the other form. A derived value cannot be serialized
    # and we want to serialize the pattern. When recreating the object we need to
    # recreate it from the pattern string.
    # The below sets both values if one is changed.
    def value= regexp
      setValue regexp
      setPattern regexp.to_s
    end

    def pattern= regexp_string
      setPattern regexp_string
      setValue Regexp.new(regexp_string)
    end
  end
end

Here you can see that we override the regular setters value=, and pattern=, and that these methods in turn use the internal methods setValue, and setPattern. This implementation is however not ideal, since the setValue and setPattern methods are also exposed, and if they are called the attributes value and pattern will get out of sync!

We can improve this by doing a renaming trick. We want the original setters to be callable, but only from methods inside the class since we want the automatic type checking performed by the generated setters.

module ClassModule
  alias :_setPattern_private :setPattern
  private :_setPattern_private

  alias :_setValue_private :setValue
  private :_setValue_private

  def setPattern(regexp_string)
    _setPattern_private(regexp_string)
    _setValue_private(Regexp.new(regexp_string))
  end

  def setValue(regexp)
    _setValue_private(regexp)
    _setPattern_private(regexp.to_s)
  end
end

Here we squirrel away the original implementations by renaming them, and making them private. Since we did this, we do not have to implement the value= and pattern= methods since they default to calling the set methods we just introduced.

Now we have a safe version of the LiteralRegularExpression.
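The rename trick can be demonstrated with an ordinary Ruby class (Counter and its coercing writer are made up for illustration; alias_method is used since it handles writer names like value= robustly):

```ruby
class Counter
  attr_accessor :value

  # Squirrel the generated writer away under a private name...
  alias_method :_set_value_private, :value=
  private :_set_value_private

  # ...and reintroduce the public writer with extra behaviour.
  def value=(v)
    _set_value_private(Integer(v))  # coerce before storing
  end
end

c = Counter.new
c.value = "42"
c.value  # => 42
```

Callers can no longer reach the raw writer, so the coercion cannot be bypassed.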

Defining Relationships Out of Band

Bi-directional references are sometimes tricky to define when there are multiple relations. The classes we are referencing must be known by Ruby, and sometimes the model is not a hierarchy. And even if it is, it is more natural to define it in top-down rather than bottom-up order.

To handle this, we need to specify the relationships out of band. This is very easy in Ruby since classes can be reopened, and it is especially easy with RGen since the builder methods are available for modifying the structure that is built while we are building it.

Here is an example (from RGen documentation):

class Person < RGen::MetamodelBuilder::MMBase
  has_attr 'name', String
  has_attr 'age', Integer
end

class House < RGen::MetamodelBuilder::MMBase
  has_attr 'address', String
end

Person.many_to_many 'homes', House, 'inhabitants'

What RGen does is simply to build the runtime model, for some constructs recording intermediate meta-data about what our model should look like. The runtime classes and intermediate meta-data are then mutated until we have completed the definition of the model. The runtime classes are ready to use as soon as they are defined, but caution should be taken when using the classes for anything while the module they are in is being defined (classes may be unfinished until the very end of the module's body). Then, the first request to get the meta-model (e.g. calling Person.ecore) will trigger the building of the actual meta-model as an ECore model. It is computed on demand, since if it is not needed by the logic (only the concrete implementation of it), there is little point taking cycles to construct it, or having it occupy memory.

As you may have guessed, it is a terribly bad idea to modify the meta-model after it has been defined and there are live objects around. (There is nothing stopping you though if you know what you are doing). If you really need to jump through hoops like these, you need to come up with a scheme that safely creates new modules and classes in different "contexts".

In this Post

In this post I have shown some common tasks when using RGen. You should now have a grip on how derived attributes are handled and how to provide implementation logic for the declaratively modeled classes.

In a future post I will cover additional topics, such as dealing with custom data types, serialization of models, and how to work with fragmented models. It may take a while before I post on those topics as I have a bit of exploratory work to do regarding how these features work in RGen. Meanwhile, if you are curious, you can read about these topics in the EMF book mentioned in the ECore blog post.

Thursday, May 1, 2014

Puppet Internals - The Ecore Model

The ECore Model

In my three previous posts about Modeling you can read about what a model is and the technologies used to implement such models, as well as learn about basic modeling with RGen - the implementation of Ecore for Ruby.

In this post I am going to take you on a tour of the Ecore model itself. In order for you to be able to have an idea of where we are when we stop at the interesting sites I am going to give you the complete map of Ecore up front. However, just like any guided city tour, I am not going to stop and show you every back alley.

A slight feeling of dizziness is expected since we are at high meta-meta altitude, but it should be alright as long as you look where you step and go back to concrete ground when you are lost.

This post is the most theoretical in the series, and you probably want to come back to it for reference. I am sorry there are not that many examples, I promise to come back in more posts with concrete examples where all the meta-magic gets put to good use.

If you want a model to try things out on, you can use this Car RGen model.

ECore Model

So here is the ECore model in full glory, thanks to Ed Merks (creator of Ecore), who produced this very compact yet still readable diagram.

I post this map here at the beginning so you can jump back to it for reference - if you immediately want to find the sites of interest on the map, look for EClass, EReference, and EAttribute since they should be somewhat familiar from previous posts.

I am going to explain this first from a structural point of view (top-down), and then dive into the interesting details of the various elements.

Ecore Structure

All definitions in ECore live inside an EPackage, which corresponds to a Ruby Module, or a Java Package (or a similar concept in other technologies). If you have located the EPackage box in the diagram, you see two attributes: nsURI and nsPrefix, which are used to uniquely identify the model (the "schema identifier" if you like) and provide a name prefix that is typically used when generating code. You do not have to set these when you just want to model classes and use them at runtime, but they are valuable in other scenarios.

EPackage can be a sub package inside another package (this is rarely used).

An EPackage contains EClassifier elements, and those come in the concrete forms of EClass and EDataType. You have already seen a glimpse of EClass and EDataType in the previous posts; EClass is used for the classes in our model, and EDataType is used for the data types that we can use in the attributes of classes (e.g. String, Integer).

An EClass contains EStructuralFeature elements, and those come in the concrete forms of EAttribute and EReference. Again, in the previous posts you saw examples of both attributes and references.

ECore can also be used to describe operations (i.e. methods), but since Ecore is not an implementation language, it cannot contain the actual implementation of such operations, only a declaration of them. When using EMF for Java, the code generator will generate method stubs for the operations, and it is easy to fill in the implementation logic. When we use RGen and implement the model in Ruby, we typically do not generate code (although it is possible), and we must instead define the implementation of operations a different way if we get a model where operations are specified (and we want to provide the declared operations - there is no requirement to do so).

And finally, there is support for annotations. Any EModelElement can have annotations. RGen (and EMF) does not know what to do with these annotations other than knowing about the association. It is however an important mechanism for tools in a model toolchain, and annotations are typically used for things like generation of documentation, defining properties that are of importance for code generation, marking something as deprecated, etc. The use of an EAnnotation is often a lightweight alternative to defining a completely separate model. There are some well-known annotation identities, e.g. if we want to generate Java code from a model that we author with the RGen Metamodel Builder, and we would like JavaDoc comments to be generated in the Java source. (That is all that I planned to say about EAnnotations.)


EPackage

We get the EPackage from our model's Ruby Module:

MyModel.ecore   # => ECore::EPackage

The interesting operations on EPackage are:

  • eClasses: the EClasses defined in this package (Enumeration<ECore::EClass>)
  • eClassifiers: the EClassifiers defined in this package (Enumeration<ECore::EClassifier>)
  • eAllClasses: the EClasses defined in this and all super packages (Enumeration<ECore::EClass>)
  • eAllClassifiers: the EClassifiers defined in this and all super packages (Enumeration<ECore::EClassifier>)

Remember that we are using a model; the ECore model, and we can naturally manipulate this model like any other model. In fact, we can build the model using Ruby logic, and then generate the implementation on the fly if we want. Thus, in addition to the methods shown above, there are methods to add and remove classifiers, and to get/set the attributes of the EPackage.


EClass

An EClass has many useful features:

  • name: the unqualified name, e.g. 'Car' (String)
  • qualifiedName: the qualified name, e.g. 'MyModel::Car' (String)
  • eAttributes: all attributes defined in this class (Enumeration<ECore::EAttribute>)
  • eReferences: all references (containment and regular) defined in this class (Enumeration<ECore::EReference>)
  • eAllAttributes: eAttributes from this and all super classes (Enumeration<ECore::EAttribute>)
  • eAllContainments: eReferences from this and all super classes that are containments (Enumeration<ECore::EReference>)
  • eAllReferences: eReferences from this and all super classes (Enumeration<ECore::EReference>)
  • eAllStructuralFeatures: all eAttributes and eReferences from this and all super classes (Enumeration<ECore::EStructuralFeature>)
  • eAllSubTypes: all sub types of the EClass (Enumeration<ECore::EClass>)
  • eAllSuperTypes: all super types of the EClass (Enumeration<ECore::EClass>)

And again, since this is in an ECore model, it is possible to manipulate the model, we can add / remove structural features and get/set the attributes.

In essence, the ECore model API makes it possible to do the same kind of meta programming that can be done in Ruby; dynamically creating classes with attributes and references. The differences are that what we create with ECore is type safe, and that we also get the model (which we can serialize, and later deserialize to get exactly the same implementation, or we can send it to another system using another platform, and it can in turn dynamically generate the logic that is needed to operate on this model). This is what makes modeling a corner stone for polyglot programming.


ENamedElement

Simply something that has a formal name.


ETypedElement

An ETypedElement is an ENamedElement, and adds the following:

  • ordered: only for information, stating that order is significant (Boolean)
  • unique: for multi-valued attributes, controls whether values are unique (references are always unique) (Boolean)
  • lowerBound: the minimum number of occurrences (Integer)
  • upperBound: the maximum number of occurrences (Integer)
  • many: derived, true if upperBound > 1 (Boolean)
  • required: derived, true if lowerBound > 0 (Boolean)
  • eType: reference to the type (EClass or EDataType)

Concretely, we use these attributes when modeling an attribute:

has_many_attr 'nick_names', String, :lowerBound => 0, :upperBound => -1, :unique => true

This means no nick name, or as many as you like, and they are unique.
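As a sketch, the multiplicity rules can be expressed as a small validation function (a hypothetical helper, not part of RGen's API; upperBound -1 means unlimited):

```ruby
def valid_multiplicity?(values, lower: 0, upper: -1, unique: false)
  return false if values.size < lower                    # too few
  return false if upper != -1 && values.size > upper     # too many
  return false if unique && values.uniq.size != values.size  # duplicates
  true
end

valid_multiplicity?(%w[bob bobby], lower: 0, upper: -1, unique: true)  # => true
valid_multiplicity?(%w[bob bob],   unique: true)                       # => false
```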


EStructuralFeature

An EStructuralFeature is an ETypedElement, and it is the base class for EAttribute and EReference; it has attributes and operations that are common to both.

The attributes of interest are:

  • changeable: can the value be externally set (Boolean)
  • volatile: the attribute has no storage in the object (typically true for computed/derived values) (Boolean)
  • transient: transient attributes are omitted from serialization (Boolean)
  • defaultValueLiteral: the default value in String form (String)
  • defaultValue: the default value as an instance of the attribute's data type (Object)
  • unsettable: can the attribute be unset (see previous post) (Boolean)
  • derived: is the attribute computed (requires implementing a method if true) (Boolean)

Again, by looking at the ECore model, you can find additional methods.

Concretely, we use these when defining attributes (typically only for attributes, but we may define derived references that represent filtered sets of other references, transient containments, etc.)

has_attr 'reg_nbr', String, :defaultValueLiteral => "UNREGISTERED"

I will come back to how to define derived/computed features in a future post.


EAttribute

An EAttribute only adds a reference to an EDataType (the type of the attribute), and the ability to denote that it is a primary key/identity attribute (which can be useful if we are transforming the model and need to be able to determine a primary key for a class, but it is not generally needed).


EReference

As you have seen in earlier posts, references are either containment references (the diamond-shaped relationships in the diagrams) or regular references (no diamond in the diagram). This is expressed with a Boolean attribute containment on the containing side, and container on the contained side (if the containment reference is bi-directional). When a reference is bi-directional, this is represented by two instances that know about each other via the eOpposite attribute.

If you study the model you will find that it is also possible to define that the relationship is based on a set of key attributes. (This is rarely used, but of value for models that should be easily transformed to database schemas).

Gratefully, we do not have to deal with all these details when defining references using the RGen Metamodel Builder. You have already seen in the previous posts how easy it is to create references - filling in containment/container and eOpposite is done for us, but now you know about the structure that is actually built inside the Ecore model.
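What that eOpposite maintenance amounts to can be hand-rolled in plain Ruby (a simplified sketch of the wiring RGen generates for you; the class and method names are made up for illustration):

```ruby
class House
  attr_reader :inhabitants

  def initialize
    @inhabitants = []
  end

  # Adding to one end of the relationship also sets the opposite end.
  def add_inhabitant(person)
    @inhabitants << person
    person.instance_variable_set(:@home, self)
  end
end

class Person
  attr_reader :home
end

house = House.new
person = Person.new
house.add_inhabitant(person)
person.home.equal?(house)  # => true
```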

EMF tutorial, Fun with Graphic Modeling in Eclipse

If you are interested in playing with Ecore models and want to use a graphical tool to do so, you can check out this tutorial by Vogella. (If you want to do this, it is best to start with a plain Eclipse IDE and not install it into your Geppetto (the Puppet IDE), as it will bloat your Geppetto with a complete Java development environment.)

I almost always use the graphical tools at the start of a project to organize my ideas even if I later do not use modeling technology in the implementation. I find that it forces me to have good definitions of what things are. If I can not express it in a model, or the model turns out to be horribly ugly, then I probably do not understand the domain well enough (or the domain is horribly messy to begin with...)

Further Reading

If you are interested in more in-depth information about Ecore, you can check out this EMF book; although it is Java and Eclipse centric, the Ecore part itself is generic.

Closing Remarks

I am sorry about all the theory and documentation flavor of this post. I wanted to give you enough details about what happens under the hood while not dragging it out for too long. As you probably have understood, you really do not need to think about all these things when just defining and using models as an implementation tool, but it is valuable to know that it is there and what it can do for you should you have the need for polyglot programming, code generation, generation of data base or data schemas, transformations of data, etc.

In posts to follow I will show how to add operations to the modeled classes for things like being able to store modeled objects in hashes, how to define derived attributes etc.

(Ok, you can breathe normally again).

Puppet Internals - Using Model Meta Data

Puppet Internals - Using Model Meta Data

In this post about modeling with Ecore and RGen I will explain the operations that allow us to navigate the model, how to generically get and set elements / values in the model, and how to get to the meta-model from model objects.


As you will see, methods that operate on, or make use of, Ecore meta data are typically named with an initial lower case 'e'. They also use camel cased names to make these operations as similar as possible to the corresponding operations in EMF / Java. Whenever you see such 'e' methods, they are about the Ecore aspects of the object (instance), or its class (access to the meta-model). In everyday speak we can talk about this as the object's e-ness (since it is much easier to say than "the meta aspects of...").

In this post I will show the various basic 'e' operations with examples. If you want to get the source of the model used in the examples, the final version is available in this gist.

Getting Content

One of the first things you typically want to do is to get the content of a model generically (that is, without any explicit knowledge about a particular model).

First, we need to make a couple of adjustments to the example model I showed in the previous post. RGen comes with some built in navigations, but not all that are available in the EMF (Java) implementation of Ecore. So, we have a module in Puppet that should be included to get full support.

We need to add:

require 'puppet'
require 'puppet/pops'

And then in our base class (that all our domain model classes inherit from) define it like this:

class MyModelElement < RGen::MetamodelBuilder::MMBase
  include Puppet::Pops::Containment
  abstract
end

Getting All Contents

A common operation is to iterate over all containments in an object. This is done with the method eAllContents which is typically called with a block receiving each contained element. It can also be called without a block to get a Ruby Enumerable.

The eAllContents method does not include the object it is invoked on; only its content is included in the enumeration. It visits all contained elements recursively, in depth-first order, where parents appear before children.
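The contract is easy to see in a plain-Ruby sketch (the Node class below is hypothetical, not part of RGen or the Car model) that implements the same traversal:

```ruby
# A plain-Ruby sketch of the eAllContents contract: the receiver itself
# is excluded, and contained elements are visited depth first, with
# parents appearing before their children.
class Node
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def e_all_contents(&block)
    return enum_for(:e_all_contents) unless block  # no block: an Enumerator
    children.each do |child|
      block.call(child)             # the parent-side element first...
      child.e_all_contents(&block)  # ...then its contents, recursively
    end
  end
end

root = Node.new('car', [Node.new('engine', [Node.new('piston')]),
                        Node.new('wheel')])
root.e_all_contents.map(&:name)   # => ["engine", "piston", "wheel"]
```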

We can try this out with the Car model.

engine = MyModel::CombustionEngine.new
car = MyModel::Car.new
car.engine = engine
car.eAllContents.map {|element| "#{element}" }

# => ["#<MyModel::CombustionEngine:0x007ff8a1b606d8>"]

In the Puppet implementation of the future parser, the ability to iterate over all contents is used when validating models - in particular the model produced by the parser.

Getting All Containers

If we have a model element, and would like to traverse all of its containers (until we reach the root) we use the eAllContainers. If we just want the immediate container we use eContainer.

# continuation of the previous example
engine.eContainer                       # MyModel::Car
engine.eAllContainers.to_a              # [MyModel::Car]

Not so exciting, but if we add a Garage that can contain cars - i.e. by adding this to the model:

class Garage < MyModelElement
  contains_many_uni 'cars', Car
end

And then add our car to the garage.

garage = MyModel::Garage.new
garage.addCars(car)
engine.eAllContainers.to_a              # [MyModel::Car, MyModel::Garage]

# and just to check what happens if we get all contents in the garage
garage.eAllContents.to_a                # [MyModel::Car, MyModel::CombustionEngine]

In the Puppet implementation of the future parser, the ability to search up the containment chain is used in validation (some object must be contained by top level constructs), and in order to find information such as a source text location index, and to find the loader that loaded the code (which is recorded at the root of the model).

Where am I? What is my role?

It is often useful to ask:

  • Where is this object contained?
  • What is this object's role in that container?

In the sample car model we have right now, this is not so valuable, since we do not have anything that can be contained in multiple places. So to make this a bit more interesting we can add the following to the model

class Car < MyModelElement
  # as before AND...
  has_attr 'reg_nbr', String, :defaultValueLiteral => 'UNREGISTERED'
  contains_one_uni 'left_front', Wheel
  contains_one_uni 'right_front', Wheel
  contains_one_uni 'left_rear', Wheel
  contains_one_uni 'right_rear', Wheel
end

RimTypeEnum = RGen::MetamodelBuilder::DataTypes::Enum.new([:black, :silver])

class Wheel < MyModelElement
  has_attr 'rim', RimTypeEnum
  has_attr 'rim_serial', String
end

The above example also shows how to set a default value for an attribute. Default values can only be used with single valued attributes. All others have an empty Array as their default.

Now we can create wheels and assign to the car. This also demonstrates that it is possible to give values to features in a Hash when creating the instance:

car = Car.new(:reg_nbr => 'ABC123')
car.left_front  = w1 = Wheel.new(:rim_serial => '1', :rim => :silver)
car.right_front = w2 = Wheel.new(:rim_serial => '2', :rim => :silver)
car.left_rear   = w3 = Wheel.new(:rim_serial => '3', :rim => :silver)
car.right_rear  = w4 = Wheel.new(:rim_serial => '4', :rim => :silver)

And now we can start asking questions:

w1.eContainer.reg_nbr    # => 'ABC123'
w1.eContainingFeature    # => :left_front
w2.eContainer.reg_nbr    # => 'ABC123'
w2.eContainingFeature    # => :right_front

In the Puppet implementation these operations are typically used when generating error messages. An error is found in some deeply nested / contained element and a message should inform the user about where it is being used / the role it plays. Sometimes also used in validation when something is valid or not depending on the role it plays in its container.

Generic Get

Features can be read generically with the method getGeneric. Using the wheel example above:

car.getGeneric(:left_front).rim_serial  # => '1'
w1.getGeneric(:rim)                     # => :silver

There is also a getGenericAsArray which returns the value in an Array, if it is not already one (which it always is when the feature is multi-valued).
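Conceptually (this is a hypothetical plain-Ruby sketch, not RGen's actual implementation), the generic read operations boil down to a reflective send, with the as-array variant wrapping single values:

```ruby
# Hypothetical sketch of the generic-read idea: get_generic is a
# reflective send, and get_generic_as_array wraps single values.
class WheelSketch
  attr_accessor :rim, :rim_serial

  def get_generic(name)
    send(name)                       # look the feature up by name
  end

  def get_generic_as_array(name)
    value = get_generic(name)
    value.is_a?(Array) ? value : [value]   # multi-valued stays as-is
  end
end

w = WheelSketch.new
w.rim = :silver
w.get_generic(:rim)            # => :silver
w.get_generic_as_array(:rim)   # => [:silver]
```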

The method eIsSet can be used to test if a feature has been set. It returns true if the feature has any value other than its default (or, when no default value is defined, any value other than nil).

car2 = Car.new
car2.reg_nbr                 # => 'UNREGISTERED'
car2.eIsSet(:reg_nbr)        # => false
car2.reg_nbr = 'XYZ123'
car2.eIsSet(:reg_nbr)        # => true

This means we can define the value that represents "missing value" per single valued feature, and as you will see below we can reset the value to the default.
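The semantics can be sketched in plain Ruby (the CarSketch class and its DEFAULTS hash are hypothetical, not RGen's implementation): "set" simply means "differs from the feature's declared default".

```ruby
# Hypothetical sketch of the eIsSet / eUnset idea.
class CarSketch
  DEFAULTS = { :reg_nbr => 'UNREGISTERED' }.freeze

  attr_accessor :reg_nbr

  def initialize
    @reg_nbr = DEFAULTS[:reg_nbr]
  end

  def e_is_set?(feature)
    send(feature) != DEFAULTS[feature]      # set == not the default
  end

  def e_unset(feature)
    send("#{feature}=", DEFAULTS[feature])  # back to the default value
  end
end

car = CarSketch.new
car.e_is_set?(:reg_nbr)   # => false
car.reg_nbr = 'XYZ123'
car.e_is_set?(:reg_nbr)   # => true
car.e_unset(:reg_nbr)
car.reg_nbr               # => "UNREGISTERED"
```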

At present we do not make use of the generic operations anywhere in the Puppet implementation as the validation logic is aware of the individual classes. We may make use of it to add additional generic validation that is driven by meta data alone. We will probably also use this when we get to the catalog and resource type models.

Generic Mutating Operations

As you may have guessed, it is also possible to generically modify objects. The methods are:

  • setGeneric(name, value)
  • addGeneric(name, value)
  • removeGeneric(name, value)
  • eUnset(name) - sets feature to its default value (or nil/empty list if there is no default)

Here is an example using eUnset to return the car to the default for its reg_nbr:

carxyz123 = Car.new(:reg_nbr => 'XYZ123')
carxyz123.eIsSet(:reg_nbr)                          # => true
carxyz123.reg_nbr                                   # => 'XYZ123'
carxyz123.eUnset(:reg_nbr)
carxyz123.eIsSet(:reg_nbr)                          # => false
carxyz123.reg_nbr                                   # => 'UNREGISTERED'

So, what is missing from the picture?

You may have noticed it already. We did add an eAllContents, and eAllContainers to each model class by including the Puppet::Pops::Containment module, but these only operate on containment references.

  • How can you get all features including attributes and regular references?
  • Why are these not directly available on all model objects as individual methods?

The reasons for the design are that it is very common to navigate to the container, or to contained children, e.g. for validation purposes, but these operations typically end up in logic that has specific knowledge about a particular class - and there we are in a context where we already know about all of the attributes and references and can just use them directly.

While it would be possible to provide direct access to almost all e-ness methods directly, there is a limit to how much bloat we want in each class. Therefore, all other e-ness operations we may want to perform have to be written in a slightly more round-about fashion where we navigate to the meta model and get the information there ourselves.

All the information we need is available in the meta-model, and the following sections show how this is done.

Getting the Meta Model

We can get the entire meta-model from the Ruby module by calling the method ecore:

MyModel.ecore                  # => ECore::EPackage

From there we can navigate the contents of the entire package. As an example, one of the methods, eAllClasses, gives us all defined classes. Here is what happens if you try this in irb on MyModel (notice the map to each class' name to get something meaningful as output):

MyModel.ecore.eAllClasses.map {|ec| ec.name }
 => ["MyModelElement", "Engine", "CombustionEngine", "ElectricalEngine", "Car",
     "ServiceOrder", "Garage", "Wheel"]

We can also get to the meta model element for each individual class in our model by calling the method ecore on that object's class. (We cannot use this on the values of attributes since they are basic Ruby types, and thus do not have any e-ness.)

car = Car.new
car.class.ecore               # => RGen::ECore::EClass

Oh look, we got something back called an EClass, which is the meta-model representation of a class. The EClass has many useful methods to get information about the class, its attributes, references, containments etc. As an example, the method eAllAttributes returns an Enumerable with ECore::EAttribute elements that describes all of the attributes for the class and all of its superclasses.

If you thought it just started to get interesting, don't worry, I will come back with more about the ECore model in the next post.

In this Post

In this post you have seen how a model can be navigated, and how we can find the meta-data for elements in the model. In the next post I am going to dive deeper into the Ecore model itself.

Wednesday, April 30, 2014

Puppet Internals - Basic Modeling with RGen

Modeling Process - an example

In the previous post about modeling I covered the basic principles of models: a model is an abstraction; a model that describes a model is a meta-model; and finally, a model that defines the grammar for how meta-models are written is called a meta-meta model.

Since this may sound like hocus-pocus, I am going to show concrete examples of how this works. Before we can start doing tricks with meta-models, we must have something concrete to model, and we must learn the concepts to use to express these models, and that is the focus of this post.

I am going to use the RGen Metamodel Builder API, a Ruby DSL for creation of meta models, to illustrate - simply because it is the easiest way to create a model.

Modeling Process - an example

When we are modeling, we typically produce the model in several iterations - first just starting loosely with the nouns, verbs, and attributes of our problem domain. We recently did this exercise for a first version of Puppet's future Catalog Builder, and it went something like this:

  • We need to be able to build a Catalog
  • A Catalog contains Resources
  • A Resource describes a wanted state of a particular resource type in terms of Properties
  • A Resource also describes Parameters that are input to the resource provider performing the work
  • It would be great if we could define a catalog consisting of multiple catalogs to be applied in sequence - maybe Section is a good name for this
  • While the catalog is built we need to be able to make future references
  • We need to track where resources in the catalog originate (they may be virtual and exported from elsewhere)
  • ... etc.

Once we settled on a handful of statements, we could then whiteboard a tentative model, and reason about the implications. We continued with all the various sets of requirements, and we took pictures of the whiteboard, and made some written notes to remember what we did.

Next step was to document this more formally. I used the graphical modeling tool for Eclipse to make a diagram. Using this diagram, we then walked through it, continued discussing / testing use cases, and revising the diagram. The revised diagram after a couple of iterations looks like this:

At this point in the lifecycle of the model, it is fastest to make changes in the graphical tool, and it is much faster and easier to communicate what the model means than if everything is written in a programming language. After a while though, it becomes somewhat tedious to describe all the details in the diagram, and while we can use the output directly from the tool to get a version in Ruby, we really want to maintain the original model in Ruby source code form.

We then made an implementation of the model in Ruby that you can view on github.

The work of making the diagram and the implementation in Ruby took me something like 4 hours spread out over the breaks we took while discussing and white-boarding.

We expect to revise this model several times until it is done. If we want to generate a graph, we can now go in the other direction and create input to the graphical tool from the code we wrote in Ruby. This is basically a transformation from the Ruby code to Ecore in XMI since that is what the graphical tool understands.

The reason why I included this real life example is to show something that is relevant to the Puppet community. I am however going to switch to toy examples to demonstrate the various techniques when modeling since any real life model tends to overshadow the technique with domain specific issues - i.e. this post is not about the Catalog-model, it is about modeling in general.


The basic concepts used when modeling are:

  • We model classifiers / classes
  • Classes have named features
  • A feature is an attribute of a simple data type, or a relationship to another class
  • Attributes can be multi valued
  • A relationship can describe containment
  • A relationship can be uni-, or bi-directional (a bi-directional reference allows us to conveniently navigate to an object that is "pointing to" the object in question).
  • Relationships can be multi valued at one side (or both sides for non containment relationships).
  • The term multiplicity is sometimes used to denote optional/single/multi-value in relationships, e.g. 0:1, 1:1, 0:M, 1:M, or M:M
  • An abstract class can not be instantiated
  • Ecore also contains modeling of interfaces, this is mostly useful for Java and I am going to skip explaining these.


When using RGen's Meta Model Builder, the classes in a model are simply placed in a module and made to inherit the base class RGen::MetamodelBuilder::MMBase. It is common to let each meta model define an abstract class that signals that it is part of that model.

require 'rgen/metamodel_builder'

module MyModel
  # Let RGen know this module is a model
  extend RGen::MetamodelBuilder::ModuleExtension

  # An abstract class that makes it easier to check if a given
  # object is an element of "MyModel"
  class MyModelElement < RGen::MetamodelBuilder::MMBase
    # We make this class abstract to make it impossible to create instances of it
    abstract
  end

  class Car < MyModelElement
    # definition of Car
  end
end
At this point we cannot really do much except create an instance of a Car

require 'mymodel'
a_car = MyModel::Car.new

Attribute Features

Attribute features are used to define the attributes of the class that are represented by basic data types. Attributes have a name, a type, and multiplicity. A single value attribute is defined by calling has_attr, a multivalued attribute by calling has_many_attr.

The basic data types are:

  • String
  • Integer
  • Float
  • Numeric
  • Boolean
  • Enum

You can also use the completely generic Object type, but this also means that the model cannot be serialized, so this should be avoided. It is also possible to reference implementation classes, but this should also be avoided for serialization and cross platform reasons.

The Enum type can be specified separately, and given a name, or it may be defined inline.

EngineEnum = RGen::MetamodelBuilder::DataTypes::Enum.new([:combustion, :electric, :hybrid])

class Car < MyModelElement
  has_attr 'engine', EngineEnum
  has_attr 'doors', Integer
  has_attr 'steering_wheel', String
  has_attr 'wheels', Integer
end

If we have an attribute that should be capable of holding many values, we use has_many_attr. In the example below, an enum for extra equipment is used - it does not have to be an enum; a multi-valued attribute can have any basic data type.

ExtraEquipmentEnum = RGen::MetamodelBuilder::DataTypes::Enum.new([
  :gps, :stereo, :sun_roof, :metallic
])

class Car < MyModelElement
  has_many_attr 'extras', ExtraEquipmentEnum
  # ...
end

The resulting implementation allows us to set and get values.

a_car = Car.new
a_car.engine = :combustion
a_car.extras = [:gps, :sun_roof]

For multi-valued attributes, we can add and remove entries:

a_car.addExtras(:metallic)
a_car.removeExtras(:sun_roof)
a_car.extras   # => [:gps, :metallic]

If we attempt to assign something of the wrong type, we get an error:

a_car.addExtras(:catapult_chair)

=> In MyModel::Car : Can not use a Symbol(:catapult_chair) where a
   [:gps,:stereo,:sun_roof,:metallic] is expected

We can also specify additional metadata for each attribute, but I will return to that later.

Relationship Features

All non basic data types are handled via references. There are two types of references: containment and regular. A containment reference is used when the referenced element is an integral part of the object - e.g. when it is an attribute of the object, when something cannot and should not be shared between objects (one particular wheel can only be mounted on one car at a time), and when it should be serialized as part of the object that holds the reference. All other references require that the referenced object is contained somewhere else (although this is not quite true, as we will see later when we talk about more advanced concepts, it is a reasonable conceptual approximation for now).

In order to have something meaningful to model, let's expand the notion of Engine.

class Engine < MyModelElement
  abstract
end

FuelTypeEnum = RGen::MetamodelBuilder::DataTypes::Enum.new([:diesel, :petrol, :etanol])

class CombustionEngine < Engine
  has_attr 'fuel', FuelTypeEnum
end

class ElectricalEngine < Engine
  has_attr 'charge_time_h', Integer
end

# skipping HybridEngine for now


We can now change the Car to contain an Engine. When we do this, we must also decide if the Engine should explicitly know about which car it is mounted in or not. That is, if the containment relationship is uni- or bi- directional. Let's start with a uni-directional containment:

class Car < MyModelElement
  contains_one_uni 'engine', Engine
  # ...
end

We can now create and assign an Engine to a Car.

a_car = Car.new
a_car.engine = CombustionEngine.new

If we want to make the containment bi-directional:

class Car < MyModelElement
  contains_one 'engine', Engine, 'in_car'
  # ...
end

The assignment works as before, but we can now also navigate to the car the engine is mounted in. We achieved this, by defining the reverse role 'in_car' for the bi-directional containment. Now we can do this:

an_engine = CombustionEngine.new
an_engine.in_car            # => nil
a_car = Car.new
a_car.engine = an_engine
an_engine.in_car            # => Car

The semantics of containment means that if we assign the engine to another car we will move it!

# continued from previous example
another_car = Car.new
a_car.engine                # => CombustionEngine
another_car.engine = an_engine
another_car.engine          # => CombustionEngine
a_car.engine                # => nil

This may seem scary, but it is actually quite natural. If we find that we want to contain something in more than one place at a given time, our model (and thinking) is just wrong, and one of the references should not be a containment reference. Say we have a ServiceOrder (imagine one for the purpose of repairing an engine): the engine is never contained in the order (it is still mounted in the car). Model-wise we simply use a non-containment / regular reference from the ServiceOrder to an Engine.
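The move behavior can be sketched in plain Ruby (the CarNode / EngineNode classes are hypothetical, not RGen's implementation): keeping a single container back-pointer is what forces the detach.

```ruby
# Hypothetical sketch of containment semantics: an element has at most
# one container, so assigning it to a new container detaches it from
# the old one.
class EngineNode
  attr_accessor :container   # back-pointer maintained by CarNode below
end

class CarNode
  attr_reader :engine

  def engine=(new_engine)
    # Detach the engine from its current container first, if any.
    new_engine.container.release_engine if new_engine && new_engine.container
    @engine = new_engine
    new_engine.container = self if new_engine
  end

  def release_engine
    @engine = nil
  end
end

a_car, another_car = CarNode.new, CarNode.new
engine = EngineNode.new
a_car.engine = engine
another_car.engine = engine   # the engine *moves*
a_car.engine                  # => nil
```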

We specify different kinds of containment with the methods:

  • contains_one
  • contains_many
  • contains_one_uni
  • contains_many_uni

Regular References

Regular references are defined with one of the methods:

  • has_one, uni-directional
  • has_many, uni-directional
  • one_to_many, bi-directional
  • many_to_one, bi-directional
  • many_to_many, bi-directional

We can now define a ServiceOrder for service of engines:

class ServiceOrder < MyModelElement
  has_one 'serviced_engine', Engine
end

And we can use this:

an_engine = CombustionEngine.new
a_car = Car.new
a_car.engine = an_engine
a_car.engine                   # => CombustionEngine
so = ServiceOrder.new
so.serviced_engine = an_engine
a_car.engine                   # => CombustionEngine

As you can see, since the relationship is non-containment, the engine does not move to become an integral part of the service order, it is still mounted in the car (which is exactly what we wanted).

For the bi-directional references, the reverse role is required. When using these, it is important to consider the cohesion of the system - we do not want every piece to know about everything else. Therefore, choose bi-directional references only where it really matters.

Testing the examples

You can easily try the examples in this gist. You need to have the rgen gem installed, and then you can run the examples in irb, and try things out.

In this Post

In this post you have seen examples of how to build a model with RGen, and how it can be used. As you probably have noticed, you get quite a lot of functionality with only a small amount of work. How much code would you have to write to correctly support fully typed many to many relationships? (And then discover that you want to model it differently).

I hope this post has shown the usefulness of modeling, even when no fancy modeling tricks have been used, simply because of the robust, type- and referentially-safe implementation that we get from a small and concise definition.

Monday, April 28, 2014

Puppet Internals - Introduction to Modeling

What is a "model" really?

As you probably know, Puppet creates a Declarative Catalog that describes the desired system state of a machine. What you probably have not thought about is that this means that Puppet is actually Model Driven. The model in this case is the catalog and the descriptions of the desired state of a set of resources that is built by Puppet from the instructions given to it in the Puppet Programming Language.

While Puppet has always been based on this model, it has not been built using modeling technology. What is new in the future parser and evaluator features in Puppet is that we have started using modeling technology to also implement how Puppet itself works.

In this post, I am going to talk a bit about modeling in general, the Ecore modeling technology and the Ruby implementation of Ecore called RGen.

What is a "model" really?

If at first when you hear the word model, you think about Tyra Banks or Marcus Schenkenberg and then laugh a little because that is obviously not the kind of models we are talking about here. You were actually not far off the mark.

A model is just an abstraction, and we use them all the time - they are the essence of any spoken language; when we talk about something like a "Car" we are talking about an unspecific instance of something like "a powered means of transportation, probably a 4 door sedan". Since it is not very efficient to communicate in that way, we use the abstraction "Car". Fashion models such as Tyra or Marcus are also abstractions of "people wearing clothes" albeit very good looking ones (perhaps you think they represent "people that do not wear enough clothes", but they are still abstractions).

We can express a model concretely using natural language:

A Car has an engine, 2 to 5 doors, a steering wheel, brakes, and 3-4 wheels.
. . .

We can also express such a model in a programming language such as Ruby:

class Car
  attr_accessor :engine
  attr_accessor :doors
  attr_accessor :steering_wheel
  attr_accessor :brakes
  attr_accessor :wheels
end
. . .

As you can see in the above attempt to model a Car, we lack the semantics in Ruby to declare more details about the Car's attributes - there is no way to declaratively state how many doors there could be, the number of wheels, etc. There is also no way to declare that the engine attribute must be of Engine type, etc. All such details must be implemented as logic in the setter and getter methods that manipulate a Car instance. While this is fine from a runtime perspective (it protects us from mistakes when using the class), we cannot (at least not easily) introspect the class and deduce the number of allowed doors, or the allowed type of the various attributes.
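For example, enforcing just the door rule from the natural-language model above requires hand-written guard logic (this is a sketch of the boilerplate the text describes, not code from the original post):

```ruby
# Hand-written validation for a single attribute - and none of these
# rules are introspectable from the outside afterwards.
class Car
  attr_reader :doors

  def doors=(n)
    raise TypeError, 'doors must be an Integer' unless n.is_a?(Integer)
    raise ArgumentError, 'a Car has 2 to 5 doors' unless (2..5).cover?(n)
    @doors = n
  end
end

car = Car.new
car.doors = 4    # fine
# car.doors = 7  # would raise ArgumentError
```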

Introspection (or reflection) is the ability to programmatically obtain information about the expressed logic. Sometimes we talk about this as the ability to get meta-data (data about data) that describes what we are interested in.

While Ruby is a very fluid implementation language, in itself it is not very good at expressing a model.

Modeling Language

A modeling language (in contrast to an implementation language such as Ruby) lets us describe far more details about the abstraction without having to express them in imperative code in one particular implementation language. One such family of "languages" are those that are used to describe data formats - they are referred to as schemas - and you are probably familiar with some of them, such as XmlSchema, JsonSchema, and YamlSchema. These schema technologies allow us to make declarations about what is allowed in data that conforms to the schema, and we can use this to validate actual data.

A schema is a form of modeling language. What is interesting about them is that they enable transformation from one form to another! Given an XmlSchema, it is possible to transform it into a corresponding Yaml or Json schema, and likewise transform data conformant with one such schema into data conformant with the transformed schema. (The problems doing this in practice have to do with the difference in semantic power between the technologies - we may be able to express rules/constraints in one schema technology that do not exist in the others.)

Schemas and Meta Models

When we look at a schema - we are actually looking at a meta model; a model that describes a model. That is if we describe a Car in Json we have a Car model:

{ "engine": "combustion",
  "steering-wheel": "sport-leather",
  "wheels": ...
}

And if we describe a schema for it:

{ "title": "Car Schema",
  "type": "object",
  "properties": {
    "engine": { "type": "string" },
    "steering-wheel": { "type": "string" },
    . . .
  },
  "required": ["engine", "steering-wheel", ...]
}

We have a Car meta-model.

In everyday speak, we typically refer to the schema as "schema" or "model" and simply ignore its "meta status". But since we are on the topic of meta models - what we can do now is to also express the meta model as a model - i.e. what is the schema for a jsonschema? Here is an excerpt from the Json "Core/Validation Meta-Schema"

  "id": "",
  "$schema": "",
  . . .
   "title": {
        "type": "string"
  . . .
  "required": { "$ref": "#/definitions/stringArray" },
  . . .

If you are interested in what it looks like, do download it. Be warned that you will quickly become somewhat disoriented since it is a schema describing a schema that describes what Json data should look like...

A meta schema such as that for JsonSchema is very useful as it can be used to validate schemas that describe data.

Schemas such as XmlSchema, YamlSchema, and JsonSchema are good for describing data, but they become somewhat difficult to use in practice for the construction of software. There are other modeling languages that more specifically target software system constructs. There are both graphical and textual languages, as well as those that have both types of representations.

What we are specifically interested in is an Object modeling language, which I will explain in more detail.

Levels of reality / meta-ness

We can organize what we just learned about a model, its relationship to a meta-model, and the fact that all of these can themselves be expressed as models. Here is a table that illustrates this, from most concrete to most abstract:

Meta-Level Description
M0 Real Object - e.g. the movie "Casablanca" on a DVD
M1 User Model / Instance Level - e.g. a computer abstraction of the DVD - aVideo = Video.new('Casablanca')
M2 Meta Model - e.g. defines what Video is, its attributes and operations
M3 Meta meta model (the modeling language/grammar) - e.g. defines how the definition of "what a Video is", is expressed.

We very rarely talk about meta-meta models (this is the technology we use to implement a meta-model), and we also typically leave out the meta word when talking about a meta-model (i.e. just using the word model). We also typically talk about an "instance of a model" as being just a "model", without any distinction about its form; as live objects in memory that we can manipulate, or serialized to disk, stored in a database etc. It is only when we talk explicitly about modeling and modeling technology that we need to use the more precise terms, most of the time, it is perfectly clear what we are referring to when we use the word "model".

Object Modeling Language

An Object Modeling Language is a language that directly supports the kinds of elements we are interested in when constructing software. E.g. classes, methods, the properties of objects. You probably heard of one such technology called UML - Unified Modeling Language - this is a broad modeling technology and is associated with object oriented software development methodologies such as Booch, OMT, Objectory, IBM's RUP, and the Dynamic Systems Development Method. This was The Big Thing in the 90's, but UML has since then more or less slid into darkness as "the way to write better software" has shifted focus. An interesting debate from 2009, can be found here.

There is however a very useful part of the UML technology that is often overlooked - the so called Meta Object Facility (MOF) that sits at the very core of UML. It contains the (meta-meta) model that UML itself is defined in. MOF plays the same role for models as what Extended Backus Naur Form (EBNF) plays for programming languages - it defines the grammar. Thus MOF can be said to be a domain specific language used to define meta models. The technology used in MOF is called Ecore - and it is the reference implementation of MOF. (It is a model at level M3 in the table above).


Ecore is part of Eclipse EMF, and is heavily used within Eclipse for a wide variety of IDE applications and application development domains. In the Puppet Domain EMF/Ecore technology is used in the Puppetlabs Geppetto IDE tool in combination with additional frameworks for language development such as Xtext.

Eclipse EMF is a Java centric implementation of Ecore. There are also implementations for Ruby (RGen) and C++ (EMF4CPP).

Thus, there are many different ways to express an Ecore model. MOF defines one serialization format known as XMI, which is based on XML, but there are many other concrete formats such as Xcore (a DSL built with Xtext), annotated Java, JSON, binary serialization formats, the RGen DSL in Ruby, etc.

Car in RGen

Here is the Car expressed with RGen's MetamodelBuilder. In the next blog post about modeling I will talk a lot more about RGen - this is just a simple illustration:

class Car < RGen::MetamodelBuilder::MMBase
  has_attr 'engine', Enum.new([:combustion, :electric, :hybrid])
  has_attr 'doors', Integer
  has_attr 'steering_wheel', String
  has_attr 'wheels', Integer

. . .

A few points though:

  • RGen::MetamodelBuilder::MMBase is the base class for all models implemented with RGen (irrespective of how they are defined; using the metamodel builder, using the API directly, loading an ecore XMI file, or any other serialization format).
  • has_attr is similar to Ruby's attr_accessor, but it also specifies the type and type checking is automatic.
  • You can probably guess what Enum does

If you are eager to see real RGen models as they are used in Puppet, you can take a look at the AST model, or the type model.
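To give a feel for the automatic type checking, here is a minimal, hypothetical re-creation in plain Ruby of the kind of checked accessor that has_attr produces. This only mimics the flavor of RGen's MetamodelBuilder - the real implementation does far more (Ecore metadata, enums, containment, serialization):

```ruby
# A toy MMBase: has_attr defines a reader plus a writer that type checks.
# Purely illustrative - not RGen's actual implementation.
class MMBase
  def self.has_attr(name, type)
    define_method(name) { instance_variable_get("@#{name}") }
    define_method("#{name}=") do |value|
      unless value.is_a?(type)
        raise TypeError, "#{name} must be a #{type}, got #{value.class}"
      end
      instance_variable_set("@#{name}", value)
    end
  end
end

class Car < MMBase
  has_attr 'doors', Integer
  has_attr 'steering_wheel', String
end

car = Car.new
car.doors = 4                # ok
car.steering_wheel = 'left'  # ok
begin
  car.doors = 'four'         # wrong type
rescue TypeError => e
  puts e.message             # doors must be a Integer, got String
end
```

The point is that the declarative attribute description carries enough information (name plus type) to generate both the accessors and the checking - boilerplate you would otherwise write by hand.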

Benefits of Modeling Technology

So what is it that is appealing about using modeling technology?

  • The model is declarative, and can often be free of implementation concerns
  • Polyglot - a model can be used dynamically in different runtimes, or be used to generate code.
  • Automatic type checking
  • Support for containment and serialization
  • Models can be navigated
  • Objects in the model have information (meta-data) about where they are contained ("this is the left front wheel of the car XYZ 123").
  • Direct support for relationships, including many to many, and bidirectional relationships

Which I guess boils down to: "Modeling technology removes the need to write a lot of boilerplate code for non-trivial structures". In the following posts I will talk about these various benefits and concepts of Ecore, using examples in RGen.

In this Blog Post

In this blog post I explained that a model is an abstraction, and how such an abstraction can be implemented in software: hand written, using modeling technology such as Data Schemas, or using one specifically designed for software such as Ecore, which is available for Ruby in the form of the RGen gem. I also dived into the somewhat mysterious domain of meta-meta models - the grammar used to describe a meta-model, which in turn describes something that we want to manipulate/work with in our system.

Things will be more concrete in the next post, I promise.

Monday, April 7, 2014

Getting your Puppet Ducks in a Row

Getting your Puppet Ducks in a Row

A conversation that comes up frequently is whether the Puppet Programming Language is declarative or not. This usually becomes the topic when someone has been fighting with how the master side order of evaluation of manifests works and has been left beaten by what may seem like random behavior. In this post I want to explain how Puppet works and try to straighten out some of the misconceptions.

First, let's get the terminology right (or this will remain confusing). It is common to refer to "parse order" instead of "evaluation order", and the use of the term "parse order" is deeply rooted in the Puppet community - this is unfortunate as it is quite misleading. A computer language is typically first parsed and then evaluated (Puppet does the same), and as you will see, almost all of the peculiarities occur during evaluation.

"Parse Order"

Parse Order is the order in which Puppet reads puppet manifests (.pp) from disk, turns them into tokens and checks their grammar. The result is something that can be evaluated (technically an Abstract Syntax Tree (AST)). The order in which this is done is actually of minor importance from a user perspective, you really do not need to think about how an expression such as $a = 1 + 2 becomes an AST.
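For the curious, the parse-then-evaluate split can be sketched in a few lines of plain Ruby (a toy illustration, not Puppet's actual AST classes): parsing only builds a tree, and nothing is computed until that tree is evaluated in a separate, later step.

```ruby
# Toy AST nodes - stand-ins for what a parser might produce for "$a = 1 + 2"
Assignment = Struct.new(:name, :value)
Plus       = Struct.new(:left, :right)

ast = Assignment.new('a', Plus.new(1, 2))  # "parse order" ends here

# evaluation: a separate step that walks the tree and computes values
def evaluate(node, scope)
  case node
  when Assignment then scope[node.name] = evaluate(node.value, scope)
  when Plus       then evaluate(node.left, scope) + evaluate(node.right, scope)
  else node  # a literal value
  end
end

scope = {}
evaluate(ast, scope)
puts scope['a']  # 3
```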

The overall ordering of the execution is that Puppet starts with the site.pp file (or possibly the code setting in the configuration), then asks external services (such as the ENC) for additional things that are not included in the logic that was loaded from the site.pp. In versions from 3.5.1 the manifest setting can also refer to a directory of .pp files (preferred over using the now deprecated import statement).

After having parsed the initial manifest(s), Puppet then matches the information about the node making a request for a catalog with available node definitions, and selects the first matching node definition. At this point Puppet has the notion of:

  • node - a mapping of the node the request is for.
  • a set of classes to include and possibly parameters to set that it got from external sources.
  • parsed content in the form of one or several ASTs (one per file that was initially parsed)

Evaluation of the puppet logic (the ASTs) now starts. The evaluation order is imperative - lines in the logic are executed in the order they are written. However, all classes and defines in a file are defined prior to starting evaluation, but they are not evaluated (i.e. their bodies of code are just associated with the respective name and set aside for later "lazy" evaluation).

Which leads to the question what "being defined" really means.

Definition and Declaration

In computer science these terms are used as follows:

  • Declaration - introduces a named entity and possibly its type, but it does not fully define the entity (its value, functionality, etc.)
  • Definition - binds a full definition to a name (possibly declared somewhere). A definition is what gives a variable a value, or defines the body of code for a function.

A user-defined resource type is defined in puppet using a define expression. E.g. something like this:

define mytype($some_parameter) {
  # body of definition
}

A host class is defined in puppet using the class expression. E.g. something like this:

class ourapp {
  # body of class definition
}

After such a resource type definition or class definition has been made, if we try to ask whether mytype or ourapp is defined by using the function defined, we will be told that it is not! This is because the implementer of the function defined used the word in a very ambiguous manner - the defined function actually answers "is ourapp in the catalog?", not "do you know what a mytype is?".

The terminology is further muddled by the fact that the result of a resource expression is computed in two steps - the instruction is queued, and later evaluated. Thus, there is a period of time when it is defined, but what it defines does not yet exist (i.e. it is a kind of recorded desire / partial evaluation). The defined function will however return true for resources that are either in the queue or have been fully evaluated.

 mytype { '/tmp/foo': ...}
 notice defined(Mytype['/tmp/foo'])  # true

When this is evaluated, a declaration of a mytype resource is made in the catalog being built. The actual resource '/tmp/foo' is "on its way to be evaluated" and the defined function returns true since it is (about to be) "in the catalog" (only not quite yet).

Read on to learn more, or skip to the examples at the end if you want something concrete, and then come back and read about "Order of Evaluation".

Order of Evaluation

In order for a class to be evaluated, it must be included in the computation via a call to include, or by being instantiated via the resource instantiation expression. (In comparison to a classic Object Oriented programming language include is the same as creating a new instance of the class). If something is not included, then nothing that it in turn defines is visible. Also note that instances of Puppet classes are singletons (a class can only be instantiated once in one catalog).

Originally, the idea was that you could include a given class as many times as you wanted. (Since there can only be one instance per class name, multiple calls to include a class only repeat the desire to include that single instance. There is no harm in this.) Prior to the introduction of parameterized classes, it was easy to ensure that a class was included; a call to include before using the class was all that was required. Parameterized classes were then introduced, along with new expression syntax allowing you to "instantiate a class as a resource". When a class is parameterized, the "signature" of the class is changed by the values given to the parameters, but the class name remains the same. (In other words, ourapp("foo") has a different signature than ourapp(42), even though the class itself is still ourapp.) Parameterization of classes therefore implies that including a class only works when that class does not have multiple signatures. This is because multiple signatures would require multiple singleton instantiations of the same class (a logical impossibility). Unfortunately puppet cannot handle this even if the parameter values are identical - it sees this as an attempt to create a second (illegal) instance of the class.
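The singleton constraint can be sketched in a few lines of plain Ruby (a hypothetical illustration, not Puppet's implementation): declarations are keyed by class name, a bare include of an existing instance is harmless, but any parameterized re-declaration is rejected - even with identical values.

```ruby
# Hypothetical sketch: classes are singletons keyed by name; a bare include
# repeats the desire for the same instance, but a parameterized declaration
# of an already-declared class is always an error.
instances = {}

declare = lambda do |name, params = nil|
  if instances.key?(name)
    raise "Duplicate declaration: Class[#{name}]" unless params.nil?
    instances[name]                 # bare include of existing instance: ok
  else
    instances[name] = params || {}  # first declaration wins
  end
end

declare.call('ourapp', 'version' => 42)    # resource-style declaration: ok
declare.call('ourapp')                     # bare include afterwards: ok
begin
  declare.call('ourapp', 'version' => 42)  # identical params: still an error
rescue RuntimeError => e
  puts e.message                           # Duplicate declaration: Class[ourapp]
end
```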

When something includes a class (or uses the resource instantiation expression to do the same), the class is auto-loaded; this means that puppet maps the name to a file location, parses the content, and expects to find the class with a matching name. When it has found the class, this class is evaluated (its body of code is evaluated).

The result of the evaluation is a catalog - the catalog contains resources and edges and is declarative. The catalog is transported to the agent, which applies the catalog. The order in which resources are applied is determined by their dependencies, as well as their containment, use of the anchor pattern or the contain function, and settings (apply in random order, or in source order, etc.). No evaluation of any puppet logic takes place at this point (at least not in the current version of Puppet) - on the agent the evaluation is done by the providers operating on the resources in the order that is determined by the catalog application logic running on the agent.

The duality of this; a mostly imperative, but sometimes lazy production (as you will learn below) of a catalog and a declarative catalog application is something that confuses many users.

As an analogy: if you are writing a web service in PHP, the PHP logic runs on the web server and produces HTML which is sent to the browser. The browser interprets the HTML (which consists of declarative markup) and decides what to render where and the order in which rendering will take place (images load in the background, some elements must be rendered first because their size is needed to position other elements etc.). Compared to Puppet; the imperative PHP backend corresponds to the master computing a catalog in a mostly imperative fashion, and an agent's application of the declarative catalog corresponds to the web browser's rendering of HTML.

Up to this point, the business of "doing things in a particular order" is actually quite clear; the initial set of puppet logic is loaded, parsed and evaluated, which defines nodes (and possibly other things), then the matching node is evaluated, things it references are then autoloaded, parsed and evaluated, etc. until everything that was included has been evaluated.

What still remains to be explained is the order in which the bodies of classes and user-defined types are evaluated, as well as when relationships (dependencies between resources) and queries are evaluated.

Producing the Catalog

The production of the catalog is handled by what is currently known as the "Puppet Compiler". This is again a misnomer, it is not a compiler in the sense that other computer languages have a compiler that translates the source text to machine code (or some intermediate form like Java Byte Code). It does however compile in the sense that it is assembling something (a catalog) out of many pieces of information (resources). Going forward (Puppet 4x) you will see us referring to Catalog Builder instead of Compiler - who knows, one day we may have an actual compiler (to machine code) that compiles the instructions that builds the catalog. Even if we do not, for anyone that has used a compiler it is not intuitive that the compiler runs the program, which is what the current Puppet Compiler does.

When puppet evaluates the AST, it does this imperatively - $a = $b + $c, will immediately look up the value of $b, then $c, then add them, and then assign that value to $a. The evaluation will use the values assigned to $b and $c at the time the assignment expression is evaluated. There is nothing "lazy" going on here - it is not waiting for $b or $c to get a value that will be produced elsewhere at some later point in time.

Some instructions have side effects - i.e. something that changes the state of something external to the function. This is in contrast to an operation like + which is a pure function - it takes two values, adds them, and produces the result. Once this is done there is no memory of that having taken place, unless the result is used in yet another expression, and so on, until it is finally assigned to some variable (a side effect).

The operations that have an effect on the catalog are evaluated for the sole purpose of their side effect. The include function tells the catalog builder about our desire to have a particular class included in the catalog. A resource expression tells the catalog builder about our desire to have a particular resource applied by the agent, a dependency formed between resources again tells the catalog builder about our desire that one resource should be applied before/after another. While the instructions that cause the side effects are immediate, the side effects are not completely finished, instead they are recorded for later action. This is the case for most operations that involve building a catalog. This is what we mean when we say that evaluation is lazy.

To summarize:

  • An include will evaluate the body of a class (since classes are singletons this happens only once). The fact that we have instantiated the class is recorded in the catalog - a class is a container of resources, and the class instance is fully evaluated and exists as a container, but it does not yet contain the actual resources. In fact, it only contains instructions (i.e. our desire to have a particular resource with particular parameter values applied on the agent).
  • A class included via what looks like a resource expression, i.e. class { name: }, behaves like the include function with respect to evaluation order.
  • A dependency between two (or a chain of) resources is also just an instruction at this point.
  • Queries (i.e. space-ship expressions) are instructions to find and realize resources.

When there are no more expressions to immediately evaluate, the catalog builder starts processing the queued up instructions to evaluate resources. Since a resource may be of user-defined type, and it in turn may include other classes, the processing of resources is interrupted while any included classes are evaluated (this typically adds additional resource instructions to the queue). This continues until all instructions about what to place in the catalog have been evaluated (and nothing new was added). Now, the queue is empty.

The lazy evaluation of the catalog building instructions is done in the order the instructions were added to the catalog, with the exception of the application of default values, queries, and relations, which are delayed until the very end. (Exactly how these work is beyond the topic of this already long blog post.)

How many different Orders are there?

The different orders are:

  • Parse Order - a more or less insignificant term meaning the order in which text is translated into something the puppet runtime can act on. (If you have a problem with ordering, you are almost certainly not having a problem with parse order).
  • Evaluation Order - the order in which puppet logic is evaluated with the purpose of producing a catalog. Pure evaluation order issues are usually related to the order in which arguments are evaluated, or the order in which case options are evaluated - these are usually not difficult to figure out.
  • Catalog Build Order - the order in which the catalog builder evaluates definitions. (If you are having problems with ordering, this is where things appear to be mysterious.)
  • Application Order - the order in which the resources are applied on an agent (host). (If you are having ordering problems here, they are more apparent: "resource x" must come before "resource y", or something (like a file) that "resource y" needs will be missing. Solutions here are to use dependencies, the anchor pattern, or the contain function.)

Please Make Puppet less Random!

This is a request that pops up from time to time. Usually because someone has blown a fuse over a Catalog Build Order problem. As you have learned, the order is far from random. It is however, still quite complex to figure out the order, especially in a large system.

Is there something we can do about this?

The mechanisms in the language have been around for quite some time, and they are not an easy thing to change due to the number of systems that rely on the current behavior. However, there are many ways around the pitfalls that work well for people creating complex configurations - i.e. there are "best practices". There are also some things that are impossible or difficult to achieve.

Many suggestions have been made about how the language should change to be both more powerful and easier to understand, and several options are being considered to help with the mysterious Catalog Build Order and the constraints it imposes. These options include:

  • Being able to include a resource multiple times if the declarations are identical (or if they augment each other).
  • If using a resource expression to instantiate a class, consider a previous include of that class to be identical (since the include did not specify any parameters it can be considered as a desire of lower precedence). (The reverse interpretation is currently allowed).

Another common request is to support decoupling between resources, sometimes referred to as "co-op", where there is a desire to include things "if they are present" (as opposed to someone explicitly including them). The current set of functions and language mechanisms makes this hard to achieve (due to Catalog Build Order being complex to reason about).

Here the best bet is the ENC (for older versions), or the Node Classifier for newer Puppet versions. Related to this is the topic of "data in modules", which in part deals with the overall composition of the system. The features around "data in modules" have not been settled, although there are experimental things to play with - none of the existing proposals is a clear winner at present.

I guess this was a long way of saying - we will get to it in time. What we have to do first (and what we are working on) is the semantics of evaluation and catalog building. At this point, the new evaluator (that evaluates the AST) is available when using the --parser future flag in the soon-to-be-released 3.5.1. We have just started up the work on the new Catalog Builder where we will more clearly (with the goal of being both strict and deterministic) define the semantics of the catalog and the process that constructs it. We currently do not have "inversion of control" as a feature under consideration (i.e. by adding a module to the module path you also make its starting point included), but are well aware that this feature is much wanted (in conjunction with being able to compose data).

What better way to end than with a couple of examples...

Getting Your Ducks in a Row

Here is an example of a manifest containing a number of ducks. In which order will they appear?

define duck($name) {
  notice "duck $name"
  include c
}

class c {
  notice 'in c'
  duck { 'duck0': name => 'mc scrooge' }
}

class a {
  notice 'in a'
  duck {'duck1': name => 'donald' }
  include b
  duck {'duck2': name => 'daisy' }
}

class b {
  notice 'in b'
  duck {'duck3': name => 'huey' }
  duck {'duck4': name => 'dewey' }
  duck {'duck5': name => 'louie' }
}

include a

This is the output:

Notice: Scope(Class[A]): in a
Notice: Scope(Class[B]): in b
Notice: Scope(Duck[duck1]): duck donald
Notice: Scope(Class[C]): in c
Notice: Scope(Duck[duck3]): duck huey
Notice: Scope(Duck[duck4]): duck dewey
Notice: Scope(Duck[duck5]): duck louie
Notice: Scope(Duck[duck2]): duck daisy
Notice: Scope(Duck[duck0]): duck mc scrooge

(This manifest is found in this gist if you want to get it and play with it yourself).

Here is a walk through:

  • class a is included and its body starts to evaluate
  • it placed duck1 - donald in the catalog builder's queue
  • it includes class b and starts evaluating its body (before it evaluates duck2 - daisy)
  • class b places ducks 3-5 (the nephews) in the catalog builder's queue
  • class a evaluation continues, and duck2 - daisy is now placed in the queue
  • the immediate evaluation is now done, and the catalog builder starts executing the queue
  • duck1 - donald is first; when it is evaluated the name is logged, and class c is included
  • class c queues duck0 - mc scrooge
  • the catalog builder now processes the remaining queued ducks in order 3, 4, 5, 2, 0

The order in which resources are processed may seem to be random, but now you know the actual rules.
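The walk-through can be reproduced with a small simulation in plain Ruby (a hypothetical sketch of the catalog builder, not real Puppet code): class bodies evaluate immediately and at most once, while resource bodies are queued and evaluated later, FIFO, possibly enqueueing more work as they go.

```ruby
# Hypothetical simulation of lazy catalog building for the duck example.
log      = []
queue    = []   # queued resource instructions, processed FIFO
included = {}   # classes are singletons: evaluate each body at most once
classes  = {}

include_class = lambda do |name|
  next if included[name]
  included[name] = true
  classes[name].call          # class bodies evaluate immediately
end

# a duck resource: evaluating it logs the name, then includes class c
make_duck = lambda do |duck_name|
  lambda do
    log << "duck #{duck_name}"
    include_class.call(:c)
  end
end

classes[:c] = lambda do
  log << 'in c'
  queue << make_duck.call('mc scrooge')   # duck0
end
classes[:a] = lambda do
  log << 'in a'
  queue << make_duck.call('donald')       # duck1
  include_class.call(:b)
  queue << make_duck.call('daisy')        # duck2
end
classes[:b] = lambda do
  log << 'in b'
  queue << make_duck.call('huey')         # duck3
  queue << make_duck.call('dewey')        # duck4
  queue << make_duck.call('louie')        # duck5
end

include_class.call(:a)                    # immediate evaluation phase
queue.shift.call until queue.empty?       # lazy evaluation of the queue

puts log
```

Running this prints the same order as the notices in the manifest above: in a, in b, donald, in c, then ducks 3, 4, 5, 2, 0.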


In this (very long) post, I tried to explain "how puppet master really works", and while the order in which puppet takes action may seem mysterious or random at first, it is actually both defined and deterministic - albeit quite unintuitive when reading the puppet logic at "face value".

Big thanks to Brian LaMetterey, and Charlie Sharpsteen who helped me proof read, edit, and put this post together. Any remaining mistakes are all mine...