Sunday, February 1, 2015

Puppet 4.0 Data in Modules part II - Writing a Data Provider

In Puppet 4.0.0 there is a new technology agnostic mechanism that makes it possible to provide default values for class parameters in modules and in environments. In the first post about this feature I show how it is used. In this second post I will show how to write and deliver an implementation of a data provider.

The information in this post is only relevant if you are planning to extend puppet with additional types of data providers - you do not need to learn all that is presented here to use the services the new data provider feature provides.

How does it work?

The new data provider feature is built using the Puppet Binder which wires (i.e. binds) the various parts together in a composable way. You do not really need to know all the features of the Puppet Binder to be able to use it as the bindings needed for the data providers are mostly boilerplate and you just have to copy/paste and replace the example names with the names of things in your implementation.

The feature has two different kinds of data providers; one for environments, and one for modules. The steps to implement them are almost the same, so I am going to show both at the same time.

What you need to do:

  • Implement the data provider(s). They have a very simple API - basically just a method named lookup.
  • Register the bindings that makes your data provider implementations available for use.

Implementing the Data Providers

Data providers are implemented as Ruby classes. The two classes (one for environments, and one for modules) have a very simple API - basically they must inherit from the correct base class (as shown in the examples below), and they must implement the method lookup(name, scope, merge).

In this example, I am creating data providers that users will know by the name 'sample'. There will be a provider called 'sample' that can be used for the environment, and one that can be used for modules. This will be made available in a module that I am going to name 'sampledata'.

For use in environments

# <modulepath>sampledata/lib/puppet_x/author/sample_env_data.rb
require 'puppet_x'
module PuppetX::Author
  class SampleEnvData < Puppet::Plugins::DataProviders::EnvironmentDataProvider
    def lookup(name, scope, merge)
      # return the value bound to the name/key

For use in modules

# <modulepath>sampledata/lib/puppet_x/author/sample_module_data.rb
require 'puppet_x'
module PuppetX::Author
  class SampleModuleData < Puppet::Plugins::DataProviders::ModuleDataProvider
    def lookup(name, scope, merge) 
      # return the value bound to the name/key


  • The data provider API guarantees that calls to lookup only occurs for an environment that has
    opted in by setting the environment_data_provider to the key 'sample', and for a module that has opted in with a binding of 'sample' to the 'module_data' (just like we used 'function' in the earlier examples in the previous post on this topic).

  • The PuppetX namespace is available for 3d party Ruby code. When using it, it should be followed by the name of the author (as defined by the Puppet Forge for modules) - i.e. in your code replace Author with your name.

  • When the implementations are loaded by the runtime, the data provider base classes have already been loaded, so there is no need to require 'puppet'.

  • The merge parameter is a string of type Enum[unique, hash, merge] or a hash with the key 'strategy' set to that string with additional keys that control the merge in detail (see the documentation of the lookup function). (In the sample implementation this parameter is ignored since it can only supply one value per key).

There are more things to say how to implement the lookup to make it efficient. More about that later after I have showed how to wire the implementation into puppet.

Registering the Data Provider Implementations

The first thing is to register the bindings that makes it possible for other modules (or an environment) to declare that our new implementation should be used.

The Puppet Binder loads bindings from modules. By default the file <moduleroot>/lib/puppet/bindings/<modulename>/default.rb is loaded (if it exists). In this file, we need to create the bindings we want.

Since we have an implementation for both environment, and modules, the registration looks like this:

# <modulepath>sampledata/lib/puppet/bindings/sampledata/default.rb
Puppet::Bindings.newbindings('sampledata::default') do
  bind {
    name         'sample'                             # the name
    in_multibind 'puppet::environment_data_providers' # boilerplate (for env)
    to_instance  'PuppetX::Author::SampleEnvData'     # the classname as a string
  bind {
    name          'sample'                            # the name
    in_multibind  'puppet::module_data_providers'     # boilerplate (for module)
    to_instance   'PuppetX::Author::SampleModuleData' # the classname as a string

As before, replace Author with your name, and replace 'sample' with the name you want to give your bindings provider. The to_instance references should be the fully qualified class names of the implementations of the data providers.

The two bindings, registers the respective implementation class with a symbolic name, which allows users to use this name instead of the more complicated class name of the data provider class we have implemented.

As there can be many implementations available and active at the same time, the Puppet Binder's multibind capability is used to bind the implementation for a given "extension point" (e.g 'puppet::environment_data_providers').


  • The name you give your implementation must be unique among all implementations of the same type so you should really prefix the name with the module name to be safe.

Using the Implementations

As shown in the previous post, using a data provider implementation is simple. The examples in this post adds a provider named 'sample'; so simply change the use of 'function' in the previous post's examples to switch to the providers we just implemented.

The lifecycle of Data

The implementation of lookup probably needs to cache information (e.g. if we were writing an implementation for hiera it could be reading and caching the hiera.yaml file, and various data files).

Caching is somewhat complicated since we need to associate the cached data with something that has the same lifecycle as the data - we do not want to hold on to information that is stale and just occupies memory until Puppet's master process is restarted.

There are two things that it makes sense to associate a cache with:

  • the environment, if the data is static for the entire life of the environment. An environment goes out of scope when it times out (a configurable amount of time).
  • the compiler, if the data is static for the compilation (but varies from request to request for different nodes in the same environment instance). The compiler goes out of scope and the end of each catalog compilation.

It is not suitable to associate the cache with the data provider instance itself (e.g. in a class or instance variable in SampleModuleData).

The absolute best way of doing this is to use an Adapter. There is no reusable implementation of a caching adapter and the implementor of a data provider should design one for the specific purpose of handling its caching needs. This can be as simple as in this example:

class PuppetX::Author::MyCacheAdapter < Puppet::Pops::Adaptable::Adapter
  attr_accessor :cache

The provider implementation then associates the adapter with either the environment, or the compiler. the implementation can naturally have as many instance variables as it needs (the one in the example just has a cache variable), and additional methods. (If you want to look at a real implementation, the 'function' data provider built into Puppet 4.0 has a class called Puppet::DataBindings::DataAdapter that serves as a cache as well as performing the calls to the data functions).

The approach of using adapters is much preferred over monkey patching existing code. For more information about adapters - see my blog post on the topic).

It is simple to use the adapter - here are examples for associating one with the environment, and the compiler.

adapter = MyCacheAdapter.adapt(Puppet.lookup(:current_environment))
cached = adapter.cache()

adapter = MyCacheAdapter.adapt(scope.compiler)
cached = adapter.cache()

I am stopping there, since what you need to cache and how will be specific to what you are implementing support for.

General notes about caching data content

Do not implement file watching. Directory environments use a stable state for the given timeout and everything is evicted when the environment times out. Since there can be a very large number of directory environments (users have reported using several hundred, e.g. for a master running various development branches), and directory environments may also be quite volatile. If you are not using the adapter approach to caching, you must ensure that your caching does not leak memory by binding stale data for environments that potentially never will be used again during the running process' life cycle.

Experiment with the Sample in Puppet's code base

There are two test data fixtures in Puppet's code base (used when running spec test) that you can also run from the command line. You can naturally make a copy of them for your own experiments (if you do not want to type in the examples in this blog post from scratch).

The 'function' example

The first tests the function data provider, and can be invoked like this (all on one line):

bundle exec puppet apply
--environment=production -e 'include abc'

The fixture has a parameterized classes. One that is not in a module, and one in a module. The module class gets two of its three parameters overridden by environment data.

You should see this printout

Notice: env_test1
Notice: /Stage[main]/Abc::Def/Notify[env_test1]/message: defined 'message' as 'env_test1'
Notice: env_test2
Notice: /Stage[main]/Abc::Def/Notify[env_test2]/message: defined 'message' as 'env_test2'
Notice: module_test3
Notice: /Stage[main]/Abc::Def/Notify[module_test3]/message: defined 'message' as 'module_test3'

The 'sample provider' example

The second example can be run like this (all on one line):

bundle exec puppet apply

This fixture uses parameterized classes and use an implementation of the sample providers shown in this blog post but with lookup functions that return hard coded values for the classes in the fixture.

You should see this printout:

Notice: env data param_a is 10, env data param_b is 20, 3
Notice: /Stage[main]/Test/Notify[env data param_a is 10, env data param_b is 20, 3]/message: defined 'message' as 'env data param_a is 10, env data param_b is 20, 3'
Notice: module data param_a is 100, module data param_b is 200, env data param_c is 300
Notice: /Stage[main]/Dataprovider::Test/Notify[module data param_a is 100, module data param_b is 200, env data param_c is 300]/message: defined 'message' as 'module data param_a is 100, module data param_b is 200, env data param_c is 300'