Friday, August 21, 2015

Upgrading to Puppetlabs-Apt 2.0

The Puppetlabs Apt module went through a major change earlier this year. It crossed a semver boundary and released as 2.0. This is one of the only cases we've had as a community where a core module has moved over a major version. The initial reaction to Apt 2.0 was everyone quickly pinning their modules to use < 2.0. Morgan, Daenney and the puppetlabs modules team quickly pushed out a 2.1.0 release which is backwards compatible with some core functionality inside the Apt module. It is important to note that not everything is backwards compatible, only a few things.

At OpenStack-Infra, we wanted to use the latest version of bfraser's grafana module, but it requires apt >= 2.0. Paul spun up a change to our main repository and then several more changes to move to the new syntax. Here is an example.

Why does this work? Because apt::key was added back in 2.1.0 to be compatible with older apt versions. See the warning that it will generate here. Because of this, you can upgrade apt in place safely, provided you are not using the gnarlier parts of the old Apt module. Notably, the unattended-upgrades subsection has been moved out into its own module.
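To make the compatibility point concrete, here is a minimal sketch of the same key declared both ways. It assumes the pre-2.0 parameter names (key, key_server) and the 2.x names (id, server) as I understand them from the module's documentation. The resource titles, the fingerprint, and the keyserver hostname are placeholders, not values from our repository. The two resources are alternatives: a real manifest would keep only one.

```puppet
# Pre-2.0 style. As I understand the 2.1.0 release, this is accepted
# again but produces a deprecation warning on each run.
apt::key { 'example-repo-old-style':
  key        => '0123456789ABCDEF0123456789ABCDEF01234567', # placeholder fingerprint
  key_server => 'keyserver.example.org',                    # placeholder host
}

# 2.x style: the same information under the new parameter names.
apt::key { 'example-repo':
  id     => '0123456789ABCDEF0123456789ABCDEF01234567', # placeholder fingerprint
  server => 'keyserver.example.org',                    # placeholder host
}
```

Moving a manifest from the first form to the second removes the warning and leaves you ready for the day the compatibility parameters go away.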

I encourage those of you running an infrastructure to follow our lead and upgrade your Apt module. I encourage those of you maintaining and releasing modules to bump your minimum version of Apt to >= 2.1. I believe there is a requirement for some velocity in this. If we wait too long, too many new users of Puppet will be caught across a schism of the apt module. That is, unless everyone just runs RedHat anyways.

Monday, August 10, 2015

Just What Is OpenStack Infra?

I work for HP doing two things. By day I work inside the HP firewall setting up and running a CI system for testing HP's OpenStack technology. We call this system Gozer. (By the way we are hiring). By night I work upstream (in the Open Source world) with the OpenStack Infrastructure Team setting up and running a CI system for OpenStack developers.

This blog post concerns my work upstream.

One of my chief initiatives since joining the team two years ago is to make the Puppet codebase used by infra more in line with standards, more reusable, and generally better. I have never attempted to use infra as a testbed for experimental uses of Puppet; I've always tried to apply the best practices known in the community. Of course there are exceptions to this (see all the Ansible stuff). This initiative is codified in a few different specifications accepted by the team (you don't need to read these).

One mark of the success of this ongoing initiative is that I am now in a place where I am recommending parts of our code to other people in my community. Those are the people for whom I intend this blog post. Someone sees a neat part of the Puppet OpenStack 'stuff' and wants to use it, but it needs a patch or a use case covered. This blog post is supposed to provide a high level overview of what we do, who 'we' are, and the bigger pieces and how they interact with each other. We'll start with a long series of names and definitions.

Naming things is hard

So what is OpenStack? OpenStack is an Open Source software collection based around providing cloud software. The OpenStack Foundation is a nonprofit organization that provides centralized resources to support the effort. This support comes in both technical (sysadmins) and other forms (legal, conference organizing, etc). OpenStack is made up of many components. The simplest example is 'nova', which provides the compute layer of the cloud, i.e. kvm or xen management.

OpenStack can be installed with Puppet. The Puppet code that does this is called "OpenStack Puppet Modules." These modules install OpenStack services such as nova, glance, and cinder. Their source code is available by searching for openstack/puppet-*. The team that develops this code is called the OpenStack Puppet Module Team. This team uploads to the forge under the namespaces 'openstack' or 'stackforge.'

I do not work with these modules on a daily basis.

I work with the OpenStack Infrastructure Team. This team deploys and maintains the CI system used by OpenStack upstream developers. We have our own set of Puppet modules that are completely unrelated to the OpenStack Puppet Modules. Their source code can be found by searching for openstack-infra/puppet-*. These modules are uploaded under the forge namespaces 'openstackci' and 'openstackinfra.' We use these modules to deploy services like Gerrit, Jenkins, and Drupal. We also have a number of utility modules useful for generic Linux administration. We have Precise, Trusty, CentOS 6, and various Fedora flavors in our infrastructure, so our modules often have good cross-platform support.

Central Nexus

All the openstack-infra/puppet-* modules are consumed from master by our 'central nexus' repository: system-config. System-config uses a second repository for flat files: project-config. System-config contains node definitions, public hiera data (soon), a few utility scripts, a modules.env file, and a single module called 'openstack_project' that holds our 'roles'. The more 'core' roles in openstack_project call out to another repo: puppet-openstackci. The secrets are stored in a hiera directory that is not public.

Crude Drawing

The crude drawing above shows a typical flow. A node definition lives in site.pp, which includes a role class from openstack_project, which includes a role class from the openstackci module, which then uses resources and classes from the other modules, in this case puppet-iptables.
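In Puppet terms, that layering can be sketched roughly as follows. The node name and the 'example_role' class names are hypothetical and exist only to show the shape of the chain. The classes are shown together for brevity, though in practice each would live in its own module. The only real interface used is the iptables class and its public_tcp_ports parameter.

```puppet
# site.pp: the node definition does nothing except pick a role.
node 'status.example.org' {
  include ::openstack_project::example_role
}

# openstack_project module: a thin, site-specific wrapper
# (hypothetical class name).
class openstack_project::example_role {
  include ::openstackci::example_role
}

# openstackci module: the reusable role that does the real work
# (hypothetical class name).
class openstackci::example_role {
  # The service-specific and utility modules are used here.
  class { '::iptables':
    public_tcp_ports => ['80', '443'],
  }
}
```

The point of the split is that everything site-specific stays in the two upper layers, so the bottom two layers can be reused by someone else.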

There are other code paths too. Sometimes, often in fact, an openstack_project role will include openstack_project::server or openstack_project::template. These classes wrap up most of the 'basics' of Linux administration. Template or server will go on to include more resources.

There are multiple places to integrate here. At the most basic, a Puppet user could include our puppet-iptables module in their modulepath and start using it. An individual who wants a jenkins server or another server like ours could use openstackci and its dependencies and write their own openstack_project wrapper classes to include openstackci classes.

We do not encourage site.pp or openstack_project classes to be extended at this time. Instead, we encourage features or compatibility extensions to be put into openstackci or the service-specific modules themselves. This is a work in progress and some important logic still lives in openstack_project and should be moved out. A stretch goal is to move to a place where all of openstack infra runs out of openstackci, providing only a hiera yaml file to set parameters.

Continuous Deployment

A note about modules.env: OpenStack-infra has a modules.env file instead of a Puppetfile. This file contains the location, name, and ref of git repositories to put inside the modulepath on the Puppetmaster. OpenStack infra deploys all of its own Puppet modules from master, so any change to any module can break the whole system. We counteract this danger by having lots of testing and code review before any change goes through.

A note about project-config: One of the patterns we use in OpenStack Infra is to push our configuration into flat files as much as possible. We have one repository, project-config, which holds files that control the behaviour of our services. Puppet's job is only to copy files out of the repo and into the correct location. This makes it easier for people to find these often-changed files, and means we can give more people access to merge code there than we would with our system-config repository.
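As a rough illustration of the "Puppet only copies files" pattern, here is a minimal sketch. Both paths are hypothetical: it assumes a checkout of project-config already exists on the host at /etc/project-config, and that some service reads its configuration from /etc/example-service (a directory managed elsewhere).

```puppet
# Hypothetical location of a project-config checkout on this host.
$project_config_dir = '/etc/project-config'

# Puppet carries no configuration logic of its own here. It only
# copies the flat file from the checkout into place.
file { '/etc/example-service/layout.yaml':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => "${project_config_dir}/example-service/layout.yaml",
}
```

Because the manifest never changes when the file's contents do, day-to-day configuration changes only need review in project-config.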

A note about puppet agent: We run puppet-agent, but it is fired from the Puppetmaster by an ansible run. We hope to move to puppet apply triggered by ansible soon.

The part where I give you things

There are two modules right now that you might be interested in using yourself. The first is our puppet-httpd module. This module was forked from puppetlabs-apache at version 0.0.4. It has seen some minor improvements from us but nothing major, other than a name change from 'apache' to 'httpd'. You can see why we forked in the Readme of the project but the kicker is that this module allows you to use raw 'myhost.vhost.erb' templates with apache. You no longer need to know how to translate the apache syntax you want into puppetlabs-apache parameters. Let's see what this looks like:

# ************************************
# Managed by Puppet
# ************************************

NameVirtualHost <%= @vhost_name %>:<%= @port %>
<VirtualHost <%= @vhost_name %>:<%= @port %>>
  ServerName <%= @srvname %>
<% if @serveraliases.is_a? Array -%>
<% @serveraliases.each do |name| -%><%= " ServerAlias #{name}\n" %><% end -%>
<% elsif @serveraliases != '' -%>
<%= " ServerAlias #{@serveraliases}" %>
<% end -%>
  DocumentRoot <%= @docroot %>

  Alias /bugday /srv/static/bugdaystats
  <Directory /srv/static/bugdaystats>
      AllowOverride None
      Order allow,deny
      allow from all
  </Directory>

  Alias /reviews /srv/static/reviewday
  <Directory /srv/static/reviewday>
      AllowOverride None
      Order allow,deny
      allow from all
  </Directory>

  Alias /release /srv/static/release

  <Directory <%= @docroot %>>
    Options <%= @options %>
    AllowOverride None
    Order allow,deny
    allow from all
  </Directory>

  # Sample elastic-recheck config file, adjust prefixes
  # per your local configuration. Because these are nested
  # we need the more specific one first.
  Alias /elastic-recheck/data /var/lib/elastic-recheck
  <Directory /var/lib/elastic-recheck>
      AllowOverride None
      Order allow,deny
      allow from all
  </Directory>

  RedirectMatch permanent ^/rechecks(.*) /elastic-recheck
  Alias /elastic-recheck /usr/local/share/elastic-recheck
  <Directory /usr/local/share/elastic-recheck>
      AllowOverride None
      Order allow,deny
      allow from all
  </Directory>

  ErrorLog /var/log/apache2/<%= @name %>_error.log
  LogLevel warn
  CustomLog /var/log/apache2/<%= @name %>_access.log combined
  ServerSignature Off
</VirtualHost>

::httpd::vhost { '':
  port     => 80,
  priority => '50',
  docroot  => '/srv/static/status',
  template => 'openstack_project/status.vhost.erb',
  require  => File['/srv/static/status'],
}


If you don't need a custom vhost template and just want to serve a directory, you can:

::httpd::vhost { '':
  port     => 80,
  priority => '50',
  docroot  => '/srv/static/tarballs',
  require  => File['/srv/static/tarballs'],
}


The second is puppet-iptables, which lets you pass raw iptables rules to a Puppet class and have those rules set. You can also specify the ports to open up. Again this is an example of weak modeling. Concat resources around specific rules are coming soon in this change. Let's see what using the iptables module looks like:

class { '::iptables':
  public_tcp_ports => ['80', '443', '8080'],
  public_udp_ports => ['2003'],
  rules4           => ['-m state --state NEW -m tcp -p tcp --dport 8888 -s -j ACCEPT'],
  rules6           => ['-m state --state NEW -m tcp -p tcp --dport 8888 -s -j ACCEPT'],
}

This enables you to manage iptables the way you view iptables. It is easy to debug, easy to reason about, and extensible. We think it provides a significant advantage over the puppetlabs-firewall module. Unfortunately, the puppet-iptables module is currently hardcoded to open up certain openstack hosts. That should be fixed very soon (possibly by you!). Both of these modules try to be as simple as possible.

Getting these modules right now is done through git. If you don't want to ride the 'master' train with us, you can hop in #openstack-infra on freenode and ask for a tag to be created at the revision you need. We're working on getting forge publishing into the pipeline. It's not a priority for us right now, but if you need it you can ask for it and we can see about increasing focus there.

There are two generic modules that advance the puppet ecosystem coming out of OpenStack Infra and we hope there will be more to come. If you'd like to help us develop these modules we'd love the help. You can start learning how to contribute to OpenStack here.

Saturday, August 1, 2015

Inspecting Puppet Module Metadata

Last week at #puppethack, @hunner helped me land a patch to stdlib to add a load_module_metadata function. This function came out of several Puppet module triage sessions and a patch from @raphink inspired by a conversation with @hirojin.

The load_module_metadata function is available in master of puppetlabs-stdlib. Hopefully it will be wrapped up into one of the later 4.x releases, and it will almost certainly make it into 5.x.

On its own this function doesn't do much, but it is composable. Let's see some basic usage:

$: cat metadata.pp

$metadata = load_module_metadata('stdlib')

notify { $metadata['name']: }

$: puppet apply --modulepath=modules metadata.pp
Notice: Compiled catalog for hertz in environment production in 0.03 seconds
Notice: puppetlabs-stdlib
Notice: /Stage[main]/Main/Notify[puppetlabs-stdlib]/message: defined 'message' as 'puppetlabs-stdlib'
Notice: Finished catalog run in 0.03 seconds

As you can see, this isn't the most amazing thing ever. However, access to that information is very useful in the following case:

$apache_metadata = load_module_metadata('apache')

case $apache_metadata['name'] {
  'puppetlabs-apache': {
    # invoke apache as required by puppetlabs-apache
  }
  'example42-apache': {
    # invoke apache as required by example42-apache
  }
  default: {
    fail("Apache module author not recognized, please add it here")
  }
}

This is an example of Puppet code that can inspect the libraries loaded in the modulepath, then make intelligent decisions about how to use them. This means that module authors can support multiple versions of 'library' modules and not force their users into one or the other.

This is a real problem in Puppet right now. For every 'core' module there are multiple implementations, with the same name. Apache, nginx, mysql, archive, wget, the list goes on. Part of this is a failure of the community to band behind a single module, but we can't waste time finger pointing now. The cat is out of the bag and we have to deal with it.

We've had metadata.json and dependencies for a while now. However, due to the imperfections of the puppet module tool, most advanced users do not depend on dependency resolution from metadata.json. At my work we simply clone every module we need from git, and users of r10k do much the same.

load_module_metadata enables modules to enforce that their dependencies are being met. Simply put a stanza like this in params.pp:

$unattended_upgrades_metadata = load_module_metadata('unattended_upgrades') 
$healthcheck_metadata = load_module_metadata('healthcheck')

if versioncmp($healthcheck_metadata['version'], '0.0.1') < 0 {
  fail("Puppet-healthcheck is too old to work")
}
if versioncmp($unattended_upgrades_metadata['version'], '2.0.0') < 0 {
  fail("Puppet-unattended_upgrades is too old to work")
}

As we already saw, modules can express dependencies on specific implementations and versions. They can also inspect the version available and use that. This is extremely useful when building a module that depends on another module, and that module is crossing a semver major version boundary. In the past, in the OpenStack modules, we passed a parameter called 'mysql_module_version' to each class, which allowed that class to use the correct invocation of the mysql module. Now classes anywhere in your puppet code base can inspect the mysql module directly and determine which invocation syntax to use.

$mysql_metadata = load_module_metadata('mysql')

if versioncmp($mysql_metadata['version'], '3.0.0') < 0 {
  # Use mysql 2.0 syntax
} else {
  # Use mysql 3.0 syntax
}

Modules can even open up their own metadata.json, and while it is clunky, it is possible to dynamically assert that dependencies are available and in the correct versions.
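A minimal sketch of that idea follows, with several assumptions. 'mymodule' is a hypothetical name standing in for the module the code lives in. The each iteration needs Puppet 4 or the future parser. It also assumes load_module_metadata fails the compile when the requested module is missing from the modulepath, which is my understanding of its behaviour. The sketch only checks that each dependency is present and reports its version. Checking that version against a range such as '>= 4.0.0 < 5.0.0' is the clunky part and is left out.

```puppet
# 'mymodule' is a hypothetical name: use the module this code lives in.
$own_metadata = load_module_metadata('mymodule')

$own_metadata['dependencies'].each |$dependency| {
  # Dependency names look like 'puppetlabs/stdlib' or 'puppetlabs-stdlib'.
  # Strip the author prefix to get the directory name in the modulepath.
  $short_name = regsubst($dependency['name'], '^.*[/-]', '')

  # Assumed to abort the compile if the module is not installed,
  # which is the availability check.
  $dependency_metadata = load_module_metadata($short_name)

  notice("${short_name} ${dependency_metadata['version']} found, wanted ${dependency['version_requirement']}")
}
```

For a dependency with a simple lower bound, the notice could be swapped for a versioncmp check and a fail.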

I'm excited to see what other tricks people can do with this. I'm anticipating it will make collaboration easier, upgrades easier, and Puppet runs even safer. If you come up with a neat trick, please share it with the community and ping me on twitter (@nibalizer) or IRC: nibalizer.