Technoblogic(.io)

Shameful Self Promotion

Packer Templates & VMWare

tl;dr

There isn’t a lot of documentation on Packer, and the logging can be tricky to find depending on the VM you’re booting (which provider backs it). If you’re using VMware and booting CentOS, the guest_os_type key needs to be set appropriately.

Some people seem to think this can be $anything that sounds reasonable (I did!), and there is no documentation on what should actually go there (as of this writing, Google was sparse). So if you have any doubt about what the guest_os_type value should be, the best bet is to $ diff it against an already existing *.your_vm_type file (‘vmx’, ‘ovf’, et cetera).

cat centos65-puppetmaster.json | grep -i guest_os_type
    "guest_os_type": "centos-65",

However, querying an already built *.vmx shows that this is incorrect:

cat master2.vmx | grep -i guestos
   guestos = "redhat"

For CentOS at least, your guest_os_type value should be set to redhat.

Back Story

This week I was working with a customer to automate the deployment of some VMs to vSphere. This deployment is replacing some current scripts and manually configured templates. Actually, a lot of scripts and manually configured templates.

The long and the short of it: my team and I decided to implement Vagrant and Packer, pushing pre-written Packer templates out to vSphere via Vagrant’s vSphere plugin using the VMware provider.

After writing the JSON for this node I ran packer build centos65.json:

==> vmware-iso: Downloading or copying ISO
==> vmware-iso: Downloading or copying ISO
vmware-iso: Downloading or copying: http://mirrors.kernel.org/centos/6.5/isos/x86_64/CentOS-6.5
==> vmware-iso: Creating virtual machine disk
==> vmware-iso: Building and writing VMX file
==> vmware-iso: Starting HTTP server on port 8582
==> vmware-iso: Starting virtual machine...
==> vmware-iso: Error starting VM: VMware error:
==> vmware-iso: Deleting output directory...
Build 'vmware-iso' errored: Error starting VM: VMware error:
==> Some builds didn't complete successfully and had errors:
--> vmware-iso: Error starting VM: VMware error:

==> Builds finished but no artifacts were created.

The VM booted and opened fine in Fusion, but failed under vmrun. How did I know that was the issue?

grep -r vmrun /var/log/*
system.log:May 29 05:30:32 bohr vmrun[13249]: com.vmware.fusion.78704: Invalid argument
... # lots of other crap 

BUT WHAT ARGUMENT???

After digging around I found that logging for this issue was sketchy at best (syslog, /var/tmp/vmware). The upshot: the guest_os_type key needs to have an appropriate value (i.e., the invalid argument from syslog):

cat centos65-puppetmaster.json | grep -i guest_os_type
    "guest_os_type": "centos-65",

However, querying an already built *.vmx shows that this is incorrect:

cat master2.vmx | grep -i guestos
   guestos = "redhat"

After changing the key/value in the JSON template and re-running packer build, the vmrun command ran successfully.
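For reference, here’s a minimal sketch of where that key lives in the builder stanza (trimmed way down; your iso_url, ssh and shutdown settings stay whatever they already were):

{
  "builders": [
    {
      "type": "vmware-iso",
      "guest_os_type": "redhat"
    }
  ]
}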

R10k: Control Repos

This document explains the architecture and deployment of a control repository for syncing multiple Puppet environments, their associated Puppet code (modules) and their associated data (Hiera). This document does not review how to deploy r10k; that is discussed in relevant detail here.

What is a control repo?

A control repository stores a Puppetfile and hiera data (hiera.yaml and hieradata/).

What does it contain?

hieradata/
Puppetfile

It is also common to have a hiera.yaml in the control repo. However, for several reasons I don’t believe this is a good idea. It can lead to confusing issues with branching a file that is only ever consulted once: hiera.yaml is loaded from $confdir during the hiera() call and is not consulted on a per-environment basis (e.g. $confdir/environments/$environment/hiera.yaml is not used during the hiera() lookup, only $confdir/hiera.yaml is).

How does it work?

The control repo is placed in a monolithic git repository.

The repository can have one or more topic branches that are used by r10k to sync to local Puppet environments.

Configuration for Puppet code & Hiera data sync via r10k

Details on how to deploy a GitLab repo and associated topic branching for r10k sync are here.

Place a Puppetfile in $confdir/Puppetfile.

Populate your Puppetfile with whatever crap you need.

In $confdir:

git init
git remote add origin git@whatever.com:your_name/control_repo.git
git branch -m master production
git add Puppetfile
git add hiera.yaml # r10k really doesn't need this but we'll add it anyways. 
git add hieradata/
git push -u origin production:production # or whatever branch, maybe production?

Configure hiera.yaml for dynamic environments

NOTE: hiera.yaml is only loaded from $confdir/hiera.yaml on each Puppet run. Even though you’ll have a hiera.yaml in $confdir/environments/$environment/, those copies are not actually consulted; only the $confdir hiera.yaml is used, so you cannot have a different hierarchy per environment. The $datadir, however, is environment aware: that namespace is filled in at run time and points at the specific environment’s datadir, $confdir/environments/$environment/hieradata/.

Your hiera.yaml needs to have a datadir configured for dynamic lookup, like so:

---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global

:yaml:
  :datadir: '/etc/puppetlabs/puppet/environments/%{::environment}/hieradata'

Branching your Puppetfile

For example, assuming you already have a master or production branch:

vi Puppetfile

…add some git modules etc…

git checkout -b development
git commit -am Puppetfile
git push origin development:development
r10k deploy environment -pv

Now you have a new topic branch ‘development’ and a new Puppet environment in $confdir/environments/development.

Branching your hiera data

Our development branch needs its own data too:

cd $confdir/hieradata
  • modify K,V’s in whatever.yaml
  • modify other K,V’s as needed for your development environment
git branch # check your branch, make sure it's still development
git commit -am hieradata
git push 
r10k deploy environment -pv

Check $confdir/environments/development/hieradata

Testing

Configuration files for r10k, hiera, puppet:

/etc/r10k.yaml

[root@master hieradata]# cat /etc/r10k.yaml
:cachedir: /var/cache/r10k
:sources:
  puppet:
    remote: "git@10.10.100.111:user/control_repo.git"
    basedir: /etc/puppetlabs/puppet/environments
    prefix: false
:purgedirs:
  - ""

/etc/puppetlabs/puppet/puppet.conf

[root@master hieradata] cat /etc/puppetlabs/puppet/puppet.conf
[main]
certname = master.puppetlabs.vm
dns_alt_names = master.puppetlabs.vm,puppet
vardir = /var/opt/lib/pe-puppet
logdir = /var/log/pe-puppet
rundir = /var/run/pe-puppet
modulepath = /etc/puppetlabs/puppet/environments/$environment/modules:/opt/puppet/share/puppet/modules
server = master.puppetlabs.vm
user  = pe-puppet
group = pe-puppet
archive_files = true
archive_file_server = master.puppetlabs.vm
# cut [master] & [agent] sections, $modulepath above is the important config key here.

/etc/puppetlabs/puppet/hiera.yaml

[root@master puppet] cat hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - "%{clientcert}"
  - "%{environment}"
  - global
:yaml:
  :datadir: '/etc/puppetlabs/puppet/environmets/%{environment}/hieradata'

/etc/puppetlabs/puppet/Puppetfile

[root@master puppet]# cat Puppetfile
# mod, <module name>, <version or tag>, <source>
forge "http://forge.puppetlabs.com"

# Modules from the Puppet Forge
mod "puppetlabs/stdlib"
mod "puppetlabs/apache", "0.11.0"
mod "puppetlabs/pe_gem"
mod "puppetlabs/mysql"
mod "puppetlabs/firewall"
mod "puppetlabs/vcsrepo"
mod "puppetlabs/git"
mod "puppetlabs/inifile"
mod "zack/r10k"
mod "gentoo/portage"
mod "thias/vsftpd"


# Modules from Github using various references
mod "wordpress",
  :git => "git://github.com/hunner/puppet-wordpress.git",
  :ref => '0.4.0'

Testing on the Puppet master master.puppetlabs.vm in $confdir/:

Our topic branches:

[root@master puppet] git branch
  development
* production
  staging

For the given topic branch above, production, let’s look at our hieradata in $confdir/hieradata:

[root@master hieradata] pwd
/etc/puppetlabs/puppet/hieradata
[root@master hieradata] ls
agent1.puppetlabs.vm.yaml  agent2.puppetlabs.vm.yaml  agent3.puppetlabs.vm.yaml  master.puppetlabs.vm.yaml

and for each of these files we have the same K,V:

[root@master hieradata] cat master.puppetlabs.vm.yaml
---
message: "%{fqdn} is running in environment %{environment}"
[root@master hieradata] cat agent1.puppetlabs.vm.yaml
---
message: "%{fqdn} is running in environment %{environment}"

Now let’s switch over to our development branch and compare:

[root@master puppet] git checkout development
Switched to branch 'development'
[root@master hieradata] pwd
/etc/puppetlabs/puppet/hieradata
[root@master hieradata] ls
agent1.puppetlabs.vm.yaml  agent2.puppetlabs.vm.yaml  agent3.puppetlabs.vm.yaml  master.puppetlabs.vm.yaml
[root@master hieradata] cat agent1.puppetlabs.vm.yaml
---
message: "%{fqdn} is running in environment %{environment}"
[root@master hieradata] cat master.puppetlabs.vm.yaml
---
message: "%{fqdn} is running in environment %{environment}"

Sync everything up with r10k so we can test:

[root@master puppet] r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment staging
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Module::Sync - INFO] Deploying wordpress into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vsftpd into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying portage into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying r10k into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying inifile into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying git into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying vcsrepo into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying firewall into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying mysql into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying pe_gem into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying apache into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Module::Sync - INFO] Deploying stdlib into /etc/puppetlabs/puppet/environments/production/modules
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment master
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment development
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Deployment::PurgeEnvironments - INFO] Purging stale environments from /etc/puppetlabs/puppet/environments

Since there is a %{clientcert}.yaml for the master we can do a quick check on the command line that we’re accessing the correct data:

[root@master puppet] git checkout production
Already on 'production'

[root@master puppet] puppet apply -e "notice(hiera(message))"
Notice: Scope(Class[main]): master.puppetlabs.vm is running in environment production
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.06 seconds
Notice: Finished catalog run in 0.24 seconds
[root@master puppet]

The $fqdn and $environment values were correctly filled in. Note that this is a poor test since my data files are essentially all the same: $environment will always match its environment and $fqdn will always match its fqdn, so we could be grabbing this value from anywhere. So let’s try with hard-coded values:

[root@master hieradata] pwd
/etc/puppetlabs/puppet/hieradata
[root@master hieradata] git branch
  development
* production
  staging
[root@master hieradata] cat master.puppetlabs.vm.yaml
---
message: "I'm hard coding this value: environment production, master.puppetlabs.vm.yaml"

Now push your new data for the production branch up to GitLab:

[root@master puppet]# git add hieradata/
[root@master puppet]# git commit -m "hieradata hard coded"
[production 3a58e49] hieradata hard coded
 1 files changed, 1 insertions(+), 1 deletions(-)
 [root@master puppet]# git push
 Counting objects: 7, done.
 Delta compression using up to 4 threads.
 Compressing objects: 100% (4/4), done.
 Writing objects: 100% (4/4), 399 bytes, done.
 Total 4 (delta 2), reused 0 (delta 0)
 To git@10.10.100.111:user/control_repo.git
    1b240c6..3a58e49  production -> production

and sync with r10k:

[root@master puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment staging
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
# omitting other output ...

and test for the correct hard-coded K,V:

[root@master puppet]# puppet apply -e "notice(hiera(message))"
Notice: Scope(Class[main]): I'm hard coding this value: environment production, master.puppetlabs.vm.yaml
Notice: Compiled catalog for master.puppetlabs.vm in environment production in 0.05 seconds
Notice: Finished catalog run in 0.26 seconds

YAY!

And again on the development branch

[root@master hieradata] pwd
/etc/puppetlabs/puppet/hieradata
[root@master hieradata] git branch
* development
  production
  staging
[root@master puppet] cat hieradata/master.puppetlabs.vm.yaml
---
message: "I'm hard coding this value: environment development, master.puppetlabs.vm.yaml"

Push our new hieradata for development to gitlab:

[root@master puppet] git add hieradata/
[root@master puppet] git commit -m "hieradata hard coded"
[development 2873db5] hieradata hard coded
 1 files changed, 1 insertions(+), 1 deletions(-)
[root@master puppet] git push
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 398 bytes, done.
Total 4 (delta 2), reused 0 (delta 0)
To git@10.10.100.111:user/control_repo.git
   08e249b..2873db5  development -> development

and then sync with r10k:

[root@master puppet]# r10k deploy environment -pv
[R10K::Task::Deployment::DeployEnvironments - INFO] Loading environments from all sources
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment staging
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
[R10K::Task::Environment::Deploy - NOTICE] Deploying environment production
[R10K::Task::Puppetfile::Sync - INFO] Loading modules from Puppetfile into queue
# omitting other output...

and test with the --environment development switch:

[root@master puppet]# puppet apply -e "notice(hiera(message))" --environment development
Notice: Scope(Class[main]): I'm hard coding this value: environment development, master.puppetlabs.vm.yaml
Notice: Compiled catalog for master.puppetlabs.vm in environment development in 0.05 seconds
Notice: Finished catalog run in 0.25 seconds


As shown, the --environment flag also makes it easy to do a quick one-off run against any environment.

Why I Don't Place a hiera.yaml in My CR

Here’s a really good reason. First off, unless your hiera.yaml is super complicated, it probably doesn’t need to be in version control at all. For a lot of deployments you do need it in VC, though, just not in your control repo (i.e., the one r10k accesses for the Puppetfile and hieradata, and builds the corresponding environments from on your master).

No, place that hiera.yaml in its own VC repo.

I had a control repo with:

Puppetfile
hieradata/
hiera.yaml

and I kept running into this:

[root@master hieradata]# puppet apply -e "notice(hiera(message))"
Error: Could not find data item message in any Hiera data file and no default supplied at line 1 on node master.puppetlabs.vm

This was a very simple test to see if my production environment was grabbing the correct data via a message K,V in $confdir/environments/production/hieradata/master.puppetlabs.vm.yaml.

Check out the --debug output:

Debug: hiera(): Hiera YAML backend starting
Debug: hiera(): Looking up message in YAML backend
Debug: hiera(): Looking for data source master.puppetlabs.vm
Debug: hiera(): Cannot find datafile /etc/puppetlabs/puppet/environmets/production/hieradata/master.puppetlabs.vm.yaml, skipping
Debug: hiera(): Looking for data source production
Debug: hiera(): Cannot find datafile /etc/puppetlabs/puppet/environmets/production/hieradata/production.yaml, skipping
Debug: hiera(): Looking for data source global
Debug: hiera(): Cannot find datafile /etc/puppetlabs/puppet/environmets/production/hieradata/global.yaml, skipping

So I copied the datafile path from the debug output and diffed it against $(pwd) while in the /etc/puppetlabs/puppet/environments/production/hieradata directory:

[root@master hieradata]# diff <(echo "/etc/puppetlabs/puppet/environmets/production/hieradata/") <(echo "$(pwd)")
1c1
< /etc/puppetlabs/puppet/environmets/production/hieradata/
---
> /etc/puppetlabs/puppet/environments/production/hieradata

So the $datadir path in hiera.yaml

[root@master puppet]# cat hiera.yaml | grep datadir
  :datadir: '/etc/puppetlabs/puppet/environmets/%{environment}/hieradata'

is indeed missing that f-ing ‘n’. Why? Because this was the second time this happened to me today.

Really? The second time? Really. The second fucking time.

I previously committed hiera.yaml along with hieradata/ and my Puppetfile to branch production. I tested it and came across this problem. Did the same test to figure out the correct path and updated hiera.yaml.

However, hiera.yaml was also in my development and staging branches. I didn’t update those. So here I was again doing tests on development data for Puppet and boom my shit’s broken again.

Since hiera.yaml is only read once per Puppet run, i.e., it’s not consulted on a per-environment basis, you only need one copy of it, and that’s the one in your Puppet $confdir. Having it scattered to the winds with r10k in the control repo does not give you any extra functionality (someday it may be made environment aware, but that’s currently not the case).

So if you’re going to keep your hiera.yaml in a VCS, do it in its own repo. You can symlink it into the Puppet $confdir.
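A minimal sketch of what that looks like, assuming the hiera.yaml lives in its own repo checked out at /opt/hiera_config (both the repo name and the path are hypothetical, adjust to taste):

# clone the hiera.yaml repo somewhere outside the control repo
git clone git@whatever.com:your_name/hiera_config.git /opt/hiera_config
# symlink it into the Puppet $confdir
ln -s /opt/hiera_config/hiera.yaml /etc/puppetlabs/puppet/hiera.yaml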

Continuous Integration Hooks With R10k & Puppet

A week ago I was modifying a webhook to run r10k on push to a git repository. The goal here was to sync up r10k every time a push was made to the repo. However, in doing so I found that the current hook didn’t take advantage of deploying a specific Puppet environment, and instead ran a full r10k sync across all topic branches and thus all Puppet environments.

I figured the first place to start was modifying the ‘post’ method:

post '/payload' do
  #protected!
  deploy()
end

to parse the JSON sent by git (in this case I was integrating with GitLab) for the ref branch. So the ‘post’ hook now looks like this:

post '/payload' do
    #protected!
    request.body.rewind  # in case someone already read it
    data = JSON.parse request.body.read
    branch = data['ref'].split("/").last
    "ref branch: #{branch}"
    #deploy(refs)
end

So branch is effectively the Puppet environment we want to pass to the r10k MCollective agent, letting us deploy a specific Puppet environment instead of syncing across all topic branches/environments. This keeps the deploy lightweight.
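If all you needed was to deploy on the box receiving the hook, you could stop here and just shell out to r10k with the parsed branch. A minimal sketch (assuming r10k is on the PATH of the user running the Sinatra app; this skips the MCollective piece entirely):

require 'sinatra'
require 'json'

post '/payload' do
  #protected!
  request.body.rewind                   # in case someone already read it
  data   = JSON.parse request.body.read
  branch = data['ref'].split("/").last  # "refs/heads/development" -> "development"
  # deploy only the Puppet environment matching the pushed branch
  system('r10k', 'deploy', 'environment', branch, '-pv')
  "deployed environment: #{branch}"
end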

However, I ran into a blocker in the MCollective r10k agent itself. I want to pass this argument to the agent so I can sync all r10k nodes at once from this hook based on the ref branch, but the current r10k agent does not accept any arguments and only syncs across all topic branches using the ‘synchronize’ action.

In order to pass this ref branch in and leverage ‘r10k deploy environment #{topic_branch}’ as I’m attempting here, the agent will need to be modified to parse the argument.

Zack’s current r10k agent is pretty good, so we’ll stick to modifying that (at this point I handed the agent work over to a colleague, Andrew Brader, since I was sent to a training site and he had the week to modify the mco agent):

def run_cmd(action, path=nil)
  output = ''
  git  = ['/usr/bin/env', 'git']
  r10k = ['/usr/bin/env', 'r10k']
  case action
  when 'push', 'pull', 'status'
    cmd = git
    cmd << 'push'   if action == 'push'
    cmd << 'pull'   if action == 'pull'
    cmd << 'status' if action == 'status'
    reply[:status] = run(cmd, :stderr => :error, :stdout => :output, :chomp => true, :cwd => path)
  when 'cache', 'environment', 'module', 'synchronize', 'sync', 'deploy_all'
    cmd = r10k
    cmd << 'cache'       if action == 'cache'
    cmd << 'synchronize' if action == 'synchronize' or action == 'sync'
    cmd << 'environment' if action == 'environment'
    cmd << 'module'      if action == 'module'
    cmd << 'deploy' << 'environment' << '-p' if action == 'deploy_all'
    reply[:status] = run(cmd, :stderr => :error, :stdout => :output, :chomp => true)
  end
end

In order to parse the topic branch from the hook we need to add a method, which Andrew did here:

def deploy_only_cmd(r10k_env=nil)
  output = ''
  r10k = ['/usr/bin/env', 'r10k']
  cmd = r10k
  cmd << 'deploy' << 'environment' << r10k_env << '-p'
  # run the assembled r10k command and capture its status
  reply[:status] = run(cmd, :stderr => :error, :stdout => :output, :chomp => true)
end

Testing the new hook & agent

… to be updated shortly…

Brief Intro to Puppetizing STIG Compliance

STIG is a methodology for the implementation of security compliance across heterogeneous operating systems. Every OS has a specific STIG. The STIG for a given OS is maintained and distributed by the Defense Information Systems Agency (DISA). Current STIGs can be found on the DISA website.

Puppet is an excellent tool for bringing machines into STIG compliance. At my previous job I was in shell script hell until we spun up a POS master and started writing a STIG module that worked across all the OSes we supported (14 Linux distros, 30 territorial sysadmins, ~250 nodes).

When writing a STIG module for Puppet it’s best to just work from Cat 1 down. Cat 1 & 2 findings are pretty big; 3 & 4 are usually minor security patches and can be overlooked if you’re time-crunched. Staying on top of the false positives and keeping your module up to date with the actual security implementation is the biggest hurdle.

This document is a general guideline from my experience in implementing STIG compliance measures.

Basic STIG Process

  1. Go to the DISA site and download the STIG for the appropriate OS

  2. Do some sort of benchmark scan on a base OS:

    • I used retina (which is awful) scans that were preloaded with the appropriate Security Content Automation Protocol (SCAP) documents (.xml files, see below).
    • You can also get SCAP benchmarks on the DISA site for the appropriate OS listed with the STIG benchmarks (see below on SCAP)
  3. Take the Retina report with categorized vulnerabilities and start writing Puppet modules - oh wait, just kidding! First you’ll need to:

    1. Check for false positives - in most cases the OS provider is way ahead of the game, and simply running ‘apt-get update’ or ‘yum update’ will take care of 80% of the vulnerabilities on the box. STIG guidelines are notoriously behind the OS and will have LOTS of false positives. Most of your time will be spent sorting out what is actually a vulnerability versus what is already patched by the OS provider.
  4. The vulnerabilities left after cross-correlating with security patches are what you have to develop specific Puppet classes for (see the sketch after this list) - this is actually the easy part.
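As a sketch of what one of those classes can look like, here’s a hypothetical example for a common file-permission style finding (the class and file names are mine for illustration, not from any particular STIG module):

# stig/manifests/shadow_perms.pp
class stig::shadow_perms {
  # lock down /etc/shadow ownership and mode, a typical Cat 1 style item
  file { '/etc/shadow':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0000',
  }
}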

A really good resource for this is the Aqueduct Project at Fedora. They’re really good about staying on top of the recent STIG process and also maintain a set of Puppet modules for STIGing RHEL boxes (or at least they used to; I’m not sure what their status is now. A recent look shows they’re only on RHEL 5, so…).

SCAP

The Security Content Automation Protocol (SCAP) is a protocol developed by the NSA puppet branch of government known as NIST for implementing IT security measures. Some parts of the protocol can be useful in scanning tools such as Retina since the XML format for the vulnerability index is standardized.

For example, when you are on the DISA site looking at the current STIG for a specific OS there will be a download button for ‘SCAP Benchmarks’. This is a .zip of several .xml’s that contain:

Common Platform Enumeration (CPE) files for describing and identifying classes of applications, operating systems, and hardware devices present among an enterprise’s computing assets.

Open Vulnerability and Assessment Language (OVAL) for assessing and reporting upon the machine state.

Extensible Configuration Checklist Description Format (XCCDF) which is a structured collection of security configuration rules for some set of target systems.

… and maybe some other random ones as well.

This is a document in motion so I’ll be adding more here, feel free to add as you find information as well.

How to Install Rspec-Puppet on Puppet Enterprise

If you’re getting into rspec testing for your manifests you might already know this: Puppet Enterprise has its own gem environment. This is a quick post on how to install rspec-puppet into your PE gem environment.

If you’re new to rspec-puppet check out this great site for a brief on installing (unless you’re on PE; in that case follow the install below). rspec-puppet.com also has a great tutorial to get you up and running.

Installing rspec-puppet on Puppet Enterprise

If you’ve already installed rspec-puppet via system gem you will get this error on rspec-puppet-init:

[root@master users]# rspec-puppet-init 
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- puppet (LoadError)

If you’ve already installed rspec-puppet via the system gem, remove it:

$ /usr/bin/gem uninstall rspec-puppet

Once system rspec-puppet is removed:

$ /opt/puppet/bin/gem install rspec-puppet

Then to init a new rspec-puppet environment in your $moduledir:

$ /opt/puppet/bin/rspec-puppet-init

This should build:

+ spec/
+ spec/classes/
+ spec/defines/
+ spec/functions/
+ spec/hosts/
+ spec/fixtures/
+ spec/fixtures/manifests/
+ spec/fixtures/modules/
+ spec/fixtures/modules/users/
+ spec/fixtures/manifests/site.pp
+ spec/fixtures/modules/users/manifests
+ spec/fixtures/modules/users/lib
+ spec/spec_helper.rb
+ Rakefile
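From there, a minimal spec for the users module those fixtures were generated in might look something like this (a sketch; adjust the class name and matchers for whatever you’re actually testing):

# spec/classes/users_spec.rb
require 'spec_helper'

describe 'users' do
  it { should contain_class('users') }
end

Run it with PE’s rspec (e.g. /opt/puppet/bin/rspec spec/classes/users_spec.rb) so the PE gem environment is the one being used.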

Installing PE-Specific Gems using PE-Gem Provider

The puppetlabs/pe_gem module provides the pe_gem package provider, so you can simply:

package { 'json':
  ensure   => present,
  provider => pe_gem,
}

That’s it!

TLS/SSL DH Cipher Padding Bug in ActiveMQ

Update

As of March 18 it appears that Oracle has implemented a fix in their release of JDK and Java Standard Edition version 8 and its associated security extensions:

“Support stronger ephemeral DH keys in the SunJSSE provider: Make ephemeral DH key match the length of the certificate key during SSL/TLS handshaking in the SunJSSE provider. A new system property, jdk.tls.ephemeralDHKeySize, is defined to customize the ephemeral DH key sizes. The minimum acceptable DH key size is 1024 bits, except for exportable cipher suites or legacy mode (jdk.tls.ephemeralDHKeySize=legacy). See Customizing Size of Ephemeral DH Keys and RFE 6956398.”


In helping a client with an ActiveMQ issue in Puppet Enterprise I recently stumbled across this line in their wrapper log:

INFO | jvm 1 | 2014/02/26 12:47:20 | WARN | Transport Connection to: tcp://ip.removed:49867 failed: javax.net.ssl.SSLHandshakeException: Invalid Padding length: 239

The client thought this may have been exacerbating a JVM memory problem; however, I found it is actually unrelated and is in and of itself a bug in the Java security extensions’ Diffie-Hellman cipher implementation over SSL.

We have seen similar issues in JDK 1.7x security extensions in other Java-powered backends for Puppet Enterprise such as PuppetDB. It has also been documented on the Apache ActiveMQ ticket board and the Oracle community board.

The issue is dependent on:

* OpenJDK Runtime Environment (PE Java 1.7.0.19) 
* Linux OS's (so far my testing is on CentOS 6x) 
* TLS_DHE_RSA_WITH_AES_128_CBC_SHA Cipher
* openssl-1.0.0-27.el6_4.2.x86_64

To sum up the problem: every few hundred messages encrypted over SSL or TLS using DH ciphers, the client gets a handshake exception. The exception is caused by a faulty SSL packet.

Oracle’s Solution

A ticket was submitted to Java bugs and was set “resolved” on 2013-10-25. However, and this is a big however, their resolution is, “In order to have reliable TLS handshakes, Diffie Hellman key exchanges must be disabled.”

I personally don’t like this resolution since DH key exchange is sometimes necessary, and in terms of security it is superior to standard RSA ciphers. DH ciphers provide perfect forward secrecy: even if the private key is compromised you cannot decrypt past data. Cipher suites such as DHE-RSA-AES128-SHA implement the slower, more secure ephemeral DH crypto - it’s ephemeral since new random numbers that generate the key are used each time, which is also why it’s slower. It’s also harder to run a chosen plaintext attack on EDH since the private key is used only for authentication, with an independent method used to agree on a shared secret - standard RSA ciphers employ the private key for both auth and encryption, trading away perfect forward secrecy for better performance.

You may now chime in with your own conspiracy theories as to why Oracle would settle for solving this crypto issue by simply using a less secure cipher - does the NSA not want Java applications, which function as the backbone of a great deal of web traffic, encrypting data with perfect forward secrecy?

Workaround in AMQ

Since there is no good way to get around this problem in JDK 1.7x security extensions you have two choices:

1. Live with the error in approx. 5% of the SSL traffic
2. Run SSL with a non-DH or DHE cipher  

Door #1

For example: if you’ve got an AMQ broker handling messaging for 1000+ AMQ agents in a live management setup in Puppet Enterprise, you’ll see this error a lot, and it may (warning, assumption) degrade live management performance in such a large deployment.

If you can live with seeing this error ~5% of the time, or you can live with the performance hit, sticking it out with DH can still work.

Door #2

You need the performance or are a stickler for perfect SSL key exchange.

A possible solution would be modifying the transportConnector in /etc/puppetlabs/activemq/activemq.xml:

<transportConnector name="openwire" uri="ssl://0.0.0.0:61616"/>
<!-- Puppet mcollective_enable_stomp_ssl=true
<transportConnector name="stomp+ssl" uri="stomp+ssl://0.0.0.0:61613"/>
-->

With the transport.enabledCipherSuites embedded:

ssl://localhost:61616?transport.enabledCipherSuites=SSL_RSA_WITH_3DES_EDE_CBC_SHA

SSL_RSA_WITH_3DES_EDE_CBC_SHA is a non-DH cipher, versus SSL_DH_anon_WITH_3DES_EDE_CBC_SHA, which I think is what AMQ currently uses.

Note syntax: ssl://…?socket.enabledCipherSuites=THE_CIPHER # for agents

and

ssl://…?transport.enabledCipherSuites=THE_CIPHER # for brokers 
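Putting that together for the broker side, the openwire connector from above might end up looking something like this (a sketch; verify the attribute names against your own activemq.xml):

<transportConnector name="openwire"
  uri="ssl://0.0.0.0:61616?transport.enabledCipherSuites=SSL_RSA_WITH_3DES_EDE_CBC_SHA"/>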

More information about this can be found on the AMQ reference page.

Debugging ActiveMQ JVM Heap Memory Errors

This just happened:

INFO   | jvm 1    | 2014/03/11 16:12:34 | Exception in thread "ActiveMQ BrokerService[ppm.prod.dc2.adpghs.com] Task-79" Exception in thread "ActiveMQ BrokerService[ppm.prod.dc2.adpghs.com] Task-101" Exception in thread "ActiveMQ BrokerService[ppm.prod.dc2.adpghs.com] Task-87" Exception in thread "ActiveMQ BrokerService[ppm.prod.dc2.adpghs.com] Task-30" Exception in thread "ActiveMQ BrokerService[ppm.prod.dc2.adpghs.com] Task-74" java.lang.OutOfMemoryError: unable to create new native thread

Since this is on a production server, it needs to be recreated in a testing environment. I’m partial to Vagrant, so I stand up four agent nodes and a master via the pe-build Vagrant plugin. My Vagrantfile looks something like this:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "centos-64-x64-nocm"
  config.vm.box_url = "http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-fusion503-nocm.box"

  config.pe_build.version       = '3.1.0'
  config.pe_build.download_root = 'https://s3.amazonaws.com/pe-builds/released'

## Master
  config.vm.define :master do |master|

    master.vm.provider :vmware_fusion do |v|
      v.vmx["memsize"]  = "4096"
      v.vmx["numvcpus"] = "4"
    end

    master.vm.network :private_network, ip: "10.10.100.100"

    master.vm.hostname = 'master.puppetlabs.vm'
    master.vm.provision :hosts

    master.vm.provision :pe_bootstrap do |pe|
      pe.role = :master
    end

    config.vm.provision "shell",
      inline: "service iptables stop"
  end

## agent 1
  config.vm.define :agent1 do |agent|

    agent.vm.provider :vmware_fusion
    agent.vm.network :private_network, ip: "10.10.100.111"

    agent.vm.hostname = 'agent1.puppetlabs.vm'
    agent.vm.provision :hosts

    agent.vm.provision :pe_bootstrap do |pe|
      pe.role   =  :agent
      pe.master = 'master.puppetlabs.vm'
    end
  end

## agent 2
  config.vm.define :agent2 do |agent|

    agent.vm.provider :vmware_fusion
    agent.vm.network :private_network, ip: "10.10.100.112"

    agent.vm.hostname = 'agent2.puppetlabs.vm'
    agent.vm.provision :hosts

    agent.vm.provision :pe_bootstrap do |pe|
      pe.role   =  :agent
      pe.master = 'master.puppetlabs.vm'
    end
  end

## agent 3
  config.vm.define :agent3 do |agent|

    agent.vm.provider :vmware_fusion
    agent.vm.network :private_network, ip: "10.10.100.113"

    agent.vm.hostname = 'agent3.puppetlabs.vm'
    agent.vm.provision :hosts

    agent.vm.provision :pe_bootstrap do |pe|
      pe.role   =  :agent
      pe.master = 'master.puppetlabs.vm'
    end
  end

## agent 4
   config.vm.define :agent4 do |agent|

    agent.vm.provider :vmware_fusion do |v|
      v.vmx["memsize"]  = "1024"
      v.vmx["numvcpus"] = "2"
    end

    agent.vm.network :private_network, ip: "10.10.100.114"

    agent.vm.hostname = 'agent4.puppetlabs.vm'
    agent.vm.provision :hosts

    agent.vm.provision :pe_bootstrap do |pe|
      pe.role = :agent
      pe.master = 'master.puppetlabs.vm'
    end
  end
end

This error occurred on an ActiveMQ install that works as the message queue for a 1000-node deployment of Puppet agents. To get terminology straight here, we have Puppet agents and AMQ agents running on these 1000 nodes. They’re all queued from a single AMQ broker.

My first impression was that this error might be caused by having 1000 agents hitting a single AMQ broker, which is limited to roughly 800 connections by available file descriptors (fds).

I check locally on my test master:

[root@master vagrant]# pgrep -f pe-activemq
1271
[root@master vagrant]# cat /proc/1271/limits | grep files
Max open files            1024                 4096                 files

My soft limit is 1024 open files, and after random .jars and logs and stuff that really works out to more like 800. So this is on the money, as far as what the docs say for ActiveMQ connections per broker.

How am I going to recreate what a 1000 node enviro looks like?

I’m limited to my laptop with 16GB of memory, and I’m too lazy to stand up 1000 instances in AWS (and too poor). So an attempt has to be made to recreate this memory error on my 5 nodes running locally.

Given the above information, I open a shell and ssh into my master:

[root@master vagrant]# echo -n "Max open files=3:3" > /proc/1271/limits
[root@master vagrant]# cat /proc/1271/limits | grep files
Max open files            3                    3                    files

Why 3? Because:

[root@master vagrant]# ls /proc/1271/fd | wc -l
6

So a quick ‘service pe-activemq restart’ and bang…

Oh shit, new PID, new proc instance. Damnit. I have to figure out something else to fake the resource limits here.

Since ulimit commands are shell-bound I can ssh into the master from another shell and run:

[root@master vagrant]# ulimit -n 10
[root@master vagrant]# service pe-activemq restart
/etc/init.d/functions: redirection error: cannot duplicate fd: Invalid argument
Stopping pe-activemq:                                      [  OK  ]
Starting pe-activemq:                                      [  OK  ]

The trick here is getting close enough to the lowest possible resource limits without getting the

bash: start_pipeline: pgrp pipe: Too many open files
/bin/sh: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: Error 24

error.

3 was actually too low, so I ran with 10 and was able to run a restart. You have to account for other files that may be bound to the PID instance, like logs and .jars, since this is a bunch of Java. And as everyone knows, Java is basically the pig of programming languages.

So 10 fds worked and activemq has restarted. Let’s look at my logs to see what I’ve got:

INFO   | jvm 5    | 2014/03/13 04:46:13 | Error: Exception thrown by the agent : java.rmi.server.ExportException: Listen failed on port: 0; nested exception is:
INFO   | jvm 5    | 2014/03/13 04:46:13 |     java.net.SocketException: Too many open files

That isn’t what I wanted to see; I’m looking for heap memory errors. So this demonstrates the production failure is not an fds constraint at the filesystem level. Time to move on to other possibilities.

Possible Culprits

  1. The JVM

    • Consider increasing the total heap memory available to the broker JVM
    • Consider reducing the default JVM stack size of each thread using -Xss
    • If your broker is embedded, ensure the hosting JVM has appropriate heap and stack sizes.
  2. The broker

Solutions

Check your log for current JVM heap size:

INFO   | jvm 1    | 2014/02/26 12:47:04 |   Heap sizes: current=506816k  free=487246k  max=506816k

Try bumping this up to 1GB in

 /etc/puppetlabs/activemq/wrapper.conf
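In a stock Java Service Wrapper config that usually means the maxmemory setting; a sketch (key names can vary between wrapper versions, so check what your wrapper.conf actually uses):

# /etc/puppetlabs/activemq/wrapper.conf (sketch; values in MB)
wrapper.java.initmemory=512
wrapper.java.maxmemory=1024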

If you still get

INFO   | jvm 1    | 2014/02/26 12:47:38 | Exception in thread "ActiveMQBrokerService[ppm.prod.dc2.adpghs.com] Task-58" java.lang.OutOfMemoryError: unable to create new native thread 

in your

/var/log/pe-activemq/wrapper.log

then throttle up your systemUsage in

/etc/puppetlabs/activemq/activemq.xml

per this guideline

Hard Limits of AMQ

If you still get OOM errors you may be at a hard limit for agents per broker. ActiveMQ uses the amqPersistenceAdapter by default for persistent messages. Unfortunately, this persistence adapter (as well as the kahaPersistenceAdapter) opens a file descriptor for each queue. When creating large numbers of queues, you’ll quickly run into the limit for your OS.

However, your logs will not register an OOM error as above; they’ll show

ERROR  | wrapper  | 2014/03/13 03:32:39 | JVM exited while loading the application.
INFO   | jvm 4    | 2014/03/13 03:32:39 | Error: Exception thrown by the agent : java.rmi.server.ExportException: Listen failed on port: 0; nested exception is:
INFO   | jvm 4    | 2014/03/13 03:32:39 | java.net.SocketException: Too many open files

If that is your error you could try upping the limit on file descriptors per process. You can do something similar to what I did above, or Google for your OS.
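For a persistent bump on RHEL/CentOS that survives restarts, something like this in /etc/security/limits.conf is the usual route (a sketch; the pe-activemq user name is an assumption, match it to whatever user the broker actually runs as):

# /etc/security/limits.conf
pe-activemq  soft  nofile  4096
pe-activemq  hard  nofile  8192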

At this point if none of the above resolved the issues you should try standing up a second broker, especially if you’re running more than 1000 agents on a single broker instance.

You can read more about standing up networks of brokers and also AMQ performance.