Add blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md

This commit is contained in:
Danny Berger
2014-09-17 15:36:11 -06:00
parent 6875fcd654
commit bd15ab54be

---
title: "Simplifying My BOSH-related Workflows"
layout: "post"
tags: [ "aws", "bosh", "cloudformation", "cloudfoundry", "cloque", "docker", "ec2", "packaging", "snapshots", "twig" ]
description: "Discussing some commands and wrappers I've been adding on top of BOSH."
---
Over the last nine months I've been getting into [BOSH][1] quite a bit. Historically, I've been [reluctant][20] to
invest in BOSH because I don't entirely agree with its architecture and it has a steep learning curve. BOSH
[describes itself][1] with...
> BOSH installs and updates software packages on large numbers of VMs over many IaaS providers with the absolute
> minimum of configuration changes.
>
> BOSH orchestrates initial deployments and ongoing updates that are:
>
> * Predictable, repeatable, and reliable
> * Self-healing
> * Infrastructure-agnostic
With continued use and experience necessitated by the [logsearch][2] project, I saw ways it would solve more critical
problems for me than it would create. For that reason, I started experimenting and migrating some services over to
BOSH to better evaluate it for my own uses. To help bridge the gap between BOSH inconveniences and some of my
architectural/practical differences I've been making a tool called [`cloque`][3].
You might find the ideas more useful than the `cloque` code itself - it is, after all, experimental and written
in PHP (since that's what I'm most productive in) whereas `bosh` is more Ruby/Go-oriented.
## Infrastructure First
Generally speaking, BOSH needs some help with infrastructure (i.e. it can't create its own VPC, network routing tables,
etc). Additionally, sometimes deployments don't even need the BOSH overhead. Within `cloque`, I've split management
tasks into two components:
* Infrastructure - this is the more "physical" layer, defining the networking, some independent services (e.g.
NAT gateways, VPN servers), security groups, and other core or non-BOSH functionality.
* BOSH - everything related to BOSH (e.g. director, deployment, snapshots, releases, stemcells) which is deployed onto
the infrastructure somewhere.
Since BOSH depends on some infrastructure, we'll get started with that first. One key to a `cloque`-managed environment
is that each environment has its own directory which includes a `network.yml` in the top-level. The network may be
located in a single datacenter, or it could span multiple countries. The file defines all the basics about the network
including subnets, reserved IPs, basic cloud properties, and some logical names.
I've committed an example network to the [`share`][7] directory within `cloque` and will use that in the examples here.
To get started, we'll copy the example and work with it...
# copy the sample environment
$ cp -r ~/cloque/share/example-multi ~/cloque-acme-dev
$ cd ~/cloque-acme-dev
# this will help the command know where to look for configs later
$ export CLOQUE_BASEDIR="$PWD"
If you take a look at the sample [`network.yml`][18], you'll see a couple regions with their individual network
segments, VPN networks, and a few reserved IP addresses which can be referenced elsewhere. Once `network.yml` is
created, the `utility:initialize-network` task can take care of bootstrapping the following:
* create stub folders for your different regions (e.g. `aws-apne1/core`, `global/private`)
* create a new SSH key (in `global/private/cloque-{yyyymmdd}*.pem`) and upload it to the AWS regions being used
* create a new IAM user, access key, and EC2 policy for BOSH to use
* create a certificate authority for [OpenVPN][8] usage
* create both client/server certificates for the inter-region VPN connections (requires interactive prompts for
passwords/confirmations)
* create an S3 bucket for shared configuration storage
When run, it assumes AWS credentials can be discovered from the environment...
$ cloque utility:initialize-network
> local:fs/global -> created
...snip...
> I created `utility:initialize-network` because I found myself reusing keys and buckets across multiple environments
> (such as development vs production) since they were annoying to manage by hand. I wanted to make security easier
> for myself and, in the process, simplify things through automation.
The top-level `global` directory is intended for configuration which applies to all areas. With the example I use it to
create an additional IAM role which allows VPN gateways to securely download their VPN keys and configuration files...
$ ( cd global/core && cloque infra:put --aws-cloudformation 'Capabilities=["CAPABILITY_IAM"]' )
> validating...done
> checking...missing
> deploying...done
> waiting...CREATE_IN_PROGRESS...........................CREATE_COMPLETE...done
The `infra:put` is the core command responsible for managing the low-level, infrastructure-related resources. The
command looks for an `infrastructure.json` file (see the [example][27]) and since I'm focused on [AWS][4], the files
are [CloudFormation][5] scripts.
> One thing I dislike about BOSH is how it uses a state file or global options to specify the director/deployment. It
> makes it very inconvenient to quickly switch between directors/deployments, even across multiple terminal sessions.
> To help with that, `cloque` respects environment variables (or command line options) to know where it should be
> working from. The `CLOQUE_BASEDIR` (exported earlier) is the most significant; with it, `cloque` was able to detect
> that it was working from the `global` region/director and `core` deployment based on the current directory.
Now that the global resources have been created, we can create our "core" resources for the `us-west-2` region. If you
take a look at the [infrastructure.json][28] file, you'll see it creates a VPC, multiple subnets for each availability
zone, a couple base security groups, and a gateway instance which will function as a VPN server to allow inter-region
communication. You'll also notice it's using [Twig][10] templating to load `network.yml` and simplify what would be a
lot of repeated resources. We'll use the `infra:put` command again, but this time within the `aws-usw2/core`
directory...
$ cd aws-usw2
$ ( cd core && cloque infra:put )
...snip...
> waiting...CREATE_IN_PROGRESS.........................CREATE_COMPLETE...done
> BOSH supports ERB-templated deploy manifests. With ERB I found myself repeating a lot of code in each manifest when
> trying to make it dynamic. After trying [spiff][21] (which I found a bit limited and difficult to understand), I
> decided to use a different approach - one that would allow for the same dynamic, peer-config referencing, and
> (later) transformational capabilities for both infrastructure configuration and BOSH deployment manifests.
Once the `infra:put` command finishes, the `aws-usw2` part of the environment is complete which means the OpenVPN
server is ready for a client. First we'll need to create and sign a client certificate though...
# temporary directory
$ mkdir tmp-myovpn
$ cd tmp-myovpn
# create a key (named after the hostname and current date)
$ TMPOVPN_CN=$(hostname -s)-$(date +%Y%m%da)
$ openssl req \
-subj "/C=US/ST=CO/L=Denver/O=ACME Inc/OU=client/CN=${TMPOVPN_CN}/emailAddress=`git config user.email`" \
-days 3650 -nodes \
-new -out openvpn.csr \
-newkey rsa:2048 -keyout openvpn.key
Generating a 2048 bit RSA private key
.............................+++
................+++
writing new private key to 'openvpn.key'
-----
# sign the certificate (you'll need to enter the PKI password you used in the first step)
$ cloque openvpn:sign-certificate openvpn.csr
# now create the OpenVPN configuration profile for connecting to aws-usw2
$ ( \
cloque openvpn:generate-profile aws-usw2 $TMPOVPN_CN \
; echo '<key>' \
; cat openvpn.key \
; echo '</key>' \
) > acme-dev-aws-usw2.ovpn
# opening should install it with a GUI connection manager like Tunnelblick
$ open acme-dev-aws-usw2.ovpn
# cleanup
$ cd ../
$ rm -fr tmp-myovpn
$ unset TMPOVPN_CN
> I created the `openvpn:sign-certificate` and, especially, `openvpn:generate-profile` commands to make the steps highly
> reproducible, encouraging better certificate usage practices through their "trivialness".
Since I'm using `example.com` in the `share` scripts as the domain, DNS won't resolve it. For now, the easiest solution
is to manually add an entry to `/etc/hosts`...
$ echo "`cd core && cloque infra:get '.Z0GatewayEipId'` gateway.aws-usw2.acme-dev.cloque.example.com" \
| sudo tee -a /etc/hosts
> The `infra:get` command allows me to programmatically fetch configuration details about the current deployment. For
> infrastructure, this allows me to extract the created resource IDs/names using [jq][12] statements. This makes it
> extremely easy to automate basic lookup tasks (as in this case), but also allows for more complex IP or security
> group enumeration which can be used for other composable, automated tasks.
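As a contrived example of that composability (pairing it with the plain AWS CLI here is just my illustration, not a
`cloque` feature), the same lookup can feed other tooling directly...
# e.g. review the ingress rules on the trusted-peer security group created earlier
$ aws ec2 describe-security-groups \
    --group-ids $( cd core && cloque infra:get '.TrustedPeerSecurityGroupId' ) \
    --query 'SecurityGroups[0].IpPermissions'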
Once `/etc/hosts` is updated, I can connect with an OpenVPN client like [Tunnelblick][13] and ping the network...
$ ping -c 5 10.101.0.4
PING 10.101.0.4 (10.101.0.4): 56 data bytes
64 bytes from 10.101.0.4: icmp_seq=0 ttl=64 time=59.035 ms
64 bytes from 10.101.0.4: icmp_seq=1 ttl=64 time=61.288 ms
64 bytes from 10.101.0.4: icmp_seq=2 ttl=64 time=78.194 ms
64 bytes from 10.101.0.4: icmp_seq=3 ttl=64 time=57.850 ms
64 bytes from 10.101.0.4: icmp_seq=4 ttl=64 time=57.956 ms
--- 10.101.0.4 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 57.850/62.865/78.194/7.764 ms
## BOSH Director
Now that we have a VPC and a private network to deploy things into, we can start a BOSH Director. Here it's important
to note that I'm using "region", "network segment", and "director" interchangeably. Typically you'll have a single BOSH
Director within an environment's region, and since that Director will tag its deployment resources with a "director"
tag, I decided to make them all synonyms. The effect is twofold:
* when you see a "director" name (whether it's in the context of BOSH or not) it refers to where resources are
provisioned
* you can consistently use a "director" tag (BOSH or not) to identify where something is deployed which makes AWS
resource management much simpler (and AWS Billing reports by tag much more valuable).
Back to getting BOSH deployed though. First, we'll create some additional BOSH-specific, region-specific infrastructure
(specifically, security groups for the director and agents)...
$ ( cd bosh && cloque infra:put )
...snip...
> waiting...CREATE_IN_PROGRESS...............CREATE_COMPLETE...done
> Here I start using the `bosh` directory. I put Director-related configuration in the `bosh` deployment. Individual
> BOSH deployments get their own directory.
Once the security groups are available, we can create the BOSH Director. The `boshdirector:*` commands deal with the
Director tasks (i.e. they don't depend on a specific deployment). To get started, the `boshdirector:inception:start`
command takes care of provisioning the inception instance (it takes a few minutes to get everything installed and
configured)...
$ cloque boshdirector:inception:start \
--security-group $( cloque --deployment=core infra:get '.TrustedPeerSecurityGroupId' ) \
--security-group $( cloque --deployment=core infra:get '.PublicGlobalEgressSecurityGroupId' ) \
$(cloque infra:get '.SubnetZ0PublicId') \
t2.micro
> finding instance...missing
> instance-id -> i-f84169f3
> tagging director -> acme-dev-aws-usw2
> tagging deployment -> cloque/inception
> tagging Name -> main
> waiting for instance...pending.........running...done
> waiting for ssh.......done
> installing...
...snip...
> uploading compiled/self...
...snip...
> uploading global/private...
...snip...
> You'll notice the `cloque --deployment=core infra:get` usage to load the security groups. The `--deployment`
> option is an alternative to running `cd ../core` before the command. Another alternative would be to use the
> `CLOQUE_DEPLOYMENT` environment variable. Whatever the case, `cloque` is intelligent and flexible about figuring out
> where it should be working from.
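To make that concrete, these should all be equivalent ways of pointing the same lookup at the `core` deployment...
# from a sibling deployment directory
$ ( cd ../core && cloque infra:get '.TrustedPeerSecurityGroupId' )
# or via the command line option
$ cloque --deployment=core infra:get '.TrustedPeerSecurityGroupId'
# or via the environment variable
$ CLOQUE_DEPLOYMENT=core cloque infra:get '.TrustedPeerSecurityGroupId'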
Before continuing, there's still a manual process of finding the correct stemcell. If we were in `us-east-1`, we could
use the "light-bosh" stemcell (which is really just an alias to a pre-compiled AMI that Cloud Foundry publishes).
Unfortunately, we need to take the slower route of compiling our own AMI for `us-west-2`. To do this, we need to look up
the latest stemcell URL from the [published artifacts][15], then we pass that URL to the next command...
$ cloque boshdirector:inception:provision \
https://s3.amazonaws.com/bosh-jenkins-artifacts/bosh-stemcell/aws/bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz
> finding instance...found
> instance-id -> i-f84169f3
> deploying...
WARNING! Your target has been changed to `https://10.101.16.8:25555'!
Deployment set to '/home/ubuntu/cloque/self/bosh/bosh.yml'
Verifying stemcell...
File exists and readable OK
Verifying tarball...
Read tarball OK
Manifest exists OK
Stemcell image file OK
Stemcell properties OK
Stemcell info
-------------
Name: bosh-aws-xen-ubuntu-trusty-go_agent
Version: 2710
Started deploy micro bosh
Started deploy micro bosh > Unpacking stemcell. Done (00:00:18)
Started deploy micro bosh > Uploading stemcell. Done (00:05:16)
Started deploy micro bosh > Creating VM from ami-8fe7a1bf. Done (00:00:19)
Started deploy micro bosh > Waiting for the agent. Done (00:01:19)
Started deploy micro bosh > Updating persistent disk
Started deploy micro bosh > Create disk. Done (00:00:02)
Started deploy micro bosh > Mount disk. Done (00:00:09)
Done deploy micro bosh > Updating persistent disk (00:00:19)
Started deploy micro bosh > Stopping agent services. Done (00:00:01)
Started deploy micro bosh > Applying micro BOSH spec. Done (00:00:21)
Started deploy micro bosh > Starting agent services. Done (00:00:01)
Started deploy micro bosh > Waiting for the director. Done (00:00:19)
Done deploy micro bosh (00:08:13)
Deployed `bosh/bosh.yml' to `https://10.101.16.8:25555', took 00:08:13 to complete
> fetching bosh-deployments.yml...
receiving file list ...
1 file to consider
bosh-deployments.yml
1025 100% 1000.98kB/s 0:00:00 (xfer#1, to-check=0/1)
sent 38 bytes received 723 bytes 101.47 bytes/sec
total size is 1025 speedup is 1.35
> tagging...done
> The `:start` command took care of pushing the compiled manifest, but this `:provision` command is responsible for
> pushing everything to the director and, once complete, downloading the resulting configuration locally. I created
> these two commands because they were a common task and the manual, iterative process was getting tiresome. It also
> helps unify both the initial provisioning vs upgrade process *and* deploying from AMI vs TGZ. Instead of ~12 manual
> steps spread out over ~30 minutes, I only need to intervene at three points (including instance termination).
Once the provisioning step is complete, I can login and talk to BOSH...
# default username/password is admin/admin
$ bosh target https://10.101.16.8:25555
$ bosh status
Config
/Users/dpb587/cloque-acme-dev/aws-usw2/.bosh_config
Director
Name acme-dev-aws-usw2
URL https://10.101.16.8:25555
Version 1.2710.0 (00000000)
User admin
UUID f38d685c-9a72-4fc0-bc84-558979cc80bf
CPI aws
dns enabled (domain_name: microbosh)
compiled_package_cache disabled
snapshots disabled
Deployment
not set
Since BOSH Director is successfully running, it's safe to terminate the inception instance. Whenever there's a new BOSH
version I want to deploy, I can just rerun the two `start` and `provision` commands (with an updated stemcell URL)
and it will take care of upgrading it.
### More on Stemcells
While inception was deploying the BOSH Director, it ended up making a stemcell that I can reuse for our BOSH
deployments. Unfortunately, the Director doesn't know about it. The following command takes care of publishing it...
$ cloque boshutil:create-bosh-lite-stemcell-from-ami \
https://s3.amazonaws.com/bosh-jenkins-artifacts/bosh-stemcell/aws/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz \
ami-8fe7a1bf
Uploaded Stemcell: https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz
> The command uses the URL (the light-bosh stemcell of the same version from the [artifacts][15] page) as a template
> and patches in the correct metadata for the local region. It then takes care of uploading it to the environment's S3
> bucket and to the Director so it's immediately usable.
Another task I frequently need to do is convert the standard stemcells (which only support PV virtualization) into
HVM stemcells that I can use with AWS's newer instance types. This next command takes care of all the conversion steps
and, once complete, there will be a new `*-hvm` stemcell ready for use on the Director.
$ cloque boshutil:convert-pv-stemcell-to-hvm \
https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz \
ami-d13845e1 \
$( cloque --deployment=core infra:get '.SubnetZ0PrivateId , .TrustedPeerSecurityGroupId' )
Created AMI: ami-f3e3a5c3
Uploaded Stemcell: https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent-hvm.tgz
> The command needs the light-bosh TGZ and AMI for the existing PV stemcell as well as a subnet and security group for
> it to provision the conversion instances in.
## BOSH Deployment
Now that the BOSH Director is running, I can deploy something interesting onto it. Let's use [logsearch][2] as an
example. First I'll need to clone the repository...
$ git clone https://github.com/logsearch/logsearch-boshrelease.git ~/logsearch-boshrelease
$ cd ~/logsearch-boshrelease
Since I've changed directories away from our environment, `cloque` will no longer know where to find its environment
information. To help, I'll use a `.env` file...
$ ( \
echo 'export CLOQUE_BASEDIR=~/cloque-acme-dev' \
; echo 'export CLOQUE_DIRECTOR=aws-usw2' \
; echo 'export CLOQUE_DEPLOYMENT=logsearch' \
) > .env
> I mentioned before that `cloque` uses the current working directory, environment variables, and command options to
> figure out where to look for things. If it's still missing information, it will check and load a `.env` file from
> the current directory as a last resort. This is normally only useful during development where I already use `.env`
> for other project-specific BASH `alias`es and variables.
Now I can upload the release...
$ cloque boshdirector:releases:put releases/logsearch-latest.yml
> Since releases are Director-specific and unrelated to a particular deployment, the command uses the `boshdirector:*`
> namespace.
The example has the configuration files for infrastructure (EIP and security groups) and BOSH (deploy manifest), but
I still need to generate a certificate locally...
$ openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
-keyout ~/cloque-acme-dev/aws-usw2/ssl.key \
-out ~/cloque-acme-dev/aws-usw2/ssl.crt
> Having a directory per deployment helps keep everything scoped and organized when there are additional artifacts.
> The templating nature of `cloque` allows the files to be embedded into its own deployment manifest, but also other
> deployment manifests. With the example of logsearch, this means I don't need to copy and paste the `ssl.crt` into
> other deployments, just embed it using a relative path (embeds are always relative to the config file - something
> BOSH ERBs struggle with): `{% raw %}{{ env.embed('../logsearch/ssl.crt') }}{% endraw %}`.
Once uploaded, I can use the `infra:put` and mirrored `bosh:put` commands to push the infrastructure and BOSH
deployment (`-n` meaning non-interactive, just like with `bosh`)...
$ cloque infra:put
...snip...
> waiting...CREATE_IN_PROGRESS.....................CREATE_COMPLETE...done
$ cloque -n bosh:put
Getting deployment properties from director...
...snip...
Deployed `bosh.yml' to `acme-dev-aws-usw2'
Once complete, I can see the [elasticsearch][19] service running...
$ wget -qO- '10.101.17.26'
{
"status" : 200,
"name" : "elasticsearch/0",
"version" : {
"number" : "1.2.1",
"build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
"build_timestamp" : "2014-06-03T15:02:52Z",
"build_snapshot" : false,
"lucene_version" : "4.8"
},
"tagline" : "You Know, for Search"
}
And I can see the ingestor listening on its EIP:
$ echo 'QUIT' | openssl s_client -showcerts -connect $( cloque infra:get '.Z0IngestorEipId' ):5614
CONNECTED(00000003)
And I can SSH into the instance...
$ cloque bosh:ssh
...snip...
bosh_j51114xze@c989cf2f-91e4-407e-a7d7-bdc03ef79511:~$
> The `bosh:ssh` command is a little more intelligent than `bosh ssh`. It will peek at the manifest to know if there's
> only a single job running, in which case the job/index argument becomes meaningless. Additionally, it will always
> use a default `sudo` password of `c1oudc0w` (avoiding the interactive delay and prompt that `bosh ssh` requires).
## Package Development
When I need to create a new package, I've started using a convention of adding a comment with the origin URL where I
found each blob/file. This provides me with more of an audit trail over time, but it also allows me to automate turning
a `spec` file which looks like this:
---
name: "nginx"
files:
# http://nginx.org/download/nginx-1.7.2.tar.gz
- "nginx-blobs/nginx-1.7.2.tar.gz"
# ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.35.tar.gz
- "nginx-blobs/pcre-8.35.tar.gz"
# https://www.openssl.org/source/openssl-1.0.1h.tar.gz
- "nginx-blobs/openssl-1.0.1h.tar.gz"
...snip...
into a series of `wget`s with the `boshutil:package-downloads` command...
$ cloque boshutil:package-downloads nginx
mkdir -p 'blobs/nginx-blobs'
[ -f 'blobs/nginx-blobs/nginx-1.7.2.tar.gz' ] || wget -O 'blobs/nginx-blobs/nginx-1.7.2.tar.gz' 'http://nginx.org/download/nginx-1.7.2.tar.gz'
[ -f 'blobs/nginx-blobs/pcre-8.35.tar.gz' ] || wget -O 'blobs/nginx-blobs/pcre-8.35.tar.gz' 'ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.35.tar.gz'
[ -f 'blobs/nginx-blobs/openssl-1.0.1h.tar.gz' ] || wget -O 'blobs/nginx-blobs/openssl-1.0.1h.tar.gz' 'https://www.openssl.org/source/openssl-1.0.1h.tar.gz'
...snip...
> I was tired of having to manually download files, `bosh add blob` them with the correct parameters, and then manually
> delete the originals. This lets me completely avoid those steps and ensures I'm using the files I expect.
> Whenever a blob is an internal file or `src`, I just take care of it manually like before.
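Since the output is plain shell, I can either review it first or (when feeling trusting) pipe it straight through to
run the downloads...
$ cloque boshutil:package-downloads nginx | bash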
When I'm working on a `packaging` script I use [Docker][22] images to emulate the build environment. Since 99% of my
build issues come from `configure` arguments and environment variables, this is normally sufficient. This also lets me
iteratively debug my packaging scripts as opposed to the slow, guess-and-check method of re-releasing and deploying the
whole thing to BOSH to test fixes. The `boshutil:package-docker-build` command helps me here...
$ cloque boshutil:package-docker-build ubuntu:trusty nginx
> compile/packaging...done
> compile/nginx-blobs/nginx-1.7.2.tar.gz...done
> compile/nginx-blobs/pcre-8.35.tar.gz...done
> compile/nginx-blobs/openssl-1.0.1h.tar.gz...done
...snip...
Sending build context to Docker daemon 7.571 MB
Sending build context to Docker daemon
Step 0 : FROM ubuntu:trusty
---> ba5877dc9bec
Step 1 : RUN apt-get update && apt-get -y install build-essential cmake m4 unzip wget
...snip...
root@347c1d4ca07b:/var/vcap/data/compile/nginx#
> This command mirrors the BOSH environment by using the `spec` file to add the referenced blobs, uploads the
> packaging script, configures the `BOSH_COMPILE_TARGET` and `BOSH_INSTALL_TARGET` variables, creates the directories,
> and switches to the compile directory, ready for me to type `./packaging` or paste commands iteratively. It also has
> the `--import-package` and `--export-package` options to import/dump the resulting `/var/vcap/packages/{name}`
> directory to support dependencies.
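From there, a typical iteration is nothing `cloque`-specific - just shell inside the container, along the lines of...
# run the whole script, or paste in just the step being debugged
root@347c1d4ca07b:/var/vcap/data/compile/nginx# ./packaging
# then poke at the results before tearing the container down
root@347c1d4ca07b:/var/vcap/data/compile/nginx# ls "$BOSH_INSTALL_TARGET"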
## Snapshots
One convenient feature BOSH has is snapshotting, which gives a full backup of its persistent disks. You can run its
`take snapshot` command for a particular job or for an entire deployment. Or, if "dirty" snapshots are okay, the Director can
schedule them automatically. To manage all those snapshots, I created a few commands. The first command takes care of
snapshots that the BOSH Director creates of itself...
$ cloque boshdirector:snapshots:cleanup-self 3d
snap-4219f4fb -> 2014-09-13T06:01:14+00:00 -> deleted
snap-2e6588e4 -> 2014-09-13T06:03:55+00:00 -> deleted
snap-1acd90d3 -> 2014-09-13T06:06:36+00:00 -> deleted
snap-618c7da9 -> 2014-09-14T06:01:15+00:00 -> retained
snap-dce22315 -> 2014-09-14T06:03:55+00:00 -> retained
snap-a9e81a60 -> 2014-09-14T06:06:35+00:00 -> retained
snap-d35ea51a -> 2014-09-15T06:01:18+00:00 -> retained
snap-3742b88e -> 2014-09-15T06:03:58+00:00 -> retained
snap-0b8b40c2 -> 2014-09-15T06:06:38+00:00 -> retained
snap-ea16dfd3 -> 2014-09-16T06:01:18+00:00 -> retained
snap-913df459 -> 2014-09-16T06:03:58+00:00 -> retained
snap-82d5fc4b -> 2014-09-16T06:06:38+00:00 -> retained
> This command is simplistic and trims all snapshots older than a given period (in this case three days). I got very
> tired of (and forgetful about) regularly cleaning up snapshots from the AWS Console. It communicates directly with the
> AWS API since the `bosh` command doesn't seem to enumerate them.
The command for individual deployment snapshots is a bit more intelligent. It allows writing logic which, when passed a
given snapshot, determines whether it should be retained or deleted. For example...
$ cloque boshdirector:snapshots:cleanup
...snip...
snap-7837f7d4 -> 2014-08-01T07:01:30+00:00 -> dirty -> retained
snap-62cca4de -> 2014-08-04T07:00:28+00:00 -> dirty -> retained
snap-bdd29512 -> 2014-08-04T22:51:57+00:00 -> clean -> retained
snap-4dd5a3e1 -> 2014-08-04T23:46:23+00:00 -> clean -> retained
snap-2bb7c784 -> 2014-08-11T07:00:46+00:00 -> dirty -> retained
snap-5239b7fc -> 2014-08-18T07:00:40+00:00 -> dirty -> retained
snap-cf6fcb6e -> 2014-08-25T07:00:39+00:00 -> dirty -> retained
snap-9d00103c -> 2014-08-28T13:34:39+00:00 -> clean -> retained
snap-9d80103d -> 2014-09-01T07:00:43+00:00 -> dirty -> retained
snap-79c18cda -> 2014-09-08T07:00:44+00:00 -> dirty -> retained
snap-87f47a24 -> 2014-09-09T07:00:57+00:00 -> dirty -> deleted
snap-5fec87fc -> 2014-09-10T07:00:55+00:00 -> dirty -> retained
snap-bdfeda1e -> 2014-09-11T07:00:58+00:00 -> dirty -> retained
snap-246b6987 -> 2014-09-12T07:00:54+00:00 -> dirty -> retained
snap-c234d870 -> 2014-09-13T07:00:43+00:00 -> dirty -> retained
snap-28ed128a -> 2014-09-14T07:00:55+00:00 -> dirty -> retained
snap-ef6ac34d -> 2014-09-15T07:00:55+00:00 -> dirty -> retained
snap-72c156d3 -> 2014-09-16T07:00:42+00:00 -> dirty -> retained
> The command looks for a deployment-specific file which receives information about the snapshot (ID, date,
> clean/dirty) and returns `true` to cleanup/delete or `false` to retain. This allows me to create some very custom
> retention policies for individual deployments, depending on their requirements. In this example, clean snapshots are
> kept for 3 months, Mondays are kept for 6 months, the first of each month is kept indefinitely, and everything else
> is kept for 1 week.
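Purely to illustrate the shape of that logic (the real hook is a deployment-specific file that `cloque` loads, so the
standalone interface below is invented, and GNU `date` is assumed), the retention rules above could be expressed as...
#!/bin/bash
# illustration only: restating the retention rules above as a standalone script;
# cloque's actual hook is a deployment-specific file with its own interface.
# usage: retention.sh <iso8601-start-time> <clean|dirty>; prints "true" to delete
snapshot_date="$1"
snapshot_state="$2"
age_days=$(( ( $(date +%s) - $(date -d "$snapshot_date" +%s) ) / 86400 ))

if [ "$snapshot_state" = "clean" ] && [ "$age_days" -le 90 ]; then
    echo false   # clean snapshots: keep for ~3 months
elif [ "$(date -d "$snapshot_date" +%u)" = "1" ] && [ "$age_days" -le 180 ]; then
    echo false   # Monday snapshots: keep for ~6 months
elif [ "$(date -d "$snapshot_date" +%d)" = "01" ]; then
    echo false   # first-of-month snapshots: keep indefinitely
elif [ "$age_days" -le 7 ]; then
    echo false   # everything else: keep for a week
else
    echo true    # old enough to delete
fi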
## Revitalizing
In the past I've typically used local VMs with [VirtualBox][23] or [VMware Fusion][24] for personal development.
Unfortunately they always seemed to drift from production servers, which made things inconvenient, at best. With BOSH,
it became trivial for me to start/stop deployments and guarantee they have a known environment. When my VMs were local
I always had scripts which would pull down backups, restore them, and clean up data for development. With `cloque` I've
been using a `revitalize` concept which allows me to restore data from snapshots or run arbitrary commands. For
example, I can add the following to my database job to restore data from a slave's most recent snapshot...
jobs:
- name: "mysql"
...snip...
cloque.revitalize:
- method: "snapshot_copy"
director: "example-acme-aws-usw2"
deployment: "wordpress-demo-hotcopy"
job: "mysql"
- method: "script"
script: "{{ env.embed('revitalize.sh') }}"
> The `snapshot_copy` method takes care of finding the most recent snapshot with the given parameters and would copy
> the data onto the local `/var/vcap/store` directory (trashing anything it replaces). The `script` method allows an
> arbitrary script to run, in this case, one that resets the MySQL users/passwords and cleans data for development
> purposes.
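For reference, a `revitalize.sh` along those lines might be as simple as the following sketch - the control script
path, credentials, and schema are placeholders for whatever the release actually uses...
#!/bin/bash
# sketch only: the ctl path, credentials, and schema below are placeholders
set -e

# services were stopped before revitalizing, so bring mysqld up temporarily
/var/vcap/jobs/mysql/bin/ctl start
until mysqladmin ping > /dev/null 2>&1 ; do sleep 1 ; done

# reset the application account to a well-known development password and scrub
# anything that shouldn't follow production data into a development environment
mysql -u root <<'SQL'
SET PASSWORD FOR 'wordpress'@'%' = PASSWORD('dev-only-password');
UPDATE wordpress.wp_users SET user_email = CONCAT('user+', ID, '@example.com');
SQL

/var/vcap/jobs/mysql/bin/ctl stop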
Whenever I want to reload my dev deployment with more recent production data (or after I've sufficiently polluted my
dev data), I can just run the `bosh:revitalize` task...
$ cloque bosh:revitalize
> mysql/0
> finding 10.101.17.41...
> instance-id -> i-fe0e23f3
> availability-zone -> us-west-2w
> stopping services...
> waiting...............done
> snapshot_copy
> finding snapshot...
> snapshot-id -> snap-3867159a
> start-time -> 2014-09-16T06:58:31.000Z
> creating volume...
> volume-id -> vol-edc5bfe9
> waiting...creating...available...done
> attaching volume...
> waiting...in-use...done
> mounting volume...
> transferring data...
> removing mysql...done
> restoring mysql...done
> unmounting volume...
> detaching volume...
> waiting...in-use......available...done
> destroying volume...
> script...
> starting services...
...snip...
> This also makes it easy for me to condense services which run on multiple machines in production onto a single
> machine for development by restoring from multiple snapshots (as long as the services' `store` directories are
> properly named).
## Configuration Transformations
I mentioned earlier that configuration files are templates. In addition to basic templating capabilities, I added some
transformation options. Transformations allow a processor to receive the current state of the configuration, do some
magic to it, and return a new configuration. The easiest example of this is with logging - I want to centralize all my
log messages and [`collectd`][26] measurements. Here I'll use [logsearch-shipper-boshrelease][25], but regardless of
how it's done, it typically requires adding a new release to your deployment, adding the job template to every job, and
adding the correct properties. When you have multiple deployments, this becomes tedious and this is where a
transformation shines. The transform could take care of the following:
* add the `logsearch` properties (SSL key, `bosh_director` field to messages, EIP lookup for the ingestor)
* add the `logsearch-shipper` release to the deployment
* add the `logsearch-shipper` job template to every job
And the raw code for that transform could go in `aws-usw2/logsearch/shipper-transform.php`:
<?php return function ($config, array $options, array $params) {
// add our required properties
$config['properties']['logsearch'] = [
'logs' => [
'_defaults' => implode("\n", [
'---',
'files:',
' "**/*.log":',
' fields:',
' type: "unknown"',
' bosh_director: "' . $params['network_name'] . '-' . $params['director_name'] . '"',
]),
'server' => $params['env']['self/infrastructure/logsearch']['Z0IngestorEipId'] . ':5614',
'ssl_ca_certificate' => $params['env']->embed(__DIR__ . '/ssl.crt'),
],
'metrics' => [
'frequency' => 60,
],
];
// add the template job to all jobs
foreach ($config['jobs'] as &$job) {
$job['templates'][] = [
'release' => 'logsearch-shipper',
'name' => 'logsearch-shipper',
];
}
// add the release (with a default version), if the manifest doesn't already reference it explicitly
if (!in_array('logsearch-shipper', array_map(function ($a) { return $a['name']; }, $config['releases']))) {
$config['releases'][] = [
'name' => 'logsearch-shipper',
'version' => '1',
];
}
return $config;
};
And then whenever I want a deployment to forward its logs with `logsearch-shipper`, I only need to add the following to
the root level of my `bosh.yml` deployment manifest...
_transformers:
- path: "../logsearch/shipper-transform.php"
> This approach helps me keep my deployment manifests concise. Rather than clutter up my definitions with ancillary
> configuration and sidekick jobs, they remain focused on the services they're actually providing.
## Tagging
Since starting with BOSH, I've used AWS tags more heavily. I consistently use the `director` tag to represent the
`{network_name}-{region_name}` (e.g. `acme-dev-aws-usw2`) and the `deployment` tag to represent the logical set of
services (regardless of whether BOSH is managing them or not). I made another command which can enumerate relevant
resources and ensure they have the expected tags:
$ cloque utility:tag-resources
> reviewing us-west-2...
> acme-dev-aws-usw2/bosh/microbosh -> i-298fb0c6
> /dev/xvda -> vol-d46fa79b
> adding director -> acme-dev-aws-usw2
> adding deployment -> microbosh
> adding Name -> microbosh/0/xvda
> /dev/sdb -> vol-8b6c46c6
> adding director -> acme-dev-aws-usw2
> adding deployment -> microbosh
> adding Name -> microbosh/0/sdb
> /dev/sdf -> vol-8a6d46c6
> adding director -> acme-dev-aws-usw2
> adding deployment -> microbosh
> adding Name -> microbosh/0/sdf
> acme-dev-aws-usw2/logsearch/main/0 -> i-46be80b9
> /dev/sda -> vol-fa4e57b5
> adding director -> acme-dev-aws-usw2
> adding deployment -> logsearch
> adding Name -> main/0/sda
> /dev/sdf -> vol-73e0ce3e
> acme-dev-aws-usw2/infra/core/z1/gateway -> i-8d60f6a2
> /dev/sda1 -> vol-7b5b7838
> I added this command because I wanted to be sure my volumes were all accurately tagged. This helps me when using the
> AWS Console, but it also provides more detail in the AWS Billing Reports when the `director` and `deployment` tags
> are included for detailed billing.
## Conclusion
BOSH is far from perfect, in my mind, but with a little help it is enabling me to be more productive and effective
than the other tools I've tried, in the areas which are most important to me.
[1]: http://docs.cloudfoundry.org/bosh/
[2]: https://github.com/logsearch/logsearch-boshrelease
[3]: https://github.com/dpb587/cloque
[4]: http://aws.amazon.com/
[5]: http://aws.amazon.com/cloudformation/
[6]: http://www.terraform.io/
[7]: https://github.com/dpb587/cloque/blob/master/share/
[8]: http://openvpn.net/
[9]: https://github.com/dpb587/cloque/blob/master/share/local-core-infrastructure.yml
[10]: http://twig.sensiolabs.org/
[11]: http://console.aws.amazon.com/
[12]: http://stedolan.github.io/jq/
[13]: https://code.google.com/p/tunnelblick/
[14]: https://gist.githubusercontent.com/dpb587/c0427635b3316584e12e/raw/183ccda6c504fac02754b79b5a5b267848a70025/transfer-ami.sh
[15]: http://bosh_artifacts.cfapps.io/
[16]: https://github.com/cloudfoundry/bosh/tree/master/bosh_cli_plugin_micro
[18]: https://github.com/dpb587/cloque/blob/master/share/example-multi/network.yml
[19]: http://www.elasticsearch.org/
[20]: /blog/2014/02/28/distributed-docker-containers.html#the-alternatives
[21]: https://github.com/cloudfoundry-incubator/spiff
[22]: https://www.docker.com/
[23]: https://www.virtualbox.org/
[24]: http://www.vmware.com/products/fusion
[25]: https://github.com/logsearch/logsearch-shipper-boshrelease/
[26]: http://collectd.org/
[27]: https://github.com/dpb587/cloque/blob/master/share/example-multi/global/core/infrastructure.json
[28]: https://github.com/dpb587/cloque/blob/master/share/example-multi/aws-usw2/core/infrastructure.json