diff --git a/blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md b/blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md new file mode 100644 index 0000000..8096887 --- /dev/null +++ b/blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md @@ -0,0 +1,737 @@ +--- +title: "Simplifying My BOSH-related Workflows" +layout: "post" +tags: [ "aws", "bosh", "cloudformation", "cloudfoundry", "cloque", "docker", "ec2", "packaging", "snapshots", "twig" ] +description: "Discussing some commands and wrappers I've been adding on top of BOSH." +--- + +Over the last nine months I've been getting into [BOSH][1] quite a bit. Historically, I've been [reluctant][20] to +invest in BOSH because I don't entirely agree with its architecture and steep learning curve. BOSH +[describes itself][1] with... + + > BOSH installs and updates software packages on large numbers of VMs over many IaaS providers with the absolute + > minimum of configuration changes. + > + > BOSH orchestrates initial deployments and ongoing updates that are: + > + > * Predictable, repeatable, and reliable + > * Self-healing + > * Infrastructure-agnostic + +With continued use and experience necessitated from the [logsearch][2] project, I saw ways it would solve more critical +problems for me than it would create. For that reason, I started experimenting and migrating some services over to +BOSH to better evaluate it for my own uses. To help bridge the gap between BOSH inconveniences and some of my +architectural/practical differences I've been making a tool called [`cloque`][3]. + +You might find the ideas more useful rather than the `cloque` code itself - it is, after all, experimental and written +in PHP (since that's why I'm most productive in) whereas `bosh` is more Ruby/Go-oriented. + + +## Infrastructure First + +Generally speaking, BOSH needs some help with infrastructure (i.e. it can't create its own VPC, network routing tables, +etc). Additionally, sometimes deployments don't even need the BOSH overhead. Within `cloque`, I've split management +tasks into two components: + + * Infrastructure - this is more of the "physical" layer defining the networking layer, some independent services (e.g. + NAT gateways, VPN servers), security groups, and other core or non-BOSH functionality. + * BOSH - everything related to BOSH (e.g. director, deployment, snapshots, releases, stemcells) which is deployed onto + the infrastructure somewhere. + +Since BOSH depends on some infrastructure, we'll get started with that first. One key to a `cloque`-managed environment +is that each environment has its own directory which includes a `network.yml` in the top-level. The network may be +located in a single datacenter, or it could span multiple countries. The file defines all the basics about the network +including subnets, reserved IPs, basic cloud properties, and some logical names. + +I've committed an example network to the [`share`][7] directory within `cloque` and will use that in the examples here. +To get started, we'll copy the example and work with it... + + # copy the sample environment + $ cp -r ~/cloque/share/example-multi ~/cloque-acme-dev + $ cd ~/cloque-acme-dev + + # this will help the command know where to look for configs later + $ export CLOQUE_BASEDIR="$PWD" + +If you take a look at the sample [`network.yml`][18], you'll see a couple regions with their individual network +segments, VPN networks, and a few reserved IP addresses which can be referenced elsewhere. 
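+As a rough sketch of the shape involved (the keys below are illustrative only - the linked sample is the real
+reference for what `cloque` actually expects), a two-region network boils down to something like...
+
+    ---
+    name: "acme-dev"
+    regions:
+      aws-usw2:
+        region: "us-west-2"
+        cidr: "10.101.0.0/16"
+      aws-apne1:
+        region: "ap-northeast-1"
+        cidr: "10.102.0.0/16"
+    reserved:
+      gateway: "10.101.0.4"
+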
Once `network.yml` is +created, the `utility:initialize-network` task can take care of bootstrapping the following: + + * create stub folders for your different regions; e.g. `aws-apne1/core`, `global/private`) + * create a new SSH key (in `global/private/cloque-{yyyymmdd}*.pem`) and upload it to the AWS regions being used + * create a new IAM user, access key, and EC2 policy for BOSH to use + * create a certificate authority for [OpenVPN][8] usage + * create both client/server certificates for the inter-region VPN connections (requires interactive prompts for + passwords/confirmations) + * create an S3 bucket for shared configuration storage + +When run, it assumes AWS credentials can be discovered from the environment... + + $ cloque utility:initialize-network + > local:fs/global -> created + ...snip... + + > I created `utility:initiailize-network` because I found myself reusing keys and buckets across multiple environments + > (such as development vs production) because they were annoying to manage by hand. I wanted to make security easier + > for myself and, in the process, simplify the processes through automation. + +The top-level `global` directory is intended for configuration which applies to all areas. With the example I use it to +create an additional IAM role which allows VPN gateways to securely download their VPN keys and configuration files... + + $ ( cd global/core && cloque infra:put --aws-cloudformation 'Capabilities=["CAPABILITY_IAM"]' ) + > validating...done + > checking...missing + > deploying...done + > waiting...CREATE_IN_PROGRESS...........................CREATE_COMPLETE...done + +The `infra:put` is the core command responsible for managing the low-level, infrastructure-related resources. The +command looks for an `infrastructure.json` file (see the [example][27]) and since I'm focused on [AWS][4], the files +are [CloudFormation][5] scripts. + + > One thing I dislike about BOSH is how it uses a state file or global options to specify the director/deployment. It + > makes it very inconvenient to quickly switch between directors/deployments even between multiple terminal sessions. + > To help with that, `cloque` respects environment variables (or command line options) to know where it should be + > working from. The `CLOQUE_BASEDIR` (exported earlier) is the most significant, and it was able to detect when it was + > working from the `global` region/director and `core` deployment based on the current directory. + +Now that the global resources have been created, we can create our "core" resources for the `us-west-2` region. If you +take a look at the [infrastructure.json][28] file, you'll see it creates a VPC, multiple subnets for each availability +zone, a couple base security groups, and a gateway instance which will function as a VPN server to allow inter-region +communication. You'll also notice it's using [Twig][10] templating to load `network.yml` and simplify what would be a +lot of repeated resources. We'll use the `infra:put` command again, but this time within the `aws-usw2/core` +directory... + + $ cd aws-usw2 + $ ( cd core && cloque infra:put ) + ...snip... + > waiting...CREATE_IN_PROGRESS.........................CREATE_COMPLETE...done + + > BOSH supports ERB-templated deploy manifests. With ERB I found myself repeating a lot of code in each manifest when + > trying to make it dynamic. 
After trying [spiff][21] (which I found a bit limited and difficult to understand), I + > decided to use a different approach - one that would allow for the same dynamic, peer-config referencing, and + > (later) transformational capabilities for both infrastructure configuration and BOSH deployment manifests. + +Once the `infra:put` command finishes, the `aws-usw2` part of the environment is complete which means the OpenVPN +server is ready for a client. First we'll need to create and sign a client certificate though... + + # temporary directory + $ mkdir tmp-myovpn + $ cd tmp-myovpn + + # create a key (named after the hostname and current date) + $ TMPOVPN_CN=$(hostname -s)-$(date +%Y%m%da) + $ openssl req \ + -subj "/C=US/ST=CO/L=Denver/O=ACME Inc/OU=client/CN=${TMPOVPN_CN}/emailAddress=`git config user.email`" \ + -days 3650 -nodes \ + -new -out openvpn.csr \ + -newkey rsa:2048 -keyout openvpn.key + Generating a 2048 bit RSA private key + .............................+++ + ................+++ + writing new private key to 'openvpn.key' + ----- + + # sign the certificate (you'll need to enter the PKI password you used in the first step) + $ cloque openvpn:sign-certificate openvpn.csr + + # now create the OpenVPN configuration profile for connecting to aws-usw2 + $ ( \ + cloque openvpn:generate-profile aws-usw2 $TMPOVPN_CN \ + ; echo '' \ + ; cat openvpn.key \ + ; echo '' \ + ) > acme-dev-aws-usw2.ovpn + + # opening should install it with a GUI connection manager like Tunnelblick + $ open acme-dev-aws-usw2.ovpn + + # cleanup + $ cd ../ + $ rm -fr tmp-myovpn + $ unset TMPOVPN_CN + + > I created the `openvpn:sign-certificate` and, namely, `openvpn:generate-profile` commands to make the steps highly + > reproducible to encourage better certificate usage practices through it's "trivialness". + +Since I'm using `example.com` in the `share` scripts as the domain, DNS won't resolve it. For now, the easiest solution +is to manually add an entry to `/etc/hosts`... + + $ echo "`cd core && cloque infra:get '.Z0GatewayEipId'` gateway.aws-usw2.acme-dev.cloque.example.com" \ + | sudo tee -a /etc/hosts + + > The `infra:get` command allows me to programmatically fetch configuration details about the current deployment. For + > infrastructure, this allows me to extract the created resource IDs/names using [jq][12] statements. This makes it + > extremely easy to automate basic lookup tasks (as in this case), but also allows for more complex IP or security + > group enumeration which can be used for other composable, automated tasks. + +Once `/etc/hosts` is updated, I can connect with an OpenVPN client like [Tunnelblick][13] and ping the network... + + $ ping -c 5 10.101.0.4 + PING 10.101.0.4 (10.101.0.4): 56 data bytes + 64 bytes from 10.101.0.4: icmp_seq=0 ttl=64 time=59.035 ms + 64 bytes from 10.101.0.4: icmp_seq=1 ttl=64 time=61.288 ms + 64 bytes from 10.101.0.4: icmp_seq=2 ttl=64 time=78.194 ms + 64 bytes from 10.101.0.4: icmp_seq=3 ttl=64 time=57.850 ms + 64 bytes from 10.101.0.4: icmp_seq=4 ttl=64 time=57.956 ms + + --- 10.101.0.4 ping statistics --- + 5 packets transmitted, 5 packets received, 0.0% packet loss + round-trip min/avg/max/stddev = 57.850/62.865/78.194/7.764 ms + + +## BOSH Director + +Now that we have a VPC and a private network to deploy things into, we can start a BOSH Director. Here it's important +to note that I'm using "region", "network segment", and "director" interchangeably. 
Typically you'll have a single BOSH +Director within an environment's region, and since that Director will tag it's deployment resources with a "director" +tag, I decided to make them all synonyms. The effect is twofold: + + * when you see a "director" name (whether it's in the context of BOSH or not) it refers to where resources are + provisioned + * you can consistently use a "director" tag (BOSH or not) to identify where something is deployed which makes AWS + resource management much simpler (and AWS Billing reports by tag much more valuable). + +Back to getting BOSH deployed though. First, we'll create some additional BOSH-specific, region-specific infrastructure +(specifically, security groups for the director and agents)... + + $ ( cd bosh && cloque infra:put ) + ...snip... + > waiting...CREATE_IN_PROGRESS...............CREATE_COMPLETE...done + + > Here I start using the `bosh` directory. I put Director-related configuration in the `bosh` deployment. Individual + > BOSH deployments get their own directory. + +Once the security groups are available, we can create the BOSH Director. The `boshdirector:*` commands deal with the +Director tasks (i.e. they don't depend on a specific deployment). To get started, the `boshdirector:inception:start` +command takes care of provisioning the inception instance (it takes a few minutes to get everything installed and +configured)... + + $ cloque boshdirector:inception:start \ + --security-group $( cloque --deployment=core infra:get '.TrustedPeerSecurityGroupId' ) \ + --security-group $( cloque --deployment=core infra:get '.PublicGlobalEgressSecurityGroupId' ) \ + $(cloque infra:get '.SubnetZ0PublicId') \ + t2.micro + > finding instance...missing + > instance-id -> i-f84169f3 + > tagging director -> acme-dev-aws-usw2 + > tagging deployment -> cloque/inception + > tagging Name -> main + > waiting for instance...pending.........running...done + > waiting for ssh.......done + > installing... + ...snip... + > uploading compiled/self... + ...snip... + > uploading global/private... + ...snip... + + > You'll notice the `cloque --deployment=core infra:get` usage to to load the security groups. The `--deployment` + > option is an alternative to running `cd ../core` before the command. Another alternative would be to use the + > `CLOQUE_DEPLOYMENT` environment variable. Whatever the case, `cloque` is intelligent and flexible about figuring out + > where it should be working from. + +Before continuing, there's still a manual process of finding the correct stemcell. If we were in `us-east-1`, we could +use the "light-bosh" stemcell (which is really just an alias to a pre-compiled AMI that Cloud Foundry publishes). +Unfortunately, we need to take the slower route of compiling our own AMI for `us-west-2`. To do this, we need to lookup +the latest stemcell URL from the [published artifacts][15], then we pass that URL to the next command... + + $ cloque boshdirector:inception:provision \ + https://s3.amazonaws.com/bosh-jenkins-artifacts/bosh-stemcell/aws/bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz + > finding instance...found + > instance-id -> i-f84169f3 + > deploying... + WARNING! Your target has been changed to `https://10.101.16.8:25555'! + Deployment set to '/home/ubuntu/cloque/self/bosh/bosh.yml' + + Verifying stemcell... + File exists and readable OK + Verifying tarball... 
+ Read tarball OK + Manifest exists OK + Stemcell image file OK + Stemcell properties OK + + Stemcell info + ------------- + Name: bosh-aws-xen-ubuntu-trusty-go_agent + Version: 2710 + + Started deploy micro bosh + Started deploy micro bosh > Unpacking stemcell. Done (00:00:18) + Started deploy micro bosh > Uploading stemcell. Done (00:05:16) + Started deploy micro bosh > Creating VM from ami-8fe7a1bf. Done (00:00:19) + Started deploy micro bosh > Waiting for the agent. Done (00:01:19) + Started deploy micro bosh > Updating persistent disk + Started deploy micro bosh > Create disk. Done (00:00:02) + Started deploy micro bosh > Mount disk. Done (00:00:09) + Done deploy micro bosh > Updating persistent disk (00:00:19) + Started deploy micro bosh > Stopping agent services. Done (00:00:01) + Started deploy micro bosh > Applying micro BOSH spec. Done (00:00:21) + Started deploy micro bosh > Starting agent services. Done (00:00:01) + Started deploy micro bosh > Waiting for the director. Done (00:00:19) + Done deploy micro bosh (00:08:13) + Deployed `bosh/bosh.yml' to `https://10.101.16.8:25555', took 00:08:13 to complete + > fetching bosh-deployments.yml... + receiving file list ... + 1 file to consider + bosh-deployments.yml + 1025 100% 1000.98kB/s 0:00:00 (xfer#1, to-check=0/1) + + sent 38 bytes received 723 bytes 101.47 bytes/sec + total size is 1025 speedup is 1.35 + > tagging...done + + > The `:start` command took care of pushing the compiled manifest, but this `:provision` command is responsible for + > pushing everything to the director and, once complete, downloading the resulting configuration locally. I created + > these two commands because they were a common task and the manual, iterative process was getting tiresome. It also + > helps unify both the intitial provisioning vs upgrade process *and* deploying from AMI vs TGZ. Instead of ~12 manual + > steps spread out over ~30 minutes, I only need to intervene at three points (including instance termination). + +Once the provisioning step is complete, I can login and talk to BOSH... + + # default username/password is admin/admin + $ bosh target https://10.101.16.8:25555 + $ bosh status + Config + /Users/dpb587/cloque-acme-dev/aws-usw2/.bosh_config + + Director + Name acme-dev-aws-usw2 + URL https://10.101.16.8:25555 + Version 1.2710.0 (00000000) + User admin + UUID f38d685c-9a72-4fc0-bc84-558979cc80bf + CPI aws + dns enabled (domain_name: microbosh) + compiled_package_cache disabled + snapshots disabled + + Deployment + not set + +Since BOSH Director is successfully running, it's safe to terminate the inception instance. Whenever there's a new BOSH +version I want to deploy, I can just rerun the two `start` and `provision` commands (with an updated stemcell URL) +and it will take care of upgrading it. + + +### More on Stemcells + +While inception was deploying the BOSH Director, it ended up making a stemcell that I can reuse for our BOSH +deployments. Unfortunately, the Director doesn't know about it. The following command takes care of publishing it... 
+
+    $ cloque boshutil:create-bosh-lite-stemcell-from-ami \
+        https://s3.amazonaws.com/bosh-jenkins-artifacts/bosh-stemcell/aws/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz \
+        ami-8fe7a1bf
+    Uploaded Stemcell: https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz
+
+ > The command uses the URL (the light-bosh stemcell of the same version from the [artifacts][15] page) as a template
+ > and patches in the correct metadata for the local region. It then takes care of uploading it to the environment's S3
+ > bucket and to the Director so it's immediately usable.
+
+Another task I frequently need to do is convert the standard stemcells (which only support PV virtualization) into
+HVM stemcells that I can use with AWS's newer instance types. This next command takes care of all of those steps
+and, once it completes, there will be a new `*-hvm` stemcell ready for use on the Director.
+
+    $ cloque boshutil:convert-pv-stemcell-to-hvm \
+        https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz \
+        ami-d13845e1 \
+        $( cloque --deployment=core infra:get '.SubnetZ0PrivateId , .TrustedPeerSecurityGroupId' )
+    Created AMI: ami-f3e3a5c3
+    Uploaded Stemcell: https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent-hvm.tgz
+
+ > The command needs the light-bosh TGZ and AMI for the existing PV stemcell, as well as a subnet and security group for
+ > it to provision the conversion instances in.
+
+
+## BOSH Deployment
+
+Now that the BOSH Director is running, I can deploy something interesting onto it. Let's use [logsearch][2] as an
+example. First I'll need to clone the repository...
+
+    $ git clone https://github.com/logsearch/logsearch-boshrelease.git ~/logsearch-boshrelease
+    $ cd ~/logsearch-boshrelease
+
+Since I've changed directories away from the environment's directory, `cloque` will no longer know where to find its
+configuration. To help, I'll use a `.env` file...
+
+    $ ( \
+        echo 'export CLOQUE_BASEDIR=~/cloque-acme-dev' \
+        ; echo 'export CLOQUE_DIRECTOR=aws-usw2' \
+        ; echo 'export CLOQUE_DEPLOYMENT=logsearch' \
+      ) > .env
+
+ > I mentioned before that `cloque` uses the current working directory, environment variables, and command options to
+ > figure out where to look for things. If it's still missing information, it will check and load a `.env` file from
+ > the current directory as a last resort. This is normally only useful during development where I already use `.env`
+ > for other project-specific BASH `alias`es and variables.
+
+Now I can upload the release...
+
+    $ cloque boshdirector:releases:put releases/logsearch-latest.yml
+
+ > Since releases are Director-specific and unrelated to a particular deployment, the command uses the `boshdirector:*`
+ > namespace.
+
+The example has the configuration files for infrastructure (EIP and security groups) and BOSH (deploy manifest), but
+I still need to generate a certificate locally...
+
+    $ openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
+        -keyout ~/cloque-acme-dev/aws-usw2/ssl.key \
+        -out ~/cloque-acme-dev/aws-usw2/ssl.crt
+
+ > Having a directory per deployment helps keep everything scoped and organized when there are additional artifacts.
+ > The templating nature of `cloque` allows those files to be embedded not only into the deployment's own manifest but
+ > also into other deployment manifests.
With the example of logsearch, this means I don't need to copy and paste the `ssl.crt` into + > other deployments, just embed it using a relative path (embeds are always relative to the config file - something + > BOSH ERBs struggle with): `{% raw %}{{ env.embed('../logsearch/ssl.crt') }}{% endraw %}`. + +Once uploaded, I can use the `infra:put` and mirrored `bosh:put` command to push the infrastructure and BOSH +deployment (`-n` meaning non-interactive, just like with `bosh`)... + + $ cloque infra:put + ...snip... + > waiting...CREATE_IN_PROGRESS.....................CREATE_COMPLETE...done + + $ cloque -n bosh:put + Getting deployment properties from director... + ...snip... + Deployed `bosh.yml' to `acme-dev-aws-usw2' + +Once complete, I can see the [elasticsearch][19] service running... + + $ wget -qO- '10.101.17.26' + { + "status" : 200, + "name" : "elasticsearch/0", + "version" : { + "number" : "1.2.1", + "build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364", + "build_timestamp" : "2014-06-03T15:02:52Z", + "build_snapshot" : false, + "lucene_version" : "4.8" + }, + "tagline" : "You Know, for Search" + } + +And I can see the ingestor listening on its EIP: + + $ echo 'QUIT' | openssl s_client -showcerts -connect $( cloque infra:get '.Z0IngestorEipId' ):5614 + CONNECTED(00000003) + +And I can SSH into the instance... + + $ cloque bosh:ssh + ...snip... + bosh_j51114xze@c989cf2f-91e4-407e-a7d7-bdc03ef79511:~$ + + > The `bosh:ssh` command is a little more intelligent than `bosh ssh`. It will peek at the manifest to know if there's + > only a single job running, in which case the job/index argument becomes meaningless. Additionally, it always will + > use a default `sudo` password of `c1oudc0w` (avoiding the interactive delay and prompt that `bosh ssh` requires). + + +## Package Development + +When I need to create a new package, I started using a convention where I'd add the origin URL where I found a +blob/file. This provides me with more of an audit over time, but also allows me to automate a `spec` file which looks +like: + + --- + name: "nginx" + files: + # http://nginx.org/download/nginx-1.7.2.tar.gz + - "nginx-blobs/nginx-1.7.2.tar.gz" + # ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.35.tar.gz + - "nginx-blobs/pcre-8.35.tar.gz" + # https://www.openssl.org/source/openssl-1.0.1h.tar.gz + - "nginx-blobs/openssl-1.0.1h.tar.gz" + ...snip... + +Into a series of `wget`s with the `boshutil:package-downloads` command... + + $ cloque boshutil:package-downloads nginx + mkdir -p 'blobs/nginx-blobs' + [ -f 'blobs/nginx-blobs/nginx-1.7.2.tar.gz' ] || wget -O 'blobs/nginx-blobs/nginx-1.7.2.tar.gz' 'http://nginx.org/download/nginx-1.7.2.tar.gz' + [ -f 'blobs/nginx-blobs/pcre-8.35.tar.gz' ] || wget -O 'blobs/nginx-blobs/pcre-8.35.tar.gz' 'ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.35.tar.gz' + [ -f 'blobs/nginx-blobs/openssl-1.0.1h.tar.gz' ] || wget -O 'blobs/nginx-blobs/openssl-1.0.1h.tar.gz' 'https://www.openssl.org/source/openssl-1.0.1h.tar.gz' + ...snip... + + > I was tired of having to manually download files, `bosh add blob` them with the correct parameters and then having + > to manually delete the originals. This lets me completely avoid that step and ensures I'm using the files I expect. + > Whenever a blob is an internal file or `src`, I just take care of it manually like before. + +When I'm working on a `packaging` script I use [Docker][22] images to emulate the build environment. 
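+In its simplest form, that can be as little as dropping into an `ubuntu:trusty` container with the blobs mounted and
+poking at the build by hand (the paths here are only an example of the idea, not something `cloque` runs for me)...
+
+    $ docker run -v "$PWD/blobs/nginx-blobs:/tmp/blobs" -it ubuntu:trusty /bin/bash
+    root@5a1c9e2b41d7:/# apt-get update && apt-get -y install build-essential
+    root@5a1c9e2b41d7:/# cd /tmp/blobs && tar -xzf pcre-8.35.tar.gz && tar -xzf nginx-1.7.2.tar.gz
+    root@5a1c9e2b41d7:/tmp/blobs# cd nginx-1.7.2 && ./configure --prefix=/var/vcap/packages/nginx --with-pcre=../pcre-8.35
+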
Since 99% of my build issues come from `configure` arguments and environment variables, this is normally sufficient.
+This also lets me iteratively debug my packaging scripts, as opposed to the slow, guess-and-check method of
+re-releasing and deploying the whole thing to BOSH to test fixes. The `boshutil:package-docker-build` command helps me
+here...
+
+    $ cloque boshutil:package-docker-build ubuntu:trusty nginx
+    > compile/packaging...done
+    > compile/nginx-blobs/nginx-1.7.2.tar.gz...done
+    > compile/nginx-blobs/pcre-8.35.tar.gz...done
+    > compile/nginx-blobs/openssl-1.0.1h.tar.gz...done
+    ...snip...
+    Sending build context to Docker daemon 7.571 MB
+    Sending build context to Docker daemon
+    Step 0 : FROM ubuntu:trusty
+     ---> ba5877dc9bec
+    Step 1 : RUN apt-get update && apt-get -y install build-essential cmake m4 unzip wget
+    ...snip...
+    root@347c1d4ca07b:/var/vcap/data/compile/nginx#
+
+ > This command mirrors the BOSH environment by using the `spec` file to add the referenced blobs; it uploads the
+ > packaging script, configures the `BOSH_COMPILE_TARGET` and `BOSH_INSTALL_TARGET` variables, creates the directories,
+ > and switches to the compile directory, ready for me to type `./packaging` or paste commands iteratively. It also has
+ > `--import-package` and `--export-package` options to import/dump the resulting `/var/vcap/packages/{name}`
+ > directory to support dependencies.
+
+
+## Snapshots
+
+One convenient feature of BOSH is snapshotting, which gives a full backup of its persistent disks. You can run its
+`take snapshot` command for a particular job or for an entire deployment. Or, if "dirty" snapshots are okay, the
+Director can schedule them automatically. To manage all those snapshots, I created a few commands. The first command
+takes care of snapshots that the BOSH Director creates of itself...
+
+    $ cloque boshdirector:snapshots:cleanup-self 3d
+    snap-4219f4fb -> 2014-09-13T06:01:14+00:00 -> deleted
+    snap-2e6588e4 -> 2014-09-13T06:03:55+00:00 -> deleted
+    snap-1acd90d3 -> 2014-09-13T06:06:36+00:00 -> deleted
+    snap-618c7da9 -> 2014-09-14T06:01:15+00:00 -> retained
+    snap-dce22315 -> 2014-09-14T06:03:55+00:00 -> retained
+    snap-a9e81a60 -> 2014-09-14T06:06:35+00:00 -> retained
+    snap-d35ea51a -> 2014-09-15T06:01:18+00:00 -> retained
+    snap-3742b88e -> 2014-09-15T06:03:58+00:00 -> retained
+    snap-0b8b40c2 -> 2014-09-15T06:06:38+00:00 -> retained
+    snap-ea16dfd3 -> 2014-09-16T06:01:18+00:00 -> retained
+    snap-913df459 -> 2014-09-16T06:03:58+00:00 -> retained
+    snap-82d5fc4b -> 2014-09-16T06:06:38+00:00 -> retained
+
+ > This command is simplistic and trims all snapshots older than a given period (in this case, three days). I got very
+ > tired of (and forgetful about) regularly cleaning up snapshots from the AWS Console. It communicates directly with
+ > the AWS API since the `bosh` command doesn't seem to enumerate them.
+
+The command for individual deployment snapshots is a bit more intelligent. It allows writing logic which, when passed a
+given snapshot, determines whether it should be retained or deleted. For example...
+
+    $ cloque boshdirector:snapshots:cleanup
+    ...snip...
+ snap-7837f7d4 -> 2014-08-01T07:01:30+00:00 -> dirty -> retained + snap-62cca4de -> 2014-08-04T07:00:28+00:00 -> dirty -> retained + snap-bdd29512 -> 2014-08-04T22:51:57+00:00 -> clean -> retained + snap-4dd5a3e1 -> 2014-08-04T23:46:23+00:00 -> clean -> retained + snap-2bb7c784 -> 2014-08-11T07:00:46+00:00 -> dirty -> retained + snap-5239b7fc -> 2014-08-18T07:00:40+00:00 -> dirty -> retained + snap-cf6fcb6e -> 2014-08-25T07:00:39+00:00 -> dirty -> retained + snap-9d00103c -> 2014-08-28T13:34:39+00:00 -> clean -> retained + snap-9d80103d -> 2014-09-01T07:00:43+00:00 -> dirty -> retained + snap-79c18cda -> 2014-09-08T07:00:44+00:00 -> dirty -> retained + snap-87f47a24 -> 2014-09-09T07:00:57+00:00 -> dirty -> deleted + snap-5fec87fc -> 2014-09-10T07:00:55+00:00 -> dirty -> retained + snap-bdfeda1e -> 2014-09-11T07:00:58+00:00 -> dirty -> retained + snap-246b6987 -> 2014-09-12T07:00:54+00:00 -> dirty -> retained + snap-c234d870 -> 2014-09-13T07:00:43+00:00 -> dirty -> retained + snap-28ed128a -> 2014-09-14T07:00:55+00:00 -> dirty -> retained + snap-ef6ac34d -> 2014-09-15T07:00:55+00:00 -> dirty -> retained + snap-72c156d3 -> 2014-09-16T07:00:42+00:00 -> dirty -> retained + + > The command looks for a deployment-specific file which receives information about the snapshot (ID, date, + > clean/dirty) and returns `true` to cleanup/delete or `false` to retain. This allows me to create some very custom + > retention policies for individual deployments, depending on their requirements. In this example, clean snapshots are + > kept 3 months, Mondays are kept for 6 months, first of month is kept indefinitely, everything else kept for 1 week. + + +## Revitalizing + +In the past I've typically used local VMs with [VirtualBox][23] or [VMWare Fusion][24] for personal development. +Unfortunately they always seemed to drift from production servers, which made things inconvenient, at best. With BOSH, +it became trivial for me to start/stop deployments and guarantee they have a known environment. When my VMs were local +I always had scripts which would pull down backups, restore them, and clean up data for development. With `cloque` I've +been using a `revitalize` concept which allows me to restore data from snapshots or run arbitrary commands. For +example, I can add the following to my database job to restore data from a slave's most recent snapshot... + + jobs: + - name: "mysql" + ...snip... + cloque.revitalize: + - method: "snapshot_copy" + director: "example-acme-aws-usw2" + deployment: "wordpress-demo-hotcopy" + job: "mysql" + - method: "script" + script: "{{ env.embed('revitalize.sh') }}" + + > The `snapshot_copy` method takes care of finding the most recent snapshot with the given parameters and would copy + > the data onto the local `/var/vcap/store` directory (trashing anything it replaces). The `script` method allows an + > arbitrary script to run, in this case, one that resets the MySQL users/passwords and cleans data for development + > purposes. + +Whenever I want to reload my dev deployment with more recent production data (or after I've sufficiently polluted my +dev data), I can just run the `bosh:revitalize` task... + + $ cloque bosh:revitalize + > mysql/0 + > finding 10.101.17.41... + > instance-id -> i-fe0e23f3 + > availability-zone -> us-west-2w + > stopping services... + > waiting...............done + > snapshot_copy + > finding snapshot... + > snapshot-id -> snap-3867159a + > start-time -> 2014-09-16T06:58:31.000Z + > creating volume... 
> volume-id -> vol-edc5bfe9
+    > waiting...creating...available...done
+    > attaching volume...
+    > waiting...in-use...done
+    > mounting volume...
+    > transferring data...
+    > removing mysql...done
+    > restoring mysql...done
+    > unmounting volume...
+    > detaching volume...
+    > waiting...in-use......available...done
+    > destroying volume...
+    > script...
+    > starting services...
+    ...snip...
+
+ > This also makes it easy for me to condense services which run on multiple machines in production onto a single
+ > machine for development by restoring from multiple snapshots (as long as the services' `store` directories are
+ > properly named).
+
+
+## Configuration Transformations
+
+I mentioned earlier that configuration files are templates. In addition to basic templating capabilities, I added some
+transformation options. Transformations allow a processor to receive the current state of the configuration, do some
+magic to it, and return a new configuration. The easiest example of this is with logging - I want to centralize all my
+log messages and [`collectd`][26] measurements. Here I'll use [logsearch-shipper-boshrelease][25], but regardless of
+how it's done, it typically requires adding a new release to your deployment, adding the job template to every job, and
+adding the correct properties. When you have multiple deployments, this becomes tedious, and that's where a
+transformation shines. The transform could take care of the following:
+
+ * add the `logsearch` properties (SSL key, a `bosh_director` field on messages, an EIP lookup for the ingestor)
+ * add the `logsearch-shipper` release to the deployment
+ * add the `logsearch-shipper` job template to every job
+
+And the raw code for that transform could go in `aws-usw2/logsearch/shipper-transform.php`, roughly along these lines
+(the transform is a closure which receives the parsed manifest plus some environment parameters and returns the
+modified manifest; the exact property keys for the shipper are illustrative)...
+
+    <?php
+
+    return function (array $config, array $params) {
+        // add the shipper properties (SSL certificate, bosh_director field, ingestor EIP lookup);
+        // the exact property key path used here is illustrative
+        $config['properties']['logsearch-shipper'] = [
+            'logs' => [
+                '_defaults' => implode("\n", [
+                    '---',
+                    'files:',
+                    '  "**/*.log":',
+                    '    fields:',
+                    '      type: "unknown"',
+                    '      bosh_director: "' . $params['network_name'] . '-' . $params['director_name'] . '"',
+                ]),
+                'server' => $params['env']['self/infrastructure/logsearch']['Z0IngestorEipId'] . ':5614',
+                'ssl_ca_certificate' => $params['env']->embed(__DIR__ . '/ssl.crt'),
+            ],
+            'metrics' => [
+                'frequency' => 60,
+            ],
+        ];
+
+        // add the template job to all jobs
+        foreach ($config['jobs'] as &$job) {
+            $job['templates'][] = [
+                'release' => 'logsearch-shipper',
+                'name' => 'logsearch-shipper',
+            ];
+        }
+
+        // add the release, if it's not explicitly using a version
+        if (!in_array('logsearch-shipper', array_map(function ($a) { return $a['name']; }, $config['releases']))) {
+            $config['releases'][] = [
+                'name' => 'logsearch-shipper',
+                'version' => '1',
+            ];
+        }
+
+        return $config;
+    };
+
+And then whenever I want a deployment to forward its logs with `logsearch-shipper`, I only need to add the following to
+the root level of my `bosh.yml` deployment manifest...
+
+    _transformers:
+      - path: "../logsearch/shipper-transform.php"
+
+ > This approach helps me keep my deployment manifests concise. Rather than being cluttered up with ancillary
+ > configuration and sidekick jobs, my definitions remain focused on the services they're actually providing.
+
+
+## Tagging
+
+Since starting with BOSH, I've used AWS tags more heavily. I consistently use the `director` tag to represent the
+`{network_name}-{region_name}` (e.g. `acme-dev-aws-usw2`) and the `deployment` tag to represent the logical set of
+services (regardless of whether BOSH is managing them or not).
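+Consistent tags like that also make ad-hoc queries trivial outside of `cloque` - for example, listing every instance a
+director owns straight from the AWS CLI (just an illustration, assuming the CLI is configured for the right account)...
+
+    $ aws ec2 describe-instances \
+        --filters 'Name=tag:director,Values=acme-dev-aws-usw2' \
+        --query 'Reservations[*].Instances[*].InstanceId' \
+        --output text
+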
I made another command which can enumerate relevant +resources and ensure they have the expected tags: + + $ cloque utility:tag-resources + > reviewing us-west-2... + > acme-dev-aws-usw2/bosh/microbosh -> i-298fb0c6 + > /dev/xvda -> vol-d46fa79b + > adding director -> acme-dev-aws-usw2 + > adding deployment -> microbosh + > adding Name -> microbosh/0/xvda + > /dev/sdb -> vol-8b6c46c6 + > adding director -> acme-dev-aws-usw2 + > adding deployment -> microbosh + > adding Name -> microbosh/0/sdb + > /dev/sdf -> vol-8a6d46c6 + > adding director -> acme-dev-aws-usw2 + > adding deployment -> microbosh + > adding Name -> microbosh/0/sdf + > acme-dev-aws-usw2/logsearch/main/0 -> i-46be80b9 + > /dev/sda -> vol-fa4e57b5 + > adding director -> acme-dev-aws-usw2 + > adding deployment -> logsearch + > adding Name -> main/0/sda + > /dev/sdf -> vol-73e0ce3e + > acme-dev-aws-usw2/infra/core/z1/gateway -> i-8d60f6a2 + > /dev/sda1 -> vol-7b5b7838 + + > I added this command because I wanted to be sure my volumes were all accurately tagged. This helps me when using the + > AWS Console, but it also provides more detail in the AWS Billing Reports when the `director` and `deployment` tags + > are included for detailed billing. + + +## Conclusion + +BOSH is far from perfect, in my mind, but with a little help it is enabling me to be more productive and effective +than other tools I've tried in the areas which are most important to me. + + + [1]: http://docs.cloudfoundry.org/bosh/ + [2]: https://github.com/logsearch/logsearch-boshrelease + [3]: https://github.com/dpb587/cloque + [4]: http://aws.amazon.com/ + [5]: http://aws.amazon.com/cloudformation/ + [6]: http://www.terraform.io/ + [7]: https://github.com/dpb587/cloque/blob/master/share/ + [8]: http://openvpn.net/ + [9]: https://github.com/dpb587/cloque/blob/master/share/local-core-infrastructure.yml + [10]: http://twig.sensiolabs.org/ + [11]: http://console.aws.amazon.com/ + [12]: http://stedolan.github.io/jq/ + [13]: https://code.google.com/p/tunnelblick/ + [14]: https://gist.githubusercontent.com/dpb587/c0427635b3316584e12e/raw/183ccda6c504fac02754b79b5a5b267848a70025/transfer-ami.sh + [15]: http://bosh_artifacts.cfapps.io/ + [16]: https://github.com/cloudfoundry/bosh/tree/master/bosh_cli_plugin_micro + [18]: https://github.com/dpb587/cloque/blob/master/share/example-multi/network.yml + [19]: http://www.elasticsearch.org/ + [20]: /blog/2014/02/28/distributed-docker-containers.html#the-alternatives + [21]: https://github.com/cloudfoundry-incubator/spiff + [22]: https://www.docker.com/ + [23]: https://www.virtualbox.org/ + [24]: http://www.vmware.com/products/fusion + [25]: https://github.com/logsearch/logsearch-shipper-boshrelease/ + [26]: http://collectd.org/ + [27]: https://github.com/dpb587/cloque/blob/master/share/example-multi/global/core/infrastructure.json + [28]: https://github.com/dpb587/cloque/blob/master/share/example-multi/aws-usw2/core/infrastructure.json