599888). Their software has been a
-valuable timesaver for us.
-
-
- [1]: http://www.dymoendicia.com/segments/all-products/endicia-for-mac
- [2]: https://developer.apple.com/library/mac/#documentation/applescript/conceptual/applescriptx/Concepts/as_related_apps.html#//apple_ref/doc/uid/TP40001570-1149074-BAJEIHJA
- [3]: http://www.dymoendicia.com/
- [4]: https://gist.github.com/dpb587/4660132#file-endicia-purchase-postage-applescript
diff --git a/blog/_posts/2013-02-08-automating-backups-to-the-cloud.md b/blog/_posts/2013-02-08-automating-backups-to-the-cloud.md
deleted file mode 100644
index a40fd50..0000000
--- a/blog/_posts/2013-02-08-automating-backups-to-the-cloud.md
+++ /dev/null
@@ -1,173 +0,0 @@
----
-title: Automating Backups to the Cloud
-layout: post
-tags: [ 'backup', 'gpg', 's3' ]
-description: Combining gpg, Amazon S3 and IAM policies.
----
-
-Backups are extremely important and I've been experimenting with a few different methods. My concerns are always focused
-on maintaining data integrity, security, and availability. One of my current methods involves using asymmetric keys for
-secure storage and object versioning to ensure backup data can't be overwritten unintentionally.
-
-
-## Encryption Keys
-
-For encryption and decryption I'm using asymmetric keys via [`gpg`][1]. This way, any server can generate and encrypt
-the data, but only administrators who have the private key can actually decrypt it. Generating the
-administrative key looks like:
-
-{% highlight console %}{% raw %}
-$ gpg --gen-key
-gpg (GnuPG) 1.4.11; Copyright (C) 2010 Free Software Foundation, Inc.
-This is free software: you are free to change and redistribute it.
-There is NO WARRANTY, to the extent permitted by law.
-
-... [snip] ...
-
-gpg: key CEFAF45B marked as ultimately trusted
-public and secret key created and signed.
-
-gpg: checking the trustdb
-gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
-gpg: depth: 0 valid: 4 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 4u
-pub 2048R/CEFAF45B 2013-02-08
- Key fingerprint = 46DF 2951 7E2D 41D7 F7B5 EB16 20C2 1C03 CEFA F45B
-uid Danny Berger (secret-project-backup)
-{% endraw %}{% endhighlight %}
-
-So the home page is one of the first welcome pages to new visitors. I wanted to make sure it was warm and welcoming,
-primarily through the central photos we show; the default one being the entry view of our shop (with a dynamic thumbnail
-of our webcam in the bottom right). Over time we'll be able to rotate through different photos for different events,
-product updates, and more clever things.
-
-I wanted to get rid of the multi-color sidebar from every page so that space could be better filled with more useful,
-page-specific content. Visually, I increased the page width from 784px to 960px, so combined with dropping the sidebar
-it allows for about 75% more content area.
-
-Previously the sidebar was the main method of navigation, so I regrouped the old blue navigation link box into about 6
-different topics to use as the main header links.
-
-Instead of a simple, almost-nonexistent footer on the old site, I took advantage of that area to include store
-information, social links, payment options, and numerous other credentials that customers can appreciate.
-
-
-### Contact Us
-
-
-
-
-Contact information is important for customers. In addition to the information now being in the footer, there is a
-cleaner page with a new interactive map to help people see exactly where the shop is located.
-
-
-### Wonderful Customers
-
-
-
-
-It's always nice to be able to show feedback customers send in. The new site reorganizes everything in a nicer, more
-readable way, and on separate pages. It's also much simpler to submit a testimonial through the on-screen form.
-
-
-### Shop
-
-
-
-
-Generally speaking, I wanted the photos to be the main defining experience that a visitor has. To that end, product
-photos became significantly larger in an effort to fill in the missing colors of the simple color palette I used.
-Since it's the main shop page, I also included useful links like new products, gift certificates, search, and links for
-browsing by some attributes.
-
-
-
-
-Within specific shop categories, I only slightly increased the thumbnail sizes and instead focused more on the
-different brands and their distinctions.
-
-One other significant addition to the new website is the social sharing functionality. On most shop pages, there are new
-social sharing links to Twitter, Pinterest, and Facebook. Using a custom short domain and campaign URL arguments, we can
-get better insight into customer interests.
-
-
-
-
-In my opinion, one of the best changes has been to viewing products on pages like this. Using a sidebar to show the
-description and attributes allows customers to more quickly see the enticing and larger product photos together.
-
-
-
-
-I think the second best improvement is the individual product page where the photo takes precedence and shows off the
-quality of the product. A larger call-to-action makes it easier to add the item to carts and wishlists. I reorganized
-the product information as well to better prioritize it, visually.
-
-
-
-One major feature addition has been a real search engine. The old site used some complex and inefficient database
-queries (which occasionally caused noticeable performance issues). With the new site, all the products are
-properly indexed and searched via [elasticsearch][2]. I'm looking forward to adding more elasticsearch integrations on
-the site in the future.
-
-
-### Help
-
-
-
-
-Previously we had a single, text-heavy and difficult to read help page, also known as "frequently asked questions." The
-new site breaks things down into different topics and adds creative pictures to make things more readable. There's also
-a new inline form where customers can ask for help instead of bothering to open an email client and compose an email.
-
-
-## New Stuff
-
-Although I disabled a number of things for later release and chatter, it's always fun to include some completely new
-functionality...
-
-
-### Local
-
-
-
-I created a new topic dedicated to our local customers. Since it's not only an online store anymore, we wanted a way to
-publicize some of the local activities that Fort Collins people would be interested in. It also lets online-only
-customers see how we exist and work in real life to create more of a connection.
-
-
-### About
-
-
-
-Along with a local page, I also wanted a better page for showing our real world existence so customers could feel more
-connected and understand both who and where they're purchasing from.
-
-
-### Shop Attributes
-
-
-
-In an effort to make navigating the shop easier, I created new pages to view products by attributes in a more organized
-way. If somebody is interested in "Fingering Weight" they can easily see all the companies and brands that offer it. If
-they need more complicated searches, there's an Advanced Search link at the bottom of each page.
-
-
-### Site Feedback
-
-
-
-Whether it's a bug or an idea for improvement, I wanted to be sure visitors could easily send technical
-feedback. Links in the footer of every page include information like the page they were looking at, their browser,
-their authenticated username, and whatever notes they want to add.
-
-
-### humans.txt
-
-
-
-Whenever possible, I like discussing and linking to technical resources that I have found useful. For the nerdy types, I
-created the `humans.txt` file to document many of the resources that have helped make the website possible.
-
-
-## Conclusion
-
-So there's the basic overview of some of the less-technical changes. I'm looking forward to rolling out several
-additional features over the next few months to help keep things fresh. Later blog posts can discuss some of
-the more technical processes and decisions that have helped in making the new site.
-
-
- [1]: http://www.theloopyewe.com/
- [2]: http://www.elasticsearch.org/
diff --git a/blog/_posts/2013-05-07-embeddable-and-context-aware-web-pages.md b/blog/_posts/2013-05-07-embeddable-and-context-aware-web-pages.md
deleted file mode 100644
index f1a4966..0000000
--- a/blog/_posts/2013-05-07-embeddable-and-context-aware-web-pages.md
+++ /dev/null
@@ -1,154 +0,0 @@
----
-title: Embeddable and Context-Aware Web Pages
-layout: post
-tags: [ 'architecture', 'http', 'javascript', 'symfony', 'symfony2' ]
-description: Embedding content in an absolutely relative manner.
----
-
-In my [symfony][5] website applications I frequently make multiple subrequests to reuse content from other controllers.
-For simple, non-dynamic content this is trivial, but when arguments can change data or when the browser may want to
-update those subrequests things start to get complicated. Usually it requires tying the logic of the subrequest
-controller in the main request controller (e.g. knowing that the `q` argument needs to be passed to the template, and
-then making sure the template passes it in the subrequest). I wanted to simplify it and get rid of those inner
-dependencies.
-
-As an example, take a look at this [product search][1]. The [facets][2] and [results][3] are actually subrequests, but
-the main results content is taking advantage of the request design I implemented. My goals were:
-
- * remove logic from controller code to keep controllers independent from each other,
- * pages work without JavaScript and without requiring newer browsers,
- * pages work the same whether it's a subrequest or a master request, and
- * any page should be capable of being a self-contained subrequest.
-
-
-## Steps
-
-When a subrequest is self-contained, I call it a *subcontext*. These subcontext requests have an additional requirement
-of being publicly accessible. In the product search, the [results][3] page is publicly routed and all the pagination and
-view links will work properly within the `./results.html` page. This makes it easy to use XHR to load updated
-content.
-
-Another minor piece of this design is that views don't need to be fully rendered. This means an Ajax request can ask for
-just the page content and exclude the typical header/footer. In [Twig][4] parlance it is a `frag_content` block which
-has all the useful content.
-
-When it comes to passing query parameters down through subcontexts, I decided that each subcontext gets its own scoped
-variable. So whenever I render a subcontext in a template, I always specify a name for it. The name should be unique
-within the template context. In the product search example, the facets subcontext is named `f` and the results
-subcontext is named `r`. When a request arrives for `/?r[offset]=54`, the subrequest will arrive at the results
-controller looking like `/results.html?offset=54` (which is equivalent to navigating that page directly).
-
-To keep track of the subcontext names, template content, query data, and relative locations I started using a custom
-request header named `tle-subcontext`. In practice it looks like:
-
- tle-subcontext: r:content@/shop/search/availability/in-stock/?q=red
-
-When that request header exists it means:
-
- * we're within a subcontext named `r`,
- * we want to get the view fragment named `content`, and
- * the root URL we started at was `/shop/search/availability/in-stock/?q=red`.
-
-Within controller code, that header information should not be relevant. In templating, though, it becomes useful for
-rewriting URLs. Whenever a template is going to give a link to itself, I wrap it in a custom `subcontext_rewrite`
-function. For example, given the `tle-subcontext` configuration above, it would rewrite:
-
- dataset_generic(...snip...)
- => /shop/.../in-stock/results.html?q=red&view=list-tn&offset=54
-
- subcontext_rewrite(dataset_generic(...snip...))
- => /shop/.../in-stock/?q=red&r[view]=list-tn&r[offset]=54#r
-
-The rewritten URL is completely valid and can be accessed without fancy JavaScript calls. Now, to make that possible I
-don't use the standard inline renderer in Twig. I created a custom renderer with a little additional logic which takes
-care of rewriting the subcontext data and injecting the header:
-
-{% highlight php %}
-$rootUri = $request->getRequestUri();
-
-if (preg_match('/^([a-z0-9\-]+):([a-z0-9]+)@(.*)$/', $request->server->get('HTTP_TLE_SUBCONTEXT'), $match)) {
- # this means a subcontext already exists and a sub-subcontext is being created
-
- # append our context name to the parent context name
- $options['name'] = $match[1] . '-' . $options['name'];
-
- # use the root uri from the header since $request is only a subrequest
- $rootUri = $match[3];
-
- # pull out our context-specific query data from the root uri and update our request
- parse_str(parse_url($match[3], PHP_URL_QUERY), $rootQuery);
- $subRequest->query->replace(isset($rootQuery[$options['name']]) ? $rootQuery[$options['name']] : array());
-} elseif ((null !== $subdata = $request->query->get($options['name'])) && (is_array($subdata))) {
- # pull out our context-specific query data
- $subRequest->query->replace($subdata);
-}
-
-# now add the header with all our combined data to the request
-$subRequest->server->set(
- 'HTTP_TLE_SUBCONTEXT',
- $options['name'] . ':' . (empty($options['frag']) ? 'content' : $options['frag']) . '@' . $rootUri
-);
-
-unset($options['name'], $options['frag']);
-{% endhighlight %}
-
-So now whenever I want a subcontext within a view, I can use the custom renderer:
-
-{% highlight jinja %}{% raw %}
-{{ render_subcontext(path('search_results', passthru), { 'name' : 'r' }) }}
-{% endraw %}{% endhighlight %}
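-
-To give a concrete idea of the template side, here is a minimal sketch of how a `subcontext_rewrite` helper could re-scope a URL using only the `tle-subcontext` header described above. It is illustrative rather than the actual implementation (the real helper is registered as a Twig extension, and sub-subcontexts are glossed over):
-
-{% highlight php %}
-# Sketch only: re-scope a subcontext-local URL using the tle-subcontext header.
-function subcontext_rewrite($url, Symfony\Component\HttpFoundation\Request $request)
-{
-    if (!preg_match('/^([a-z0-9\-]+):([a-z0-9]+)@(.*)$/', $request->server->get('HTTP_TLE_SUBCONTEXT'), $match)) {
-        return $url; # not inside a subcontext; leave the URL alone
-    }
-
-    $name = $match[1];
-    $rootUri = $match[3];
-
-    # the URL's own query arguments (relative to the subcontext page)
-    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
-
-    # the query arguments already present on the root URL
-    parse_str((string) parse_url($rootUri, PHP_URL_QUERY), $rootQuery);
-
-    # scope whatever the subcontext added under its own name (e.g. offset -> r[offset])
-    $rootQuery[$name] = array_diff_assoc($query, $rootQuery);
-
-    return parse_url($rootUri, PHP_URL_PATH) . '?' . http_build_query($rootQuery) . '#' . $name;
-}
-{% endhighlight %}
-
-In this sketch, `q=red` already exists on the root URL from the earlier example, so only `view` and `offset` end up scoped under `r`.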
-
-With those simple customizations I no longer have to worry about knowing what parameters need to be passed on to
-template subrequests. It also paves the way for some fancier behavior...
-
-
-## Adding Some Magic
-
-Since the subcontext pages are publicly accessible, it should be easy to let Ajax reload individual subcontexts without
-having to reload the whole page. To enable that, I went ahead and configured subcontext requests to always end up in a
-specific layout which will wrap it with the subcontext metadata. The template looks like:
-
-{% highlight jinja %}{% raw %}
-{% endraw %}{% endhighlight %}
-
-
-The full stack trace is available along with all the local and global variables. In addition to the basic step
-over/into/out, breakpoints can be set throughout the code. When paused, variables can be inspected and explored. In
-addition to simple types like strings and booleans, complex objects and arrays can be expanded and further explored.
-
-
-
-Not only can variables be read, they can also be updated inline by double clicking and entering new values. Or, for more
-advanced commands, the console can be used to evaluate application code, possibly updating the runtime.
-
-
-
-
-Like most other IDE debuggers, the frontend supports jumping through the various levels in the stack to inspect the
-runtime and run arbitrary commands. One other minor feature is watch expressions which are evaluated during every pause.
-
-
-
-
-Once a debug session has completed, the debug tab gets redirected back to the waiting page. Or, if the debug tab gets
-closed in the middle of the debug session, the debugger will detach from the program and let it run to completion.
-
-PHP isn't the only supported language. By using the debugging modules from [Komodo][14], other languages that speak the
-DBGp protocol can also use `ti-debug`. For example, Python scripts can currently be debugged, too...
-
-
-
-
-## Workflow
-
-`ti-debug` can be run locally for a single developer, but in the case of DBGp, `ti-debug` can
-also act as a proxy to support multiple developers, or a combination of developers wanting to use both the browser-based
-debugger along with their own local IDEs. This way, `ti-debug` could be running on a central development server to allow
-all developers access.
-
-
- [1]: http://www.eclipse.org/
- [2]: http://www.zend.com/products/studio/
- [3]: https://netbeans.org/
- [4]: http://panic.com/coda/
- [5]: http://www.vim.org/
- [6]: http://www.activestate.com/komodo-ide
- [7]: https://github.com/dpb587/ti-debug
- [8]: http://php.net/
- [9]: https://github.com/mrdavidlaing
- [10]: https://github.com/cityindex
- [11]: https://www.google.com/intl/en/chrome/browser/
- [12]: http://nodejs.org/
- [13]: http://www.webkit.org/
- [14]: http://code.activestate.com/komodo/remotedebugging/
diff --git a/blog/_posts/2013-06-01-search-engine-based-on-structured-data.md b/blog/_posts/2013-06-01-search-engine-based-on-structured-data.md
deleted file mode 100644
index 4150bf2..0000000
--- a/blog/_posts/2013-06-01-search-engine-based-on-structured-data.md
+++ /dev/null
@@ -1,315 +0,0 @@
----
-title: The Basics of a Custom Search Engine
-layout: post
-tags: [ 'elasticsearch', 'gearmand', 'schema.org', 'search', 'sitemap', 'structured data' ]
-description: Combining elasticsearch and "structured data" to create a self-hosted search engine.
----
-
-One of the most useful features of a website is the ability to search. [The Loopy Ewe][4] has had some form of faceted
-product search for a long time, but it has never had the ability to quickly find regular pages, categories, brands, blog
-posts and the like. [Google][1] seems to lead in offering custom search products with both [Custom Search Engine][2] and
-[Site Search][3], but they're either branded or cost a bit of money. Instead of investing in their proprietary products,
-I wanted to try to create a simple search engine for our needs which took advantage of my previous work in implementing
-existing open standards.
-
-
-## Introduction
-
-In my mind, there are four basic processes when creating a search engine:
-
-**Discovery** - finding the documents that are worthy of indexing. This step was fairly easy since I had already set up
-a [sitemap][6] for the site. Internally, the feature bundles of the site are responsible for generating their own
-sitemap (e.g. blog posts, regular content pages, photo galleries, products, product groups) and [`sitemap.xml`][10] just
-advertises them. So, for our purposes, the discovery step just involves reviewing those sitemaps to find the links.
-
-**Parsing** - understanding the documents to know what content is significant. Given my previous work of [implementing
-structured data][7] on the site and creating internal tools for reviewing the results, parsing becomes a very simple
-task.
-
-The next two processes are more what I want to focus on here:
-
- * **Indexing** - ensuring the documents are accessible via search queries.
- * **Maintenance** - keeping the index current as documents change or are removed.
-
-
-## Indexing
-
-We were already using [elasticsearch][8], so I was hoping to use it for full-text searching as well. I decided to
-maintain two types in the search index.
-
-
-### Discovered Documents (`resource`)
-
-The `resource` type has all our indexed URLs and a cache of their contents. Since we're not going to be searching it
-directly, it's more of a basic key-based store keyed by the URL. The mapping looks something like:
-
-{% highlight javascript %}
-{ "_id" : {
- "type" : "string" },
- "url" : {
- "type" : "string",
- "index" : "no" },
- "response_status" : {
- "type" : "string",
- "index" : "no" },
- "response_headers" : {
- "properties" : {
- "key" : {
- "type" : "string",
- "index" : "no" },
- "value" : {
- "type" : "string",
- "index" : "no" } } },
- "response_content" : {
- "type" : "string",
- "index" : "no" },
- "date_retrieved" : {
- "type" : "date",
- "format" : "yyyy-MM-dd HH:mm:ss" },
- "date_expires" : {
- "type" : "date",
- "format" : "yyyy-MM-dd HH:mm:ss" } }
-{% endhighlight %}
-
-The `_id` is simply a hash of the actual URL and is used elsewhere. Whenever the discovery process finds a new URL, it
-creates a new record and queues a task to download the document. The initial record looks like:
-
-{% highlight javascript %}
-{
- "_id" : "b48d426138096d66bfaa4ac9dcbc4cb6",
- "url" : "/local/fling/spring-fling-2013/",
- "date_expires" : "2001-01-01 00:00:00"
-}
-{% endhighlight %}
-
-Then the download task is responsible for:
-
- 1. Receiving a URL to download;
- 2. Finding the current `resource` record;
- 3. Validating it against `robots.txt`;
- 4. Sending a new request for the URL (respecting `ETag` and `Last-Modified` headers);
- 5. Updating the `resource` record with the response and new `date_*` values;
- 6. And, if the document has changed, queueing a task to parse the `resource`.
-
-By default, if an `Expires` response header isn't provided, I set the `date_expires` field to several days in the
-future. The field is used to find stale documents later on.
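-
-As a rough sketch (with hypothetical names - the real task also honors `robots.txt`, conditional request headers, and the job queue), the download task boils down to something like:
-
-{% highlight php %}
-# Illustrative only: fetch a resource, refresh its cached record, and report
-# whether the content changed so the caller can queue a parse task.
-function downloadResource(\Elastica\Type $resourceType, $url)
-{
-    $id = md5($url);
-    $resource = $resourceType->getDocument($id)->getData();
-
-    # a real implementation would send If-None-Match / If-Modified-Since here
-    $content = file_get_contents($url);
-
-    $changed = !isset($resource['response_content'])
-        || ($content !== $resource['response_content']);
-
-    $resource['response_status'] = '200';
-    $resource['response_content'] = $content;
-    $resource['date_retrieved'] = date('Y-m-d H:i:s');
-    $resource['date_expires'] = date('Y-m-d H:i:s', strtotime('+3 days'));
-
-    $resourceType->addDocument(new \Elastica\Document($id, $resource));
-
-    return $changed;
-}
-{% endhighlight %}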
-
-
-### Parsed Documents (`result`)
-
-The `result` type has all our indexed URLs which were parsed and found to be useful. The documents contain some
-structured fields which are generated by the parsing step. The mapping looks like:
-
-{% highlight javascript %}
-{ "_id": {
- "type": "string" },
- "url": {
- "type": "string",
- "index": "no" },
- "itemtype": {
- "type": "string",
- "analyzer": "keyword" },
- "image": {
- "type": "string",
- "index": "no" },
- "title": {
- "boost": 5.0,
- "type": "string",
- "include_in_all": true,
- "position_offset_gap": 64,
- "index_analyzer": "snowballed",
- "search_analyzer": "snowballed_searcher" },
- "keywords": {
- "_boost": 6.0,
- "type": "string",
- "include_in_all": true,
- "index_analyzer": "snowballed",
- "search_analyzer": "snowballed_searcher" },
- "description": {
- "_boost": 3.0,
- "type": "string",
- "analyzer": "standard" },
- "crumbs": {
- "boost": 0.5,
- "properties": {
- "url": {
- "type": "string",
- "index": "no" },
- "title": {
- "type": "string",
- "include_in_all": true,
- "analyzer": "standard" } } },
- "content": {
- "type": "string",
- "include_in_all": true,
- "position_offset_gap": 128,
- "analyzer": "standard" },
- "facts": {
- "type": "object",
- "enabled": false,
- "index": "no" },
- "date_parsed" : {
- "type" : "date",
-    "format" : "yyyy-MM-dd HH:mm:ss" },
- "date_published" : {
- "type" : "date",
- "format" : "yyyy-MM-dd HH:mm:ss" } }
-{% endhighlight %}
-
-A few notes on the specific fields:
-
- * `itemtype` - the generic result type in schema.org terms (e.g. Product, WebPage, Organization)
- * `image` - a primary image from the page; it becomes a thumbnail on search results to make them more inviting
- * `title` - usually based on the `title` tag or more-concise `og:title` data
- * `keywords` - usually based on the keywords `meta` tag (the field is boosted because they're specifically targeted
- phrases)
- * `description` - usually the description `meta` tag
- * `content` - any remaining useful, searchable content somebody might try to find something in
- * `facts` - arbitrary data used for rendering more helpful search results; some common keys:
- * `collection` - indicates there are multiple of something (e.g. product quantities, styles of a product)
- * `product_model` - indicate a product model name for the result
- * `brand` - indicate the brand name for the result
- * `price`, `priceMin`, `priceMax` - indicate the price(s) of a result
- * `availability` - for a product this is usually "in stock" or "out of stock"
- * `date_published` - for content such as blog posts or announcements
-
-The `result` type is updated by the parse task which is responsible for:
-
- 1. Receiving a URL to parse;
- 2. Finding the current `resource` record;
- 3. Running the `response_content` through the appropriate structured data parser;
- 4. Extracting generic data (e.g. title, keywords);
- 5. Extracting `itemtype`-specific metadata, usually for `facts`;
- 6. Updating the `result` record.
-
-For example, this parsed [product model][17] looks like:
-
-{% highlight javascript %}
-{ "url" : "/shop/g/yarn/madelinetosh/tosh-dk/",
- "itemtype" : "ProductModel",
- "title" : "Madelinetosh Tosh DK",
- "keywords" : [ "tosh dk", "tosh dk yarn", "madelinetosh", "madelinetosh yarn", "madelinetosh tosh dk", "madelinetosh" ],
- "image" : "/asset/catalog-entry-photo/17c1dc50-37ab-dac6-ca3c-9fd055a5b07f~v2-96x96.jpg",
- "crumbs": [
- {
- "url" : "/shop/",
- "title" : "Shop" },
- {
- "url" : "/shop/g/yarn/",
- "title" : "Yarn" },
- {
- "url" : "/shop/g/yarn/madelinetosh/",
- "title" : "Madelinetosh" } ],
- "content" : "Hand-dyed by the gals at Madelinetosh in Texas, you'll find these colors vibrant and multi-layered. Perfect for thick socks, scarves, shawls, hats, gloves, mitts and sweaters.",
- "facts" : {
- "collection": [
- {
- "value" : 93,
- "label" : "products" } ],
- "brand" : "Madelinetosh",
- "price" : "22.00" },
- "_boost" : 4 }
-{% endhighlight %}
-
-
-### Searching
-
-Once some documents are indexed, I can create simple searches with the [`ruflin/Elastica`][11] library:
-
-{% highlight php %}
-$bool = (new \Elastica\Query\Bool())
-    ->addMust(
- (new \Elastica\Query\Bool())
- ->setParam('minimum_number_should_match', 1)
- ->addShould(
- (new \Elastica\Query\QueryString())
- ->setParam('default_field', 'keywords')
- /* ...snip... */ )
- ->addShould(
- (new \Elastica\Query\QueryString())
- ->setParam('default_field', 'title')
- /* ...snip... */ )
- ->addShould(
- (new \Elastica\Query\QueryString())
- ->setParam('default_field', 'content')
- /* ...snip... */ ) );
-
-/* ...snip... */
-
-$query = new \Elastica\Query($bool);
-{% endhighlight %}
-
-To call out specific matches in the `title` and `content` fields, I can enable highlighting:
-
-{% highlight php %}
-$query->setHighlight(
- array(
- 'pre_tags' => array(''),
- 'post_tags' => array(''),
- 'fields' => array(
- 'title' => array(
- 'fragment_size' => 256,
- 'number_of_fragments' => 1 ),
- 'content' => array(
- 'fragment_size' => 64,
- 'number_of_fragments' => 3 ) ) ) );
-{% endhighlight %}
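-
-Running the query is then just a matter of hitting the `result` type and walking the hits (the `$index` variable here stands in for however the Elastica index is obtained):
-
-{% highlight php %}
-$resultSet = $index->getType('result')->search($query);
-
-foreach ($resultSet->getResults() as $hit) {
-    $data = $hit->getData();
-    $highlights = $hit->getHighlights();
-
-    # render $data['title'], $data['url'], etc., preferring the highlighted
-    # title/content fragments whenever they are present
-}
-{% endhighlight %}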
-
-
-## Maintenance
-
-A search engine is no good if it's using outdated or no-longer-existent information. To help keep content up to date, I
-take two approaches:
-
-**Time-based updates** - one of the reasons for the indexed `date_expires` field of the `resource` type is so a
-process can go through and identify documents which have not been updated recently. If it sees something is stale, it
-goes ahead and queues it for update.
-
-**Real-time updates** - sometimes things (like product availability) change frequently, impacting the quality of search
-results. Instead of waiting for time-based updates, I use event listeners to trigger re-indexing when inventory
-changes or when products change in an order.
-
-In either case, when a URL is discovered to be gone, the records from both `resource` and `result` are removed for the
-URL.
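-
-The real-time listeners themselves can stay tiny. A sketch, with illustrative event and job names (the actual listeners are wired into the inventory and order events mentioned above and hand work off to the queue):
-
-{% highlight php %}
-class SearchReindexListener
-{
-    private $gearman;
-
-    public function __construct(\GearmanClient $gearman)
-    {
-        $this->gearman = $gearman;
-    }
-
-    public function onInventoryChanged($event)
-    {
-        # hand the affected URL back to the download/parse pipeline
-        $this->gearman->doBackground('search_resource_download', $event->getUrl());
-    }
-}
-{% endhighlight %}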
-
-
-### Utilities
-
-Sometimes there are deploys where specific pages are definitely changing, or where a whole new sitemap is getting
-registered with new URLs. Instead of waiting for the time-based updates or cron jobs to run, I have these commands
-available for scripting:
-
- * `search:index-rebuild` - re-read the sitemaps and assert the links in the `resource` index
- * `search:index-update` - find all the expired resources and queue them for update
- * `search:result-rerun` - force the download and parsing of a URL
- * `search:sitemap-generate` - regenerate all registered sitemaps
-
-
-## Conclusion
-
-Starting with structured data and elasticsearch makes building a search engine significantly easier. Data and indexing
-make it faster to show smarter [search results][16]. Existing standards like [OpenSearch][12] make it easy to extend
-the search from a web page into the [browser][15] and even third-party applications via [Atom][13] and [RSS][14] feeds.
-Local, real-time updates ensure search results are timely and useful. Even with the basic parsing and ranking
-algorithms shown here, results are quite accurate. It has been a beneficial experience to approach the website from the
-perspective of a bot, giving me a better appreciation of how to efficiently mark up and market content.
-
-
- [1]: http://www.google.com/
- [2]: http://www.google.com/cse/all
- [3]: http://www.google.com/enterprise/search/products_gss_pricing.html
- [4]: http://www.theloopyewe.com/
- [5]: http://schema.org/
- [6]: http://www.sitemaps.org/
- [7]: /blog/2013/05/13/structured-data-with-schema-org.html
- [8]: http://www.elasticsearch.org/
-[10]: http://www.theloopyewe.com/sitemap.xml
-[11]: https://github.com/ruflin/Elastica/
-[12]: http://www.opensearch.org/Home
-[13]: https://www.theloopyewe.com/search/results.atom?q=spring+fling
-[14]: https://www.theloopyewe.com/search/results.rss?q=spring+fling
-[15]: https://www.theloopyewe.com/search/opensearch.xml
-[16]: https://www.theloopyewe.com/search/?q=madelinetosh
-[17]: https://www.theloopyewe.com/shop/g/yarn/madelinetosh/tosh-dk/
diff --git a/blog/_posts/2014-01-13-barcoding-inventory-with-qr-codes.md b/blog/_posts/2014-01-13-barcoding-inventory-with-qr-codes.md
deleted file mode 100644
index db65ad0..0000000
--- a/blog/_posts/2014-01-13-barcoding-inventory-with-qr-codes.md
+++ /dev/null
@@ -1,179 +0,0 @@
----
-title: "Barcoding Inventory with QR Codes"
-layout: post
-tags: [ 'barcode', 'qr', 'retail', 'product', 'label', 'scan' ]
-description: A web-centric, user-friendly approach for using barcodes in a retail shop.
----
-
-Most decently-sized stores will have barcodes on their products. For the store, it makes the checkout process extremely
-easy and accurate. For the consumer, barcodes can be useful to scan with a phone app. I needed to make the
-inventory scannable at the [shop][1], and I really wanted to do it in a more meaningful way than 1D barcodes could
-support.
-
-
-## Barcodes: 1D vs 2D
-
-There are two different kinds of barcodes: one-dimensional (1D) and two-dimensional (2D). 1D barcodes allow for a
-purely linear scan of simple, [UPC][2]-like codes. While 1D barcodes are extremely commonplace on many products, I
-dislike them because
-they can't provide any context.
-
-For example, if I were shopping in [Target][3] and scanned a UPC barcode with a regular phone app, it might take me to
-the [Amazon][4] listing first - not necessarily great for Target's business, but it also becomes a completely separate
-brand channel distracting my thoughts. Another example is when UPCs aren't registered on a product - different retail
-stores will make up their own internal barcode which isn't helpful at all if I try to scan it.
-
-On the other hand, 2D barcodes require more complex parsing, but they can hold much more data. [QR codes][5] are one extremely
-common form of 2D barcodes and they typically encode URLs. With my goal of providing more context, URLs provide just
-that - not only with a domain name, but an arbitrary path. If somebody scanned an item at our shop, they'd at least get
-redirected through the shop's website.
-
-One disadvantage that QR codes have compared to 1D barcodes is their size and resolution requirements. All 1D barcodes
-could theoretically be 1 pixel high, but QR codes must be square. To help keep QR codes reasonable, most people
-will use a URL shortener service - shorter URLs mean simpler QR designs, and simpler designs mean the QR code can be read
-more easily and doesn't need to be large.
-
-Another disadvantage to QR codes is that 2D handheld scanners are significantly more expensive than 1D. Fortunately,
-many previously-used 2D scanners can be found on [eBay][6] for very reasonable prices. Unfortunately, I found that
-some of the used ones would turn unreliable after a while.
-
-
-## Mapping URLs to retail "things"
-
-While inventory was the primary target of barcoding, I really wanted to barcode most things involved with retail
-workflows (like order receipts). With that in mind I figured I needed to store three properties:
-
- * `insignia` - the unique, short identifier (e.g. `EyV3chYax`)
- * `target_ref` - the type of "thing" (e.g. `inventory` or `order`)
- * `target_id` - the ID of the "thing" (e.g. `010035EA-9F6D-41A2-97C4-EEB5A3F3034A`)
-
-I created a manager which supports three basic operations (internally it uses a map of the different types of
-"things"):
-
- * `getInsignia($target)` - which returns the short identifier/insignia
- * `getTarget($insignia)` - which returns the application object
- * `getResponse($insignia)` - which returns an appropriate HTTP response
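-
-In code, the manager is little more than an interface along these lines (a sketch - the names below are illustrative, not the actual classes):
-
-{% highlight php %}
-interface InsigniaManagerInterface
-{
-    # return the short insignia (e.g. "EyV3chYax") for an application object
-    public function getInsignia($target);
-
-    # resolve an insignia back to the "thing" it points at
-    public function getTarget($insignia);
-
-    # build the HTTP response for a scanned insignia (typically a redirect)
-    public function getResponse($insignia);
-}
-{% endhighlight %}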
-
-I created a couple of HTTP endpoints which utilize the manager:
-
- * `/io/{insignia}` - which returns the result of `getResponse` (typically a redirect)
- * `/io/{insignia}.png` - which returns the QR code image
-
-Then, whenever I want to print a QR code on a document, I just reference the `/io/{insignia}.png` endpoint as the
-image source.
-
-
-## Conclusion
-
-I feel like the shop is able to better grow both technically and logistically by having used QR codes as opposed to a classic barcode system. A few techy customers have tried the QR codes, but it's not really something we've been promoting. Once the website has a proper mobile-friendly version we'll have a better opportunity and reason to try and impress customers with the QR codes. In the meantime, the QR codes have been an immense time-saver for both staff and shoppers checking out at the shop.
-
-
- [1]: http://www.theloopyewe.com/
- [2]: http://en.wikipedia.org/wiki/Universal_Product_Code
- [3]: http://www.target.com/
- [4]: http://www.amazon.com/
- [5]: http://en.wikipedia.org/wiki/QR_code
- [6]: http://www.ebay.com/
diff --git a/blog/_posts/2014-02-28-distributed-docker-containers.md b/blog/_posts/2014-02-28-distributed-docker-containers.md
deleted file mode 100644
index d37489a..0000000
--- a/blog/_posts/2014-02-28-distributed-docker-containers.md
+++ /dev/null
@@ -1,240 +0,0 @@
----
-title: Distributed Docker Containers
-layout: post
-tags: [ 'aws-ec2', 'docker', 'nodejs', 'scs-utils' ]
-description: A strategy for integrating Docker services across multiple hosts and data centers.
----
-
-One thing I've been working with lately is [Docker][1]. You've probably seen it referenced in various tech articles lately as the next greatest thing for cloud computing. Docker runs "containers" from base "images" which essentially allow running many lightweight virtual machines on any recent, Linux-based system. Internally, the magic behind it is [lxc][2], although Docker adds a lot more magic to improve and make it more usable.
-
-For a long time now I've used virtual machines for development - it allows me to better simulate how software runs out on production servers. Historically, [Vagrant][3] + [VirtualBox][4]/[VMWare Fusion][5]/[EC2][6] have been great tools for that, but they have limitations and they tend to drift a bit from production architecture.
-
-
-## The Problem
-
-In trying to duplicate the production environments, it's not typically feasible for me to run more than one virtual machine on my laptop. I could split my single local virtual machine to multiple EC2 instances; but then it becomes more difficult to manage IP addresses for the various service dependencies as the instances get stopped/started between working sessions (in addition to the extra costs). VPCs with private IP addresses do help with that a lot, as long as there's a sane way to manage those resources.
-
-Another issue that comes up when combining services on a single host is dependency overlap. One example of this is shared modules. Some newer features of nginx require a newer version of the openssl libraries. However, PHP doesn't necessarily support the newer version of openssl without upgrading quite a few other components. While there may be workarounds, the inconvenience of it all typically just prompts me to avoid working on that particular feature, unfortunately.
-
-Ultimately, I want to have the same software and network stack that I use in a production environment, but in a development environment and, if possible, locally on my laptop.
-
-
-## The Alternatives
-
-This problem is certainly not unique, but a practical solution has been difficult for me to find. I've been experimenting with a few different technologies over the years trying to solve this sort of thing.
-
-Vagrant is obviously the first practical solution.
-For me, it has been a functional solution for quite a while, but not an optimal one. Like I mentioned before, it's a bit bulky when attempting to mimic non-trivial architectures on a standard laptop. For a while now, I've been finding the motivation and time to migrate to a better setup.
-
-With the advent of Docker, many of my software requirements become much simpler. Each piece of software can run in its own container and I don't have to worry about dependency overlap. Multiple containers are *significantly* cheaper than trying to run multiple virtual machines. I could even reuse containers built on my development machine out on production. One thing Docker doesn't effectively solve is service dependency. It can support them on a single host with links, but not across multiple hosts.
-
-I've been keeping an eye out for other tools which may help solve these problems. Some of them are:
-
- * [decking][7] - seems to primarily build on top of Docker's built-in link functionality for service dependency within a single host
- * [etcd][15] - an excellent distributed, hierarchical key-value store; very useful for monitoring configuration values and being notified when they change (related: [confd][22])
- * [fig][8] - seems like [Foreman][21], but geared for Docker containers
- * [flynn][11] - originally I was very excited about this, however it still seems underdeveloped for the purposes of service discovery of arbitrary services; I'm still very hopeful
- * [serf][9] - a very new client for distributing data across a cluster and taking action on it. To me it seems like more of a management tool (like half of the [mcollective][10] utility)
-
-Recently, I've been becoming more acquainted with [bosh][12], an interesting tool for managing large deployments along with all their dependencies. To me, bosh always seems overly complicated for whatever I'd want to accomplish and has quite a few bosh-specific practices to learn. Its resource and service management is very thorough, although it takes a while to get comfortable with it. It seems more like an infrastructure management tool rather than a service management tool, and I was hoping to keep those responsibilities separate and simpler. Ultimately, I think bosh could be made to work... but I was still hoping for something different, lighter, and utilizing more common open source tools that I was already familiar with.
-
-
-## The Ideas
-
-I had a simple application in mind to roughly define my "[minimum viable product][13]":
-
- 0. run a WordPress web application, a MySQL server, and a backup MySQL server as separate services
- 0. runtime parity (between development and production)
-    1. configure services the exact same way
-    1. run services the exact same way
-    1. depend on other services the exact same way
- 0. architecture flexibility
-    1. in production, run the services on three separate hosts across two separate data centers
-    1. in development, run all services on a single virtual machine on my laptop
- 0. service flexibility - be able to dynamically relocate services without manual reconfiguration and with minimal downtime
-    * combine services into one or two hosts during quiet hours
-    * move a service to a more powerful instance during high load
- 0. self-provisioning - when a container requires a particular volume or network, make sure it can be automatically provisioned and de-provisioned
-
-First off, I knew I wanted to run the services inside of Docker containers.
-I can only imagine Docker's ubiquity will continue to grow, and the ability to run completely arbitrary software anywhere with minimal host dependencies seemed like a perfect, lightweight solution.
-
-I've used [Puppet][14] to configure servers and applications for a long time. While I dislike the overhead it requires for smaller use cases, I really like the consistency and declarative nature that it provides. Since I'll continue to use it for host server configuration, it's a small stretch to also use it for configuring the service runtimes.
-
-When it comes down to it, I think there are two main questions that a service must answer:
-
- * How should I work? and
- * How do I connect with the rest of the world?
-
-The first question can be managed and configured via Puppet. Once a service is configured and compiled to run as requested, it never needs to go through that process again. This approach lets compiled Docker images be consistently reused across time and servers.
-
-The second question deals with pointing WordPress to the MySQL server, or pointing the MySQL server to the data directory, or running the MySQL backup server on a specific network segment. These decisions and connections have nothing to do with how the service should work, so they can be changed as needed. So far, I have four main dependency types describing how these containers get connected:
-
- 0. volumes - giving containers a place to write persistent data (e.g. the WordPress `wp-content/uploads` directory)
- 0. provided services - a service that the container is running (e.g. `http` on `80/tcp`)
- 0. required services - a service that the container needs (e.g. `mysql`)
- 0. network - how the container is attached to the network
-
-I think these basic aspects effectively describe everything needed to manage a self-contained service.
-
-
-## The Implementation
-
-The next step of an idea is to prototype it, and that's where I am today. There are several pieces that I've been working on, but three general topics...
-
-
-## Service Discovery
-
-One of the most interesting concepts is service discovery. I wanted containers to be able to connect with each other across multiple hosts and data centers. I've been using DNS for host discovery and, while it works great, it doesn't seem entirely appropriate for "containerized" discovery. Through [`A`][23] records, DNS easily picks up on hosts changing, but is not so good for dynamic ports. DNS [`SRV`][24] records seem *much* more appropriate with attributes for both hostname and port, but `SRV` records are rarely used in internal APIs.
-
-Originally I was using etcd to register and discover services, but I found it to be inefficient for filtering services and propagating changes. Instead, I created a specialized client/server protocol to handle the registration and discovery process. In technical terms, the protocol works like the following...
-
-WordPress needs a database, so before it starts the container, it connects with the disco server:
-
- > **container**: Hi, I need a `mysql` service to talk to - who's available?
- > **disco**: You should talk with `192.0.2.11:39313` - I'll keep you posted if it changes, but let me know if you no longer need it
-
-The results are injected as environment variables when the container is started, and the container can use them however it likes.
-WordPress obviously runs a web server, so, once the container is started, the container manager connects with disco:
-
- > **container**: Hi, I'm `wordpress` and I have an `http` service available at `192.0.2.12` on port `39212`
- > **disco**: Nice to meet you; let me know if you no longer provide it
-
-Then things are running happily and you could ask the disco server where to find `wordpress/http` to pull it up in your web browser. If the database server crashes and recovers elsewhere, a few things will happen. First, when disco realizes MySQL is no longer available (either by a clean disconnect, heartbeat timeout, or socket disconnect), it notifies everyone who is subscribed that the endpoint has been dropped:
-
- > **disco**: Looks like you were using `mysql`, but I'm sorry to tell you it's no longer available
- > **container**: Thanks for letting me know
-
-The container manager then attaches to the container to run an update command letting it know about the change. The command can take care of updating the runtime configuration and restarting the application server.
-
-Eventually the new MySQL server will come back online and register itself. Once registered, disco realizes that WordPress is subscribed, so it lets it know:
-
- > **disco**: Great news, I have a new `mysql` endpoint for you at `192.0.2.14:39414`
- > **container**: Excellent, thanks
-
-And it again runs the live update command, updating the environment and restarting the application server.
-
-The disco protocol has a few more features (like using a single server for more than one WordPress/MySQL setup, or filtering services by arbitrary tags like availability zones to improve load balancing), but that's the general idea.
-
-
-## Configuration Files
-
-I'm using YAML files to describe images and containers. They get compiled to a static version, and then cached based on the image configuration. For example, take a look at this example [scs-wordpress][16] image manifest. It describes the various connection points, docker details, and how it's configured. Now, take a look at the [Puppet manifests][17] which enumerate all the configuration options which affect how the service will run. Finally, take a look at the [sample config][18] which ties together what kind of image it needs to be able to run (configuration) and how that image will be connected to the world.
-
-
-## Self-Provisioning
-
-For each of the four dependency/connection types (volumes, service provider, service dependent, network), I'm trying to make them suitable for local development and AWS EC2 deployment. For example:
-
- * AWS EC2 volumes can be auto-created, mounted, and attached to hosts for use by docker containers. This allows services to drift across instances
- * Likewise, I can also just use a local path for a volume and avoid an official network mount
- * Various other strategies can be added for each dependency:
-   * nfs-volume: to attach a docker mount point to an external NFS mount
-   * aws-ec2-eni: to attach an ENI as the network interface for a docker container
-
-My goal is to provide a manifest configuration file to a machine and know that it will load up whatever it needs to run, including recompiling the image from scratch if it's not available in any caches.
-
-
-## The Prototype
-
-So, all those ideas are currently under development in my [`scs-utils`][20] repository. I've created a repository called [`scs-example-blog`][19] which is a functional implementation of my original MVP.
-It provides a `Vagrantfile` for you to easily try it out yourself and it goes through the process of getting the containers running on a single virtual machine, accessing the services from the host, and then splitting them up across multiple virtual machines. It's more a tutorial describing the steps - typically the service deployment would be managed by Puppet.
-
-
-## The Conclusion
-
-All these ideas are absolutely a work in progress and I'm still actively tweaking the implementation, but it was in a functional state to briefly discuss the idea. So far it has been an excellent learning opportunity for Docker, custom network protocols, and splitting some of the services I've previously been running into more reusable components. Even if `scs-utils` isn't still what I'm using in 2 years, the refactoring it has motivated makes it significantly easier to port into whatever more valuable tool surfaces further down the road.
-
-
- [1]: https://www.docker.io/
- [2]: http://linuxcontainers.org/
- [3]: http://www.vagrantup.com/
- [4]: https://www.virtualbox.org/
- [5]: http://www.vmware.com/products/fusion
- [6]: http://aws.amazon.com/ec2/
- [7]: http://decking.io/
- [8]: http://orchardup.github.io/fig/
- [9]: http://www.serfdom.io/
- [10]: http://puppetlabs.com/mcollective
- [11]: https://flynn.io/
- [12]: http://docs.cloudfoundry.org/bosh/
- [13]: http://en.wikipedia.org/wiki/Minimum_viable_product
- [14]: http://puppetlabs.com/puppet/puppet-open-source
- [15]: https://github.com/coreos/etcd
- [16]: https://github.com/dpb587/scs-wordpress/blob/3ba391d4f82da5c9642d88962e0bce32eb692add/scs/image.yaml
- [17]: https://github.com/dpb587/scs-wordpress/tree/3ba391d4f82da5c9642d88962e0bce32eb692add/scs/puppet/scs/manifests
- [18]: https://github.com/dpb587/scs-example-blog/blob/master/wordpress/manifest.yaml
- [19]: https://github.com/dpb587/scs-example-blog
- [20]: https://github.com/dpb587/scs-utils
- [21]: http://ddollar.github.io/foreman/
- [22]: https://github.com/kelseyhightower/confd
- [23]: http://en.wikipedia.org/wiki/A_record#A
- [24]: http://en.wikipedia.org/wiki/SRV_record
diff --git a/blog/_posts/2014-04-08-photo-galleries-for-jekyll.md b/blog/_posts/2014-04-08-photo-galleries-for-jekyll.md
deleted file mode 100644
index c88ac9c..0000000
--- a/blog/_posts/2014-04-08-photo-galleries-for-jekyll.md
+++ /dev/null
@@ -1,180 +0,0 @@
----
-title: Photo Galleries for Jekyll
-layout: post
-tags: [ 'blog', 'gallery', 'iphoto', 'jekyll', 'jekyllrb', 'photo', 'ruby' ]
-description: Easily exporting my iPhoto album to this Jekyll-based site.
----
-
-I had a trip to London and Iceland several weeks ago, and I wanted to share some of those photos with people. In the past I've put those sorts of photo galleries on Facebook, but some friends don't have accounts there and I figured I could/should just keep my photos with my other personal stuff here.
-
-Unlike [WordPress][1], [Jekyll][2] doesn't really have a concept of photo galleries, and since Jekyll is a static site generator it makes things a little more difficult. I looked through [several][3] [other][4] [posts][5] discussing Jekyll photo galleries, but they all seemed a bit more primitive than what I wanted. I wanted to:
-
- * stick with existing Jekyll paradigms (e.g. [markdown][8] file to static page),
- * retain metadata about my photos (e.g. location data, camera EXIF data),
- * support multiple views about my galleries (e.g. photo list, map, slideshow),
- * ensure photos can have landing pages and be easily navigated, and
- * avoid committing images to my git repository.
-
-After giving it some thought, I realized this was going to be a multi-step process.
-
- 0. Script the process of exporting my existing photos to Jekyll-friendly structures.
- 0. Find a Jekyll/[Liquid][7] plugin to enumerate directories/files and use the results.
- 0. Create templates and pages for my gallery and its photos.
- 0. Publish the site!
-
-
-## Step 1: Export existing photo galleries (iPhoto)
-
-I take pretty much all my photos with my phone and those photos then get synced up with iPhoto. At the end of my trip, I browse through the photos and create an album of interesting ones. Normally I don't go through and give every photo a title and description, but if I'm planning on sharing them I add brief notes within iPhoto.
-
-I knew my iPhoto metadata was stored in `AlbumData.xml`, but I've always had poor performance with massive XML data files. I decided to start with a different approach: [AppleScript][9]. The following snippet gets me the file paths of all the photos (in order) from whatever album I ask for:
-
-{% highlight applescript %}{% raw %}
-on run argv
-    set output to ""
-
-    tell application "iPhoto"
-        set vAlbum to first item of (get every album whose name is (item 1 of argv))
-        set vPhotos to get every photo in vAlbum
-
-        repeat with vPhoto in vPhotos
-            set output to output & original path of vPhoto & "
-"
-        end repeat
-    end tell
-
-    return output
-end run
-{% endraw %}{% endhighlight %}
-
-So, to get the photos in my album named "London-Iceland Trip" I can do:
-
-{% highlight console %}{% raw %}
-$ osascript export-iphoto-album.applescript 'London-Iceland Trip'
-~/Pictures/iPhoto Library.photolibrary/Masters/2014/03/13/20140313-154842/IMG_0303.JPG
-~/Pictures/iPhoto Library.photolibrary/Masters/2014/03/13/20140313-154842/IMG_0308.JPG
-...snip...
-{% endraw %}{% endhighlight %}
-
-With some tweaks I can get more than just the path to a photo:
-
-{% highlight console %}{% raw %}
-$ osascript export-iphoto-album.applescript 'London-Iceland Trip'
-altitude: 16
-latitude: 51.50038
-longitude: -0.12786667
-name: A Classic View
-date: Thursday, March 6, 2014 at 4:44:12 PM
-path: ~/Pictures/iPhoto Library.photolibrary/Masters/2014/03/13/20140313-154842/IMG_0303.JPG
-title: A Classic View
-------
-QCon was held at The Queen Elizabeth II Conference Centre and this was the view out one of the common areas.
-------------
-...snip...
-{% endraw %}{% endhighlight %}
-
-The next piece is to write something which will clean up the output, resize the photos, and write out all the different Jekyll files. For that I created a [PHP][10] script since it was going to be easiest for me. Once complete, I then just pipe the export results to the script and specify the image sizes I want:
-
-{% highlight console %}{% raw %}
-$ osascript ../jekyll-gallery/export-iphoto.applescript 'London-Iceland Trip' | \
-    php ../jekyll-gallery/convert.php 2014-london-iceland-trip \
-    --export 96x96 --export 200x200 --export 640 --export 1280
-df5150c-a-classic-view...96x96...200x200...640...1280...mdown...done
-7cf02b5-night...96x96...200x200...640...1280...mdown...done
-...snip...
-{% endraw %}{% endhighlight %}
-
-Once complete, all the resized images are in `asset/gallery/2014-london-iceland-trip` and my markdown files with the photo details are in `gallery/2014-london-iceland-trip` and they're easily [readable][15].
-
-
-## Step 2: Jekyll plugin
-
-At a minimum, I wanted to have a listing of all the photos in a gallery index page. After some searches, I found [two][11] [scripts][12] which became the inspiration for my final plugin. My [final plugin][16] looks like:
-
-    Tag:
-        loopdir
-    Attributes:
-        match: a pattern to match files within the path (e.g. "*.md")
-        parse: whether to load the file and parse for YAML front matter
-        path: a directory, relative to the site root, to find files
-        sort: a property to sort by (e.g. "path")
-    Result:
-        An "item" object is exposed to the template with a "page"-like structure.
-        If parsing is enabled, the YAML properties are available as "item.title".
-
-Which means I can easily compose a simple photo list with:
-
-{% highlight jinja %}{% raw %}
-{% loopdir path:"gallery/2014-london-iceland-trip" match:"*.md" sort:"ordering" %}
-
-
-
-{% endloopdir %}
-{% endraw %}{% endhighlight %}
-
-I reuse this plugin elsewhere for regular directory listings.
-
-
-## Step 3: Create templates
-
-I've started out with two reusable templates in my `_includes` directory:
-
- 0. [Gallery List][13] - a simple listing of thumbnails from all the photos in the gallery
- 0. [Interactive Map][14] - an interactive map showing where all the photos were taken
-
-I can pass arguments (like the gallery name) to the include which makes it easy to embed a gallery in any page:
-
-{% highlight jinja %}{% raw %}
-{% include gallery_list.html gallery='2014-london-iceland-trip' %}
-{% endraw %}{% endhighlight %}
-
-
-## Step 4: Publish
-
-After generating everything locally, I just have to do a couple steps:
-
- 0. Commit all the new `gallery/2014-london-iceland-trip` files (and new templates)
- 0. Run `_build/aws/publish-asset.sh $AWS_S3CMD_CONFIG gallery/2014-london-iceland-trip` to upload all the exported JPGs
- 0. Run `_build/aws/build.sh _build/aws/publish.sh $AWS_S3CMD_CONFIG` to upload any modifications from the rest of the
- site
-
-To make things easier for myself and, possibly, others I put the conversion scripts in my [jekyll-gallery][17] repo.
-
-Now I'm able to refer people to the [gallery](/gallery/2014-london-iceland-trip/) or embed the gallery somewhere
-useful...
-
-
-
-
-## Color Quantification
-
-One of the most difficult parts of supporting color search is figuring out the colors in products. In our case,
-where we had thousands of items to "colorize", it would be easier to create an algorithm than have somebody manually
-pick out significant colors. When it comes to algorithms and research, the process is called [color quantization][8].
-A lot of the inventory at the shop is yarn and, unfortunately, the tools I tried didn't do a good job at picking out the
-fiber colors (they would find significance in the numerous shadows or average colors).
-
-Ultimately I ended up creating my own algorithm based on several strategies. In addition to finding the significant
-colors, it also keeps track of their ratios, making it easy to distinguish multi-color items from items with accent colors.
-After batch processing inventory to bring colors up to date, I added hooks to ensure new images are processed for colors
-as they're uploaded.
-
-
-
-
-You can see it noticed the significant colors of the yarn and fabric above, along with their approximate ratios. With
-some types of items, it may be possible to infer additional meaning such as the "background color" of fabric.
-
-
-## Color Theory
-
-When it comes to color, there are a few standard methods for measuring it. Probably the most familiar one from a web
-perspective is [RGB][6]. Unfortunately, RGB doesn't do a good job of quantifying the "color", or hue. For example,
-244, 40, 5 and
-244, 214, 214
-are both obviously reddish, but the second has high green and blue components even though no green or blue is visually
-present.
-
-A much better model for this is [HSV][7] (or HSL). The "color" (hue, `H`) cycles from 0 through 360, where 0 and 360 are
-both red. The `S` for "saturation" ranges from 0 to 100 and describes how much "color" there is. Finally, the `V` for
-"value" (or `B` for "brightness") ranges from 0 to 100 and describes how bright or dark it is. Compare the following
-examples for a better idea:
-
- * 0, 70, 70
- * 0, 30, 70
- * 0, 70, 30
- * 0, 30, 30
- * 180, 70, 70
- * 180, 30, 70
- * 180, 70, 30
- * 180, 30, 30
-
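-For a rough idea of how the two models relate, here's a standard RGB-to-HSV conversion (shown as a sketch using the
-ranges discussed above, not the site's actual code):
-
-{% highlight php %}
-<?php
-
-// Standard RGB (0-255 per channel) to HSV conversion, scaled to the ranges
-// discussed above: hue 0-360, saturation 0-100, value 0-100.
-function rgbToHsv($r, $g, $b)
-{
-    $r /= 255;
-    $g /= 255;
-    $b /= 255;
-
-    $max = max($r, $g, $b);
-    $min = min($r, $g, $b);
-    $delta = $max - $min;
-
-    if ($delta == 0) {
-        $h = 0; // grayscale; hue is meaningless
-    } elseif ($max == $r) {
-        $h = 60 * fmod(($g - $b) / $delta, 6);
-    } elseif ($max == $g) {
-        $h = 60 * ((($b - $r) / $delta) + 2);
-    } else {
-        $h = 60 * ((($r - $g) / $delta) + 4);
-    }
-
-    if ($h < 0) {
-        $h += 360;
-    }
-
-    $s = ($max == 0) ? 0 : ($delta / $max) * 100;
-    $v = $max * 100;
-
-    return array(round($h), round($s), round($v));
-}
-
-print_r(rgbToHsv(244, 40, 5));    // roughly (9, 98, 96) - red hue, heavily saturated
-print_r(rgbToHsv(244, 214, 214)); // roughly (0, 12, 96) - similar hue, mostly washed out
-{% endhighlight %}
-
-The two reddish examples from earlier come out with nearly the same hue but very different saturations, which is
-exactly the distinction RGB made awkward.
-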
-Within elasticsearch we can easily map an object with the three color properties as integers:
-
-{% highlight javascript %}
-{ "color" : {
- "properties" : {
- "h" : {
- "type" : "integer" },
- "s" : {
- "type" : "integer" },
- "v" : {
- "type" : "integer" } } } }
-{% endhighlight %}
-
-
-## Mappings
-
-Elasticsearch will natively handle [arrays][5] of multiple colors, but `color` needs to become a [`nested`][4] mapping
-type in order to support realistic searches. For example, we could write a query looking for a dark blue, but unless
-it's a nested object the query could match items which have any sort of blue (`color.h = 240`) and any sort of dark
-(`color.v < 50`). To make `color` nested, we just have to add `type = nested`. Then we're able to write `nested` filters
-which will look like:
-
-{% highlight javascript %}
-{ "nested" : {
- "path" : "color",
- "filter" : {
- "bool" : {
- "must" : [
- {
- "term" : {
- "color.h" : 240 } },
- {
- "range" : {
- "color.v" : {
- "lt" : 50 } } } ] } } } }
-{% endhighlight %}
-
-With the extra color proportion value mentioned earlier, we're also able to add a `ratio` range alongside `h`, `s`, and
-`v`. This will allow us to find items where blue is more of a dominant color (e.g. more than 80%). Another searchable
-field which may be useful is `color_count` - then we would be able to find all solid-color products, or all dual-color
-products, or just any products with more than four significant colors.
-
-While working on a frontend interface, I was having trouble faceting popular colors. A lot of dull colors were coming
-back. As a first step, I started using some [`terms`][9] aggregations with a `value_script` which created large buckets
-of colors from the `h`, `s`, and `v` tuple. That helped significantly, but then it seemed like there was a
-disproportionate number of very dark and very light colors. Instead of adding more calculations to the aggregation at
-query time, I decided to pre-compute the bucket each color should belong to when it is indexed. Now the more advanced
-calculations happen up front and nothing extra runs at query time. For example, all low-`v` colors will end up in a single bucket
-`{ h : 360 , s : 10 , v : 10 , ... }`. Similar rules trim low-saturation colors and create the appropriate buckets for
-colors.
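-
-The bucketing boils down to a handful of rules along these lines (an illustrative sketch - `facetBucket` is a made-up
-name and the thresholds and grid sizes are not the production values):
-
-{% highlight php %}
-<?php
-
-// Illustrative pre-computation of a facet bucket for a color at index time.
-function facetBucket($h, $s, $v)
-{
-    if ($v <= 15) {
-        // very dark colors all collapse into the single dark bucket
-        return array('h' => 360, 's' => 10, 'v' => 10);
-    }
-
-    if ($s <= 15) {
-        // very dull colors collapse into light/dark neutral buckets
-        return array('h' => 360, 's' => 10, 'v' => ($v >= 85 ? 100 : 50));
-    }
-
-    // everything else snaps to a coarse grid so similar colors group together
-    return array(
-        'h' => (int) (round($h / 30) * 30),
-        's' => ($s >= 55 ? 100 : 50),
-        'v' => ($v >= 55 ? 100 : 50),
-    );
-}
-{% endhighlight %}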
-
-
-## Searches
-
-Given four key properties (hue, saturation, value, and color ratio), I needed a way to represent the searches from
-users. For searching individual colors, I settled on the following syntax:
-
- {ratio-min}-{ratio-max}~{hue}-{sat}-{val}~{hue-range}-{sat-range}-{val-range}
-
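-For illustration, pulling a slug apart into its numeric bounds might look roughly like this (a hypothetical helper,
-not the site's actual parser; hue wrap-around is ignored to keep it short):
-
-{% highlight php %}
-<?php
-
-// Hypothetical parser for the slug format above, returning the range bounds
-// used by the nested filter shown below.
-function parseColorSlug($slug)
-{
-    list($ratio, $color, $tolerance) = explode('~', $slug);
-
-    list($ratioMin, $ratioMax) = explode('-', $ratio);
-    list($h, $s, $v) = explode('-', $color);
-    list($hRange, $sRange, $vRange) = explode('-', $tolerance);
-
-    return array(
-        'ratio' => array('gte' => (int) $ratioMin, 'lte' => (int) $ratioMax),
-        'h' => array('gte' => max(0, $h - $hRange), 'lte' => min(360, $h + $hRange)),
-        's' => array('gte' => max(0, $s - $sRange), 'lte' => min(100, $s + $sRange)),
-        'v' => array('gte' => max(0, $v - $vRange), 'lte' => min(100, $v + $vRange)),
-    );
-}
-{% endhighlight %}
-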
-This way, if a user is very specific about the dark blue they want, and they want at least 80% of the item to be blue,
-the color slug might look like: [`80-100~190-100-50~10-5-5`][10]. Within the application, this gets translated into a
-[`filtered`][11] query. The filter part looks like:
-
-{% highlight javascript %}
-{ "filter": {
- "and": [
- { "nested": {
- "path": "color",
- "filter": {
- "and": [
- { "range": {
- "ratio": {
- "gte": 80,
- "lte": 100 } } },
- { "range": {
- "h": {
- "gte": 180,
- "lte": 200 } } },
- { "range": {
- "s": {
- "gte": 95,
- "lte": 100 } } },
- { "range": {
- "v": {
- "gte": 45,
- "lte": 55 } } } ] } } } ] } }
-{% endhighlight %}
-
-The query part then becomes responsible for ranking, using decay functions which roughly score how close the matched
-color is to the requested color: each function scores 1.0 within `offset` of `origin` and then falls off, dropping to
-half strength once the value is `scale` beyond the offset. The [`function_score`][13] query currently looks like:
-
-{% highlight javascript %}
-{ "function_score": {
- "boost_mode": "replace",
- "query": {
- "nested": {
- "path": "color",
- "query": {
- "function_score": {
- "score_mode": "multiply",
- "functions": [
- { "exp": {
- "h": {
- "origin": 190,
- "offset": 2,
- "scale": 4 } } },
- { "exp": {
- "s": {
- "origin": 100,
- "offset": 4,
- "scale": 8 } } },
- { "exp": {
- "v": {
- "origin": 50,
- "offset": 4,
- "scale": 8 } } },
- { "linear": {
- "ratio": {
- "origin": 100,
- "offset": 5,
- "scale": 10 } } } ] } },
- "score_mode": "sum" } },
- "functions": [
- { "script_score": {
- "script": "_score" } } ] } }
-{% endhighlight %}
-
-The `_score` can then be used in sorting to show the closest color matches first.
-
-
-
-
-Of course, these color searches can be added alongside the other facet searches like product availability, attributes,
-and regular keyword searches.
-
-
-## User Interface
-
-One of the more difficult tasks of the color search was to create a reasonable user interface to front the powerful
-capabilities. This initial version uses the same interface as a year ago, letting users pick from the available "color
-dots". Ultimately I hope to improve it with a more advanced, yet simple, [Raphaël][12] interface which would let them
-pick a specific color and say how picky they want to be. That goal requires a fair bit of time and learning though...
-
-
-## Summary
-
-I'm excited to have the search by color functionality back. I'm even more excited about the possibilities of better,
-advanced user searches further down the road. After it gets used a bit more, I hope we can more prominently promote the
-color search functionality around the site. Elasticsearch has been an excellent tool for our product searching and it's
-exciting to continue expanding the role it takes in powering the website.
-
-
- [1]: /blog/2013/04/27/new-website-for-the-loopy-ewe.html
- [2]: http://www.theloopyewe.com/
- [3]: http://www.elasticsearch.org/
- [4]: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/mapping-nested-type.html
- [5]: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/mapping-array-type.html
- [6]: http://en.wikipedia.org/wiki/RGB_color_model
- [7]: http://en.wikipedia.org/wiki/HSL_and_HSV
- [8]: http://en.wikipedia.org/wiki/Color_quantization
- [9]: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-aggregations-bucket-terms-aggregation.html
- [10]: http://www.theloopyewe.com/shop/search/cd/80-100~190-100-50~10-5-5/g/59A9BAC5/
- [11]: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/query-dsl-filtered-query.html#query-dsl-filtered-query
- [12]: http://raphaeljs.com/
- [13]: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/query-dsl-function-score-query.html
diff --git a/blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md b/blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md
deleted file mode 100644
index 3a56c43..0000000
--- a/blog/_posts/2014-09-17-simplifying-my-bosh-related-workflows.md
+++ /dev/null
@@ -1,737 +0,0 @@
----
-title: "Simplifying My BOSH-related Workflows"
-layout: "post"
-tags: [ "aws", "bosh", "cloudformation", "cloudfoundry", "cloque", "docker", "ec2", "packaging", "snapshots", "twig" ]
-description: "Discussing some commands and wrappers I've been adding on top of BOSH."
----
-
-Over the last nine months I've been getting into [BOSH][1] quite a bit. Historically, I've been [reluctant][20] to
-invest in BOSH because of reservations about its architecture and its steep learning curve. BOSH
-[describes itself][1] with...
-
- > BOSH installs and updates software packages on large numbers of VMs over many IaaS providers with the absolute
- > minimum of configuration changes.
- >
- > BOSH orchestrates initial deployments and ongoing updates that are:
- >
- > * Predictable, repeatable, and reliable
- > * Self-healing
- > * Infrastructure-agnostic
-
-With the continued use and experience required by the [logsearch][2] project, I saw ways it would solve more critical
-problems for me than it would create. For that reason, I started experimenting and migrating some services over to
-BOSH to better evaluate it for my own uses. To help bridge the gap between BOSH inconveniences and some of my
-architectural/practical differences I've been making a tool called [`cloque`][3].
-
-You might find the ideas more useful than the `cloque` code itself - it is, after all, experimental and written
-in PHP (since that's what I'm most productive in) whereas `bosh` is more Ruby/Go-oriented.
-
-
-## Infrastructure First
-
-Generally speaking, BOSH needs some help with infrastructure (i.e. it can't create its own VPC, network routing tables,
-etc). Additionally, sometimes deployments don't even need the BOSH overhead. Within `cloque`, I've split management
-tasks into two components:
-
- * Infrastructure - this is more of the "physical" layer defining the networking layer, some independent services (e.g.
- NAT gateways, VPN servers), security groups, and other core or non-BOSH functionality.
- * BOSH - everything related to BOSH (e.g. director, deployment, snapshots, releases, stemcells) which is deployed onto
- the infrastructure somewhere.
-
-Since BOSH depends on some infrastructure, we'll get started with that first. One key to a `cloque`-managed environment
-is that each environment has its own directory which includes a `network.yml` at the top level. The network may be
-located in a single datacenter, or it could span multiple countries. The file defines all the basics about the network
-including subnets, reserved IPs, basic cloud properties, and some logical names.
-
-I've committed an example network to the [`share`][7] directory within `cloque` and will use that in the examples here.
-To get started, we'll copy the example and work with it...
-
- # copy the sample environment
- $ cp -r ~/cloque/share/example-multi ~/cloque-acme-dev
- $ cd ~/cloque-acme-dev
-
- # this will help the command know where to look for configs later
- $ export CLOQUE_BASEDIR="$PWD"
-
-If you take a look at the sample [`network.yml`][18], you'll see a couple regions with their individual network
-segments, VPN networks, and a few reserved IP addresses which can be referenced elsewhere. Once `network.yml` is
-created, the `utility:initialize-network` task can take care of bootstrapping the following:
-
- * create stub folders for your different regions (e.g. `aws-apne1/core`, `global/private`)
- * create a new SSH key (in `global/private/cloque-{yyyymmdd}*.pem`) and upload it to the AWS regions being used
- * create a new IAM user, access key, and EC2 policy for BOSH to use
- * create a certificate authority for [OpenVPN][8] usage
- * create both client/server certificates for the inter-region VPN connections (requires interactive prompts for
- passwords/confirmations)
- * create an S3 bucket for shared configuration storage
-
-When run, it assumes AWS credentials can be discovered from the environment...
-
- $ cloque utility:initialize-network
- > local:fs/global -> created
- ...snip...
-
- > I created `utility:initialize-network` after finding myself reusing keys and buckets across multiple environments
- > (such as development vs production) because they were annoying to manage by hand. I wanted to make security easier
- > for myself and, in the process, simplify things through automation.
-
-The top-level `global` directory is intended for configuration which applies to all areas. With the example I use it to
-create an additional IAM role which allows VPN gateways to securely download their VPN keys and configuration files...
-
- $ ( cd global/core && cloque infra:put --aws-cloudformation 'Capabilities=["CAPABILITY_IAM"]' )
- > validating...done
- > checking...missing
- > deploying...done
- > waiting...CREATE_IN_PROGRESS...........................CREATE_COMPLETE...done
-
-The `infra:put` is the core command responsible for managing the low-level, infrastructure-related resources. The
-command looks for an `infrastructure.json` file (see the [example][27]) and since I'm focused on [AWS][4], the files
-are [CloudFormation][5] scripts.
-
- > One thing I dislike about BOSH is how it uses a state file or global options to specify the director/deployment. It
- > makes it very inconvenient to quickly switch between directors/deployments even between multiple terminal sessions.
- > To help with that, `cloque` respects environment variables (or command line options) to know where it should be
- > working from. The `CLOQUE_BASEDIR` (exported earlier) is the most significant, and it was able to detect when it was
- > working from the `global` region/director and `core` deployment based on the current directory.
-
-Now that the global resources have been created, we can create our "core" resources for the `us-west-2` region. If you
-take a look at the [infrastructure.json][28] file, you'll see it creates a VPC, multiple subnets for each availability
-zone, a couple base security groups, and a gateway instance which will function as a VPN server to allow inter-region
-communication. You'll also notice it's using [Twig][10] templating to load `network.yml` and simplify what would be a
-lot of repeated resources. We'll use the `infra:put` command again, but this time within the `aws-usw2/core`
-directory...
-
- $ cd aws-usw2
- $ ( cd core && cloque infra:put )
- ...snip...
- > waiting...CREATE_IN_PROGRESS.........................CREATE_COMPLETE...done
-
- > BOSH supports ERB-templated deploy manifests. With ERB I found myself repeating a lot of code in each manifest when
- > trying to make it dynamic. After trying [spiff][21] (which I found a bit limited and difficult to understand), I
- > decided to use a different approach - one that would allow for the same dynamic, peer-config referencing, and
- > (later) transformational capabilities for both infrastructure configuration and BOSH deployment manifests.
-
-Once the `infra:put` command finishes, the `aws-usw2` part of the environment is complete which means the OpenVPN
-server is ready for a client. First we'll need to create and sign a client certificate though...
-
- # temporary directory
- $ mkdir tmp-myovpn
- $ cd tmp-myovpn
-
- # create a key (named after the hostname and current date)
- $ TMPOVPN_CN=$(hostname -s)-$(date +%Y%m%da)
- $ openssl req \
- -subj "/C=US/ST=CO/L=Denver/O=ACME Inc/OU=client/CN=${TMPOVPN_CN}/emailAddress=`git config user.email`" \
- -days 3650 -nodes \
- -new -out openvpn.csr \
- -newkey rsa:2048 -keyout openvpn.key
- Generating a 2048 bit RSA private key
- .............................+++
- ................+++
- writing new private key to 'openvpn.key'
- -----
-
- # sign the certificate (you'll need to enter the PKI password you used in the first step)
- $ cloque openvpn:sign-certificate openvpn.csr
-
- # now create the OpenVPN configuration profile for connecting to aws-usw2
- $ ( \
- cloque openvpn:generate-profile aws-usw2 $TMPOVPN_CN \
- ; echo '
-
-## The Event Lifecycle
-
-For non-trivial ELK stacks, there are typically a few services that a message hits between being a line in a log file
-and a plotted point on a Kibana graph. With logsearch, and logstash in general, those services are:
-
-0. The Shippers - which are responsible for getting log messages into logsearch (e.g. tailing log files with [nxlog][6]) by
- pushing them to...
-0. The Ingestors - which listen for those messages on various ports for various protocols (e.g. syslog). Rather than
- trying to immediately parse messages and be a bottleneck, it pushes messages into...
-0. The Queue - which helps buffer against degraded performance from large spikes. In logsearch, this is [redis][5].
- For real-time processing, the queue is typically empty because the messages should immediately be pulled by...
-0. The Parsers - which are responsible for parsing/extracting/transforming the log messages into something searchable.
- Typically, there are numerous parser rules for the various types of log files. Once parsed, they get pushed to...
-0. The Data Store - where the parsed message lives in elasticsearch for the rest of its life, searchable by tools like
- Kibana.
-
-In our situation, we could see that the parsers were becoming the bottleneck. Despite relatively consistent logging
-rates, the CPU loads would max out and messages were reaching elasticsearch at very slow rates. As a short-term fix,
-we could easily start up several more parsers which helped a little bit, but this required manual intervention and
-wasn't actually fixing the problem.
-
-
-## Areas to Profile
-
-Logstash itself has a `--debug` option which will dump details about every input, filter, and output each event
-hits. This is helpful when testing individual events, but in a production environment with thousands of events per
-minute it just became too noisy to be useful. We needed a different solution.
-
-Typically, when all is said and done, we only have one timestamp to look at: `@timestamp` as extracted from the log
-message and indicating when the log message was originally emitted. However, when the bottlenecks were occurring, there
-was up to an hour of delay before messages showed up in dashboards, and we had no way to measure how long messages were
-stuck, nor where they were stuck. We decided to inject a few more fields into events...
-
-First, we wanted to know when log messages were first entering our logsearch stack. This would help validate that our
-shippers are pushing data into the cluster in a timely manner (rather than significant batching or simply getting
-stuck). To do this, I configured ingestors to add the current time to every message when it came in. I also added
-fields documenting which BOSH job received the message to help us keep an eye on how balanced the ingestors may be.
-So, now our messages have a few additional fields...
-
- * `@ingestor[timestamp]` - the time the ingestor saw the event (e.g. `2014-11-14T12:02:36.181Z`)
- * `@ingestor[job]` - the job which ingested the event (e.g. `ingestor/1`)
- * `@ingestor[service]` - which logsearch job template received the message (e.g. `syslog`)
-
-The next step in the lifecycle was the queue. The easiest way to monitor how long a message stayed in the queue is to
-add another timestamp right when the parser shifts the message off the queue. Since we have multiple parsers running, I
-configured them to also add their BOSH job name as a field. With the working theory that some of our parser rules were
-especially inefficient, I also added a final timestamp at the very end of the parsing rules. This would let us compare
-start/end parser timestamps. Now messages have a few more fields...
-
- * `@parser[timestamp]` - the time the parser saw the event (e.g. `2014-11-14T12:02:36.450Z`)
- * `@parser[job]` - the job which parsed the event (e.g `parser-z1/3`)
- * `@parser[timestamp_done]` - the time when the parser finished parsing the event (e.g. `2014-11-14T12:02:36.462Z`)
-
-With those 6 new fields, the event now has some very valuable metadata that we can review. However, the information
-would be much more valuable if we could easily aggregate and graph individual events. So I added a bit more
-overhead with math and graphable fields...
-
- * `@parser[duration]` - instead of `timestamp_done`, switch to the duration the parser took (e.g. `12`)
- * `@timer[ingested_to_parsed]` - essentially the time our logsearch stack spent on the event from when we first
- saw it to (roughly) when the end user should be able to search it (e.g. `281`)
- * `@timer[emit_to_ingested]`, `@timer[emit_to_parsed]` - if the conventional `@timestamp` field is parsed out of the
- log message, we can use that as an absolute starting point and get further insight into how slow shippers are to
- send the message (e.g. `301`, `582`)
-
-
-## Graphing Bottlenecks
-
-After deploying the changes we were able to make some new Kibana dashboards to help visualize all our new metrics.
-Since parsers seemed to be the bottleneck, we first wanted to monitor how many messages the jobs were actually parsing
-at a given time...
-
-
-
-During light loads where everything would be processing in real-time, we expected it to fully mirror our other chart
-measuring the rates we were receiving the messages...
-
-
-
-Historically our spikes seemed random, so we started segmenting the average parse times by log types under the theory
-that some particular log was sending confusing messages. Our average time was around 10 ms, but after splitting by type
-we saw one log type was averaging more than one second (per message)...
-
-
-
-Clearly this would cause all of our parsing to slow down whenever that log suddenly saw a lot of activity. Now that we
-could find slow log messages, we were able to use them to track down some extremely non-performant regular expressions
-in one of our `grok` filters. After deploying the updated filters, we started seeing *much* more consistent parsing
-results among all our log types...
-
-
-
-
-## Conclusion
-
-I learned a few things from all this. Most notable is how invaluable it is to be able to inject profiling into various
-steps of an otherwise unmeasured lifecycle. Obviously this adds a bit of processing and storage overhead into the
-stack, but since we haven't noticed a large impact in our day-to-day usage we've kept the extra profiling enabled.
-Although we have yet to experience another incident of a poorly performing parser, we're ready with metrics when we do.
-In the meantime, we use it to more easily monitor the practical capacity of our logstash components.
-
-This became a great example of how a relatively minor bug can be compounded and multiplied into bigger issues.
-A single log message taking 2 seconds isn't a big deal, even when you have 1000 other log messages/sec coming in - at
-worst you briefly lag by a couple seconds. If you have 10 parsers running it isn't even noticeable because the other 9
-parsers pick up the slack. But if all of a sudden you get 100 log messages hitting the slow bug, those 10 parsers will
-each spend 20 seconds working through those slow messages and, once they finish those 100, there will be 20,000
-messages waiting in the queue.
-
-Whether it's the [dashboards][7] we use to self-monitor, the [filters][8] we build app-specific parsers off of, or
-this new [profiling configuration][9] that we were motivated to work on -- I enjoy being in a role where these
-experiences can be codified, committed, and published in an open-source manner.
-
-
- [1]: http://www.elasticsearch.org/overview/elasticsearch/
- [2]: http://www.elasticsearch.org/overview/logstash/
- [3]: http://www.elasticsearch.org/overview/kibana/
- [4]: https://github.com/logsearch/logsearch-boshrelease
- [5]: http://redis.io/
- [6]: http://nxlog-ce.sourceforge.net/
- [7]: https://github.com/logsearch/logsearch-boshrelease/tree/develop/share/kibana-dashboards
- [8]: https://github.com/logsearch/?query=logsearch-filters
- [9]: https://github.com/logsearch/logsearch-boshrelease/pull/79/commits
diff --git a/blog/_posts/2015-02-21-sending-work-from-a-web-application-to-desktop-applications.md b/blog/_posts/2015-02-21-sending-work-from-a-web-application-to-desktop-applications.md
deleted file mode 100644
index 01e9e16..0000000
--- a/blog/_posts/2015-02-21-sending-work-from-a-web-application-to-desktop-applications.md
+++ /dev/null
@@ -1,92 +0,0 @@
----
-title: "Sending Work from a Web Application to Desktop Applications"
-layout: "post"
-tags: [ "applescript", "automation", "aws-sqs", "box", "dymo", "endicia", "hazel", "launchd", "osx", "phar", "php", "usps" ]
-description: "Using queues and PHP to automate third-party applications running on staff workstations."
-code: https://github.com/theloopyewe/elfbot
----
-
-I prefer working on the web application side of things, but there are frequently tasks that need to be automated outside the context of a browser and server. For [TLE][10], there's a physical shop where inventory, order, and shipping tasks need to happen, and those tasks revolve around web-based systems of one form or another. To help unify and simplify things for the staff (aka [elves][11]), I've been connecting scripts on the workstations with internal web applications via queues in the cloud.
-
-
-## Evolution of a bot
-
-Over the past 8+ years, the need for running commands on the desktop has changed. The easiest example to follow is how we have printed shipping labels over the years:
-
- 0. For the first few months, we would copy/paste the address into the [USPS Print & Ship][1] website, click through the shipping options ourselves, print out a label on sticky paper with inkjet, and copy/paste back the delivery confirmation into an order note. Averaging a few orders a day, it was quite manageable.
- 0. With more orders we needed something faster, so I created a form posting to USPS which prefilled all the fields. This way, all we needed to do was confirm/print and copy/paste the delivery confirmation back. That helped for a bit longer.
- 0. With a growing number of orders, we still needed something more, so we switched to [Endicia][2], a desktop application which had several integration options and the ability to print directly to a label printer. I switched from USPS links/forms to pre-composed links using Endicia's [custom URI handler][3]. This helped speed things up, save money on label paper, and also automatically copied confirmation codes for us to paste.
- 0. Occasionally we would have a couple problems with the URI approach, so I changed to using file downloads:
-
- 1. Instead of Endicia's custom URI handler, the server would send a file download with the XML-based postage details.
- 1. Using the [watched folder approach][4], OS X would notice the new file and send it to Endicia for printing.
-
- 0. This worked fairly well, but we quickly ran into a few quirks related to AppleScript's watched folder features and browser downloads - some files not being noticed at all or being noticed multiple times. We switched to [Hazel][5] which not only sidestepped the bugs we were seeing, but also provided me with better insight if something failed.
- 0. A bit later I discovered the `OutputFile` attribute of the [DAZzle spec][6] which would allow me to capture the results of the printed postage. By using and monitoring a different file extension for the output, I updated the script to parse the results and post the confirmation code to the website. This became an immense timesaver since it would allow postage to be queued instead of having to wait to paste each confirmation code manually. We used this approach for a long time.
-
-Eventually we needed to do more than just printing postage. The Hazel setup was straightforward, but the AppleScript implementation had become a bit too complex and inconvenient to test and change. We also needed this setup to be easily deployed on multiple systems. At this point I decided to spend some time coming up with a different solution which would better meet our needs.
-
-
-## The Bot
-
-Today's bot operates a bit differently. Rather than depending on monitored folders for file downloads, each workstation has its own queue (via [Amazon SQS][8]). Rather than complex logic in AppleScript, it is primarily based in PHP (as a [Phar][13]). Rather than Hazel managing processes, [launchd][9] typically runs it as an agent daemon. Rather than only printing shipping labels, it helps with several different tasks. Here are some of them...
-
-
-**Printing Postage** - the long-lived task of printing postage. The server pushes a resource URL which has the DAZzle XML data with address/contents/weights, the task gets the resource and sends it to Endicia, and then, once finished, it pushes the results back to the server where shipment costs and confirmation codes get extracted to update the order.
-
-**Purchasing Postage** - Endicia uses an account balance when printing postage, so whenever it gets low we need to reload it. Typically this requires user intervention since they don't support automatic reloading, but this task runs through the menus and dialogs with AppleScript ([discussed here][21]) to avoid any real interruptions. Whenever the system notices the balance getting low, it automatically sends this task to a capable workstation.
-
-**Archiving the Mailing Log** - Endicia keeps track of the postage it prints/buys/refunds in a mailing log. Over time this grows and slows things down, so Endicia provides an option to archive the log. Normally this is a manual process, but this task automates it. In addition to archiving, it also takes care of uploading the log to an encrypted S3 bucket where a server process can later go through to reconcile the transactions. A scheduler regularly sends this task to workstations running Endicia.
-
-**Label Printing** - another task we need to manage is printing labels for inventory through [DYMO Label][15]. The labels use a QR code ([discussed here][14]) and may include price and other product information. The server pushes a resource URL which has the XML-based label template appropriate for the product, embedding the product/inventory details. The task then downloads the label file to a temporary location and uses AppleScript to open it, printing however many copies are requested.
-
-**Webcam** - in addition to the [virtual tour][16] of the shop, we also have a [public webcam][17]. The webcam software supports sending snapshots to a URL endpoint on a timed interval, but it doesn't support SSL/TLS connections. As a workaround, this task takes care of downloading the snapshot as JPG and then uploading it securely to the correct endpoint. A scheduler is responsible for pushing this task to a server at the shop during business hours.
-
-**Printing** - a more recent experiment is for remotely printing regular documents. Sometimes the system sends emails to the staff when they need to reprint documents (such as pricing signs, pull details, or inventory locations). Rather than waiting for someone to see those emails and manually print them, I'm hoping the documents can just be waiting in the printer in the mornings for an elf to quickly pick up and handle.
-
-**User Dialog** - sometimes there are one-off tasks which need interaction. For example, letting the user know if Endicia is having confirmed service issues where we need to wait on printing more shipping labels.
-
-**Automatic Updates** - another more recent development is automatic updates. Historically I used read-only deployment keys and manually deployed the full repository to workstations. This was problematic on older machines since it needed `git`. Instead, I've started deploying Phars, creating them with [box][18] and publishing a versions manifest ([example][20]) for the [php-phar-update][19] component. Whenever it's convenient for the workstation, I can push the update task and let it self-update and restart.
-
-
-## From the Web
-
-From the server side of things, it maintains a hard-coded mapping of workstations and their available tasks. Whenever multiple workstations can handle a particular task, an extra field is presented to the user so they can pick where it should happen (defaulting to their own).
-
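-That mapping is nothing fancy - something along these lines, where the workstation names and most of the task names
-are made up for illustration:
-
-{% highlight php %}
-<?php
-
-// Hypothetical, hard-coded mapping of workstations to the tasks they handle.
-$workstationTasks = array(
-    'shipping-imac' => array(
-        'endicia.print_postage',
-        'endicia.purchase_postage',
-        'endicia.archive_mailing_log',
-    ),
-    'frontdesk-mini' => array(
-        'dymo.print_label',
-        'printer.print_document',
-    ),
-    'shop-server' => array(
-        'webcam.snapshot',
-    ),
-);
-{% endhighlight %}
-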
-
-
-Whenever the app needs to send a task to a bot, it queues a JSON object where the key is the task name and its value is the task options. For example, the payload for purchasing new postage looks like:
-
-{% highlight json %}
-{ "endicia.purchase_postage": {
- "amount": 500 } }
-{% endhighlight %}
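-
-Queueing that payload for a particular workstation is then just a little SDK glue. Roughly, using the AWS SDK for PHP
-(the region, API version, and queue URL below are placeholders; credentials come from the environment):
-
-{% highlight php %}
-<?php
-
-use Aws\Sqs\SqsClient;
-
-require __DIR__ . '/vendor/autoload.php';
-
-// Push a task onto the target workstation's queue.
-$sqs = new SqsClient(array(
-    'region'  => 'us-west-2',
-    'version' => '2012-11-05',
-));
-
-$sqs->sendMessage(array(
-    'QueueUrl'    => 'https://sqs.us-west-2.amazonaws.com/123456789012/elfbot-shipping-imac',
-    'MessageBody' => json_encode(array(
-        'endicia.purchase_postage' => array(
-            'amount' => 500,
-        ),
-    )),
-));
-{% endhighlight %}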
-
-
-## Conclusion
-
-PHP probably isn't most people's first thought for this sort of solution - there isn't any hypertext involved, after all. But since I didn't have to abuse PHP to fit here, and since it's a language I'm very productive with, it was the most efficient route to solving my problems. It has taken a few experiments to get to this point, but over the past ~2 years this queueing/PHP-based approach has been working out very well for us on the ~6 systems it runs on.
-
-Although it probably doesn't make much sense for others, I recently cleaned up and open sourced the bot portion of the code that I've been using for this. The [elfbot][12] repository has most of the tasks, an example configuration, and a compiled Phar in the releases. Maybe you'll find something interesting.
-
-
- [1]: https://www.usps.com/
- [2]: http://www.endicia.com/
- [3]: http://mac.endicia.com/extras/urls/
- [4]: http://mac.endicia.com/extras/applescript/
- [5]: http://www.noodlesoft.com/hazel.php
- [6]: http://mac.endicia.com/extras/xml/
- [8]: http://aws.amazon.com/sqs/
- [9]: https://developer.apple.com/library/mac/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html
- [10]: https://www.theloopyewe.com/
- [11]: https://www.theloopyewe.com/sheri/2008/08/the-loopy-elves-in-the-loopy-limelight
- [12]: https://github.com/theloopyewe/elfbot
- [13]: http://php.net/manual/en/book.phar.php
- [14]: /blog/2014/01/13/barcoding-inventory-with-qr-codes.html
- [15]: http://www.dymo.com/en-US
- [16]: https://www.theloopyewe.com/about/loopy-central/fort-collins
- [17]: https://www.theloopyewe.com/about/loopy-central/webcam/
- [18]: http://box-project.org/
- [19]: https://github.com/herrera-io/php-phar-update
- [20]: https://theloopyewe.github.io/elfbot/versions.json
- [21]: /blog/2013/01/28/scripting-endicia-to-purchase-postage.html
diff --git a/blog/_posts/2015-05-01-parsing-microdata-in-php.md b/blog/_posts/2015-05-01-parsing-microdata-in-php.md
deleted file mode 100644
index 0be202a..0000000
--- a/blog/_posts/2015-05-01-parsing-microdata-in-php.md
+++ /dev/null
@@ -1,75 +0,0 @@
----
-title: "Parsing Microdata in PHP"
-layout: "post"
-tags: [ "microdata", "opensource", "php", "schema", "xpath" ]
-description: "Open sourcing a library to easily traverse HTML for microdata."
-code: https://github.com/dpb587/microdata-dom.php
----
-
-A couple years ago I wrote about how I was [adding microdata][3] to [The Loopy Ewe][1] website to annotate things like products, brands, and contact details. I later wrote about how the internal search engine [depended on that microdata][4] for search results. During development and the initial release I was using some basic [XPath][2] queries, but as time passed the implementation became more fragile and incomplete. Since then, the parser has gone through several refactorings and this week I was able to extract it into a separate library that I can [open source][9].
-
-
-## Implementation
-
-My original implementation was a single helper class with a confusing mix of recursion, loops, and values by reference. The helper would receive the HTML string to parse and it would return a complex array with self-referencing values for multi-level scopes. Looking for a more reliable data structure to pass around, I decided to switch and extend the [`DOMDocument`][5]. I spent some time reading the [HTML Microdata][6] spec and wanted to try and find a balance between the spec's [DOM API][7] and existing PHP conventions.
-
-Now I use the library's [`MicrodataDOM\DOMDocument`][8] class when I want to parse a microdata document. It works just like the built-in `DOMDocument` so I'm able to manage libxml errors, control how I import the HTML document, and pass it through methods which are expecting a regular `DOMDocument`. The key difference is the addition of a `getItems` method which lets me quickly retrieve the microdata items. Internally, `getItems` and subsequent calls are still using XPath queries.
-
-In addition to extending `DOMDocument`, the library also extends `DOMElement`. This way, `getItems` is just returning a regular (but still specialized) list of DOM elements. The extended element class provides access to the microdata attributes like type, property name, and value.
-
-
-## Usage
-
-It works like a low-level library, expecting other, more specialized classes to add their own friendlier methods on top. Here's the example I used in the readme...
-
-{% highlight php %}
-<?php
-
-$dom = new MicrodataDOM\DOMDocument();
-$dom->loadHTMLFile('http://dpb587.me/about.html');
-
-// find Person types and get the first item
-$dpb587 = $dom->getItems('http://schema.org/Person')->item(0);
-echo $dpb587->itemId;
-
-// items are still regular DOMElement objects
-printf(" (from %s on line %s)\n", $dpb587->getNodePath(), $dpb587->getLineNo());
-
-// there are a couple ways to access the first value of a named property
-printf("givenName: %s\n", $dpb587->properties['givenName'][0]->itemValue);
-printf("familyName: %s\n", $dpb587->properties['familyName']->getValues()[0]);
-
-// or directly get the third, property-defining DOM element
-$property = $dpb587->properties[3];
-printf("%s: %s\n", $property->itemProp[0], $property->itemValue);
-
-// use the toArray method to get a Microdata JSON structure
-echo json_encode($dpb587->toArray(), JSON_UNESCAPED_SLASHES) . "\n";
-{% endhighlight %}
-
-Which will output something like...
-
- http://dpb587.me/ (from /html/body/article/section on line 97)
- givenName: Danny
- familyName: Berger
- jobTitle: Software Engineer
- {"id":"http://dpb587.me/","type":["http://schema.org/Person"],"properties":{"givenName":["Danny"],...snip...}
-
-In addition to using it for the internal search, I've been using this library for other internal tools responsible for sanitizing, normalizing, and taking care of some validation during development and testing. Hopefully I'll be able to extract and open-source those features sometime as well.
-
-
-## Summary
-
-Back when I first started this, I couldn't find any good libraries for this sort of microdata parsing. Nowadays it looks like there's at least [one other project][10] which I would consider if I didn't already have an implementation. Admittedly with some bias, I still favor mine because of its unit tests, its `itemprop` properties implementation, and its closer mirroring of how the spec describes interacting with a microdata API.
-
-
- [1]: https://www.theloopyewe.com/
- [2]: http://php.net/manual/en/class.domxpath.php
- [3]: /blog/2013/05/13/structured-data-with-schema-org.html
- [4]: /blog/2013/06/01/search-engine-based-on-structured-data.html
- [5]: http://php.net/manual/en/class.domdocument.php
- [6]: http://www.w3.org/TR/microdata/
- [7]: http://www.w3.org/TR/microdata/#microdata-dom-api
- [8]: https://github.com/dpb587/microdata-dom.php/blob/master/src/MicrodataDOM/DOMDocument.php
- [9]: https://github.com/dpb587/microdata-dom.php
- [10]: https://github.com/linclark/MicrodataPHP
diff --git a/blog/_posts/2015-06-03-new-bosh-release-for-openvpn.md b/blog/_posts/2015-06-03-new-bosh-release-for-openvpn.md
deleted file mode 100644
index 07ea9b3..0000000
--- a/blog/_posts/2015-06-03-new-bosh-release-for-openvpn.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title: "New BOSH Release for OpenVPN"
-layout: "post"
-tags: [ "bosh", "openvpn" ]
-description: "Open sourcing a new BOSH release for managing an OpenVPN network."
-code: https://github.com/dpb587/openvpn-boshrelease
----
-
-I'm a big fan of [OpenVPN][1] - both for personal and professional VPNs. Seeing as how I've been deploying more things with [BOSH][2] lately, an OpenVPN release seemed like a good little project. I started one about nine months ago and have been using development releases ever since, but last week I went ahead and created a ["final" release][6] of it.
-
-There is only a single job (`openvpn`) and the properties are [well documented][3]. Its primary purpose is to act as a server for other clients to connect to; however, you can also configure it as a client which connects out to another OpenVPN network. This makes it very easy to join multiple networks from a single OpenVPN connection.
-
-One of the more complicated steps of configuring an OpenVPN server is figuring out and remembering the correct commands for creating and signing security keys and certificates. The [README][4] includes all those steps to get a server running in a deployment and a client connected to it. There are also a few other examples of fancier configuration options, such as setting up `iptables` for shared networks, allowing VPN clients to communicate with each other, and making sure specific clients are assigned static IPs.
-
-After setting up, automating, and maintaining quite a few OpenVPN servers, this BOSH release has become my preferred method given its flexibility, consistency, and a handy README that keeps me from Googling at every step. Check out the [project page][5] if you'd like to learn more, or see the [releases][6] page there for a tarball that you can use in your own BOSH environment.
-
-
- [1]: https://openvpn.net/
- [2]: http://bosh.io/
- [3]: https://github.com/dpb587/openvpn-boshrelease/blob/89fd58982db3327e26cb8e2b9ed06835ffb08dd1/jobs/openvpn/spec#L17
- [4]: https://github.com/dpb587/openvpn-boshrelease/blob/master/README.md
- [5]: https://github.com/dpb587/openvpn-boshrelease
- [6]: https://github.com/dpb587/openvpn-boshrelease/releases
diff --git a/blog/_posts/2015-06-20-using-nginx-to-reverse-proxy-and-cache-s3-objects.md b/blog/_posts/2015-06-20-using-nginx-to-reverse-proxy-and-cache-s3-objects.md
deleted file mode 100644
index d4516e6..0000000
--- a/blog/_posts/2015-06-20-using-nginx-to-reverse-proxy-and-cache-s3-objects.md
+++ /dev/null
@@ -1,81 +0,0 @@
----
-title: "Using nginx to Reverse Proxy and Cache S3 Objects"
-layout: "post"
-tags: [ "aws", "aws-s3", "caching", "nginx", "reverse-proxy", "s3", "upstream" ]
-description: "Using S3 as an upstream server for improving long-tail traffic."
----
-
-My most recent project for [TLE][1] has been focused on making the infrastructure much more "cloud-friendly" and resilient to failures. One step in the project was going to require allowing more than one version of the application to run at a given time (typically just while a new version is still being rolled out to servers). The application itself doesn't have an issue with that sort of transition period; however, the way we were handling static assets (like stylesheets, scripts, and images) was going to cause problems. First, some background...
-
-When the frontend application code gets built and packaged up, it only contains the static assets for its own version. The static assets get dumped into `/docroot/static/{hash}/`, where the hash is generated based on when they were last modified and build runtime details. Once the application gets deployed and symlinked live, the old versions are no longer accessible from the document root. This obviously has implications like:
-
- 0. Late requests for those old assets result in 404s (infrequently from users, usually from bots).
- 0. Application servers must be reloaded onto the new version at the same time (otherwise, an old server without the new assets might be used by the proxy).
-
-Additionally, we use [CloudFront][2] as a CDN for those static assets with our website configured as the origin. If the CDN gets back a 404 for an asset (old or new) it is cached for a short period and potentially affects a lot of clients (particularly bad if it happens on the upcoming, new version). Since CloudFront supports [S3][4] buckets as origins, I figured we could use it to store all the versions of our static assets. I quickly added a step to the deployment process which uploads new assets to a bucket. However, that was only part of the solution.
-
-Unfortunately, CloudFront doesn't support dynamic [gzip][5] compression - it will only send back, byte-for-byte, what the origin delivers and we were storing the plain, non-gzipped versions in S3. The options were to...
-
- 0. no longer provide the files in gzip form (bad option... some files are genuinely large);
- 0. store both plain and gzip versions in separate S3 objects, then change the web application to dynamically rewrite the `link`/`script`/URLs based on browser headers (a lot of work, fragile, and bad use of existing web standards); or
- 0. continue using our website as the origin where responses could correctly be `Vary`'d and conditionally compressed.
-
-The last one was definitely my preferred choice, but we would still have the problem of a single version being on the filesystem and unpredictable results when multiple application server versions were running behind the proxy. After some thought, I wanted to try using the S3 bucket as an upstream and avoiding the application servers altogether. And to improve latency and minimize external S3 requests, I could cache the responses locally. After some experimentation, I ended up with something like the following in our [nginx][3] configs...
-
- location ~ ^(/static/.*)$ {
- # we can only ever GET/HEAD these resources
- limit_except GET {
- deny all;
- }
-
- # cookies are useless on these static, public resources
- proxy_ignore_headers set-cookie;
- proxy_hide_header set-cookie;
- proxy_set_header cookie "";
-
- # avoid passing along amazon headers
- # http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
- proxy_hide_header x-amz-delete-marker;
- proxy_hide_header x-amz-id-2;
- proxy_hide_header x-amz-request-id;
- proxy_hide_header x-amz-version-id;
-
- # only rely on last-modified (which will never change)
- proxy_hide_header etag;
-
- # heavily cache results locally
- proxy_cache staticcache;
- proxy_cache_valid 200 28d;
- proxy_cache_valid 403 24h;
- proxy_cache_valid 404 24h;
-
- # s3 replies with 403 if an object is inaccessible; essentially not found
- proxy_intercept_errors on;
- error_page 403 =404 /_error/http-404.html;
-
- # go get it from s3
- proxy_pass https://s3-us-west-1.amazonaws.com/example-static-bucket$1;
-
- # annotate response about when it was originally retrieved
- add_header x-cache '$upstream_cache_status $upstream_http_date';
-
- # heavily cache results downstream
- expires max;
- }
-
-So, with the above configuration...
-
- * CloudFront still points to our website and we can serve gzip/plain at the same resource;
- * assets are kept around indefinitely (and we could utilize bucket lifecycle policies if it becomes an issue);
- * frontend web server no longer relies on a particular application server's filesystem;
- * access to the S3 bucket/prefix can be restricted via bucket policy; and
- * most importantly... deployment timing is no longer critical - versions can be deployed at whatever pace is appropriate and possible.
-
-Since deploying these changes over a month ago, everything has been working very well and the number of static 404 nuisances in our error logs has dropped significantly. It also made it much easier to move on to the next problem on the path to cloud-friendliness and resiliency...
-
-
- [1]: https://www.theloopyewe.com/
- [2]: http://aws.amazon.com/cloudfront/
- [3]: http://nginx.org/
- [4]: http://aws.amazon.com/s3/
- [5]: https://en.wikipedia.org/wiki/Gzip
diff --git a/blog/_posts/2015-08-03-self-upgrading-packages-in-bosh-releases.md b/blog/_posts/2015-08-03-self-upgrading-packages-in-bosh-releases.md
deleted file mode 100644
index 8f8a9d0..0000000
--- a/blog/_posts/2015-08-03-self-upgrading-packages-in-bosh-releases.md
+++ /dev/null
@@ -1,137 +0,0 @@
----
-title: "Self-Upgrading Packages in BOSH Releases"
-layout: "post"
-keywords: [ "bosh", "package manager", "updates", "upgrades", "versions" ]
-description: "A strategy for monitoring upstream dependencies for self-sustaining packages."
----
-
-Outside of [BOSH][1] world, package management is often handled by tools like [yum][2] and [apt][3]. With those tools, you're able to run trivial commands like `yum info apache2` to check the available versions or `yum update apache2` to upgrade to the latest version. It's even possible to automatically apply updates via cron job. With BOSH, it's not nearly so easy since you must monitor upstream releases, manually downloading the sources before moving on to testing and deploying. Personally, this repetitive sort of maintenance is one of my least favorite tasks; so, to avoid it, I started automating.
-
-
-## Automating
-
-There are two critical steps involved with this sort of thing. The first is being able to `check` when new versions are available. For this post, I'll use my [OpenVPN BOSH Release][9] which has a single package with three dependencies. For each dependency, I can use commands to check for the latest version...
-
- # lzo
- $ wget -q -O- http://www.oberhumer.com/opensource/lzo/download/ | grep -E 'href="lzo-[^"]+.tar.gz"' | sed -E 's/^.+href="lzo-([^"]+).tar.gz".+$/\1/' | gsort -rV | head -n1
- 2.09
-
- # openssl
- $ git ls-remote --tags https://github.com/openssl/openssl.git | cut -f2 | grep -Ev '\^{}' | grep -E '^refs/tags/OpenSSL_.+$' | sed -E 's/^refs\/tags\/OpenSSL_(.+)$/\1/' | tr '_' '.' | grep -E '^\d+\.\d+\.\d+\w*$' | gsort -rV | head -n1
- 1.0.2d
-
- # openvpn
- $ git ls-remote --tags https://github.com/OpenVPN/openvpn.git | cut -f2 | grep -Ev '\^{}' | grep -E '^refs/tags/v.+$' | sed -E 's/^refs\/tags\/v(.+)$/\1/' | tr '_' '.' | grep -E '^\d+\.\d+\.\d+$' | gsort -rV | head -n1
- 2.3.7
-
-The location to download the source for a dependency is typically predictable, once the pattern is known...
-
- $ wget -O lzo.tar.gz "http://www.oberhumer.com/opensource/lzo/download/lzo-${VERSION}.tar.gz"
-
-Within the release, files become structured like:
-
- ./blobs/openvpn-blobs/
- ./lzo/
- lzo.tar.gz
- ./openssl/
- openssl.tar.gz
- ./openvpn/
- openvpn.tar.gz
- ./packages/openvpn/
- ./deps/
- ./lzo/
- ./check
- ./get
- ./VERSION
- ./openssl/
- ./check
- ./get
- ./VERSION
- ./openvpn/
- ./check
- ./get
- ./VERSION
- ./packaging
- ./spec
-
-Each dependency has its own blob directory, allowing old versions to be fully removed before being replaced with the new version's file(s). Inside the package directory, `VERSION` is a committed state file used for comparison in version checks. It can also be used to quickly reference and document what versions are being used...
-
- $ find packages -name VERSION | xargs -I {} -- /bin/bash -c 'A={} ; printf "%12s %s/%s\n" $( cat $A ) $( basename $( dirname $( dirname $( dirname $A ) ) ) ) $( basename $( dirname $A ))'
- 2.09 openvpn/lzo
- 1.0.2d openvpn/openssl
- 2.3.7 openvpn/openvpn
-
-One side effect of this structure is that the `packaging` script and `spec` manifest should be version agnostic. Otherwise you still end up needing to tweak them every time a version changes, defeating the automation. In `packaging`, references such as `openssl-1.0.2d` would typically become `openssl-*`. In `spec`, the `files` property is minimal...
-
- ---
- name: "openvpn"
- files:
- - "openvpn-blobs/**/*"
-
-When it comes time to upgrade dependencies I can run a [utility script][5]...
-
- $ ./bin/deps-upgrade-auto
- ==> openvpn/lzo
- --| local 2.09
- --| check 2.09
- ==> openvpn/openssl
- --| local 1.0.1m
- --| check 1.0.2d
- --> fetching new version
- --> 5.1M
- ==> openvpn/openvpn
- --| local 2.3.6
- --| check 2.3.7
- --> fetching new version
- --> 1.1M
-
-The script runs through all the dependencies, uploads new blobs to the blobstore, and commits the changes with a nice summary...
-
- $ git log --format=%B -n1
- Upgraded 2 package dependencies
-
- openvpn
-
- * openssl now 1.0.2d (was 1.0.1m)
- * openvpn now 2.3.7 (was 2.3.6)
-
-At this point, I have a single command that I can run to check and upgrade dependencies in all my packages. This openvpn example is fairly trivial, but some packages are much more complicated with many more dependencies from separate sites and using separate versioning and download strategies.
-
-
-## Continuous Integration
-
-Of course, upgrades aren't always without issue, which is why it's important to integrate them with existing tests and Continuous Integration pipelines. Consider the following workflow:
-
- * weekly, CI runs `deps-upgrade-auto` off the `master` branch, pushing new versions to `master-autoupgrade`
- * CI monitors `master-autoupgrade` for new commits, and follows the typical development pipeline
- * it creates a new development release version (i.e. `bosh create release`)
- * it creates a new test deployment with the version and test data
- * it runs unit tests and errand tests against the deployment
- * based on what happens to this version-testing branch...
- * *on-success*: send a Pull Request for a human to review and merge (or, assuming you have quality tests, go ahead and merge it automatically)
- * *on-failure*: create an issue in the repo listing the dependency versions which changed and information about the failed step so that a human can intervene with a headstart on where they need to start investigating
-
-This sort of pipeline results in...
-
- * best case scenario - a bot sends me a PR with upgraded dependencies which have been tested and confirmed to work in my release and I can click "Merge"
- * worst case scenario - a bot tells me I should upgrade OpenSSL but I need to investigate an issue where OpenVPN client connects are now failing a TLS handshake
-
-
-## Conclusion
-
-These `check`/`get`-type scripts and the self-upgrading approach are something I've been using in my releases lately. The value for me comes partly from the inherent documentation they provide, but mainly from being able to offload some of the maintenance burden I would normally need to be concerned about. Although I have yet to fully implement the steps from the [CI section](#continuous-integration) into my [Concourse][8] pipelines, I hope to get there at some point soon.
-
-If you're interested in experimenting with the scripts from this post, you can find them in [this gist][7] along with a few other `check` scripts I've been using. You can also take a look at the commits in the OpenVPN BOSH Release where I [switched][10] to using `deps` and then subsequently [auto-upgraded][11] the dependencies.
-
-
- [1]: https://bosh.io/
- [2]: https://en.wikipedia.org/wiki/Yellowdog_Updater,_Modified
- [3]: https://wiki.debian.org/Apt
- [4]: https://openvpn.net/
- [5]: https://gist.github.com/dpb587/e2d955f00378c1b78ea2#file-bin-deps-upgrade-auto-sh
- [6]: http://php.net/
- [7]: https://gist.github.com/dpb587/e2d955f00378c1b78ea2
- [8]: http://concourse.ci/
- [9]: https://github.com/dpb587/openvpn-boshrelease
- [10]: https://github.com/dpb587/openvpn-boshrelease/commit/26f115dfd5d80444fee543e17edf198e7d15b485
- [11]: https://github.com/dpb587/openvpn-boshrelease/commit/ac833f99cb361b0cb7fb39d70b70a0403ba87af8
diff --git a/blog/_posts/2015-08-06-pruning-blobs-from-bosh-releases.md b/blog/_posts/2015-08-06-pruning-blobs-from-bosh-releases.md
deleted file mode 100644
index 3c93f33..0000000
--- a/blog/_posts/2015-08-06-pruning-blobs-from-bosh-releases.md
+++ /dev/null
@@ -1,43 +0,0 @@
----
-title: "Pruning Blobs from BOSH Releases"
-layout: "post"
-keywords: [ "blobs", "bosh", "cleanup", "packages", "pruning" ]
-description: "Avoiding unnecessary disk usage for old, unneeded package files."
----
-
-Over time, as blobs are continually added to [BOSH][1] releases, the files can start consuming lots of disk space. Blobs are frequently abandoned because newer versions replace them, or sometimes the original packages referencing them are removed. Unfortunately, freeing the disk space isn't as simple as `rm blobs/elasticsearch-1.5.2.tar.gz` because BOSH keeps track of blobs in the `config/blobs.yml` file and uses symlinks to cached copies.
-
-To help keep a lean workspace, I remove references to blobs which are no longer needed in my release. The blobs remain untouched in the blobstore/S3, but as far as my local `bosh` command cares about, it doesn't need to keep local copies. One option for pruning is to manually edit `config/blobs.yml` and remove the old references (and then run `bosh sync blobs` to update `blobs/`). However, I tend to go the other direction - interactively or with shell scripts - removing files from `blobs/` and then updating `blobs.yml` with this command...
-
- for FILE in $( grep -E '^[^ ].+:$' config/blobs.yml | tr -d ':' ) ; do
- [ -e "blobs/${FILE}" ] || sed -i '' -E -e "\\#^${FILE}:\$#{N;N;N;d;}" config/blobs.yml
- done
-
-Once they're gone from `blobs.yml` I can commit the changes and know that the next time I need to clone/sync into a new workspace it'll be faster.
-
- git commit -om 'Prune old blob references' config/blobs.yml
-
-But... while those blobs are no longer listed in `config/blobs.yml` and they are no longer in `blobs/`, the blobs still exist in the `.blobs` directory where `bosh` keeps original copies. I can remove unreferenced blobs from `.blobs` with this command...
-
-    # remove any cached file whose checksum is no longer referenced by a `sha:` line in blobs.yml
-    for BLOBSHA in $( find .blobs -type f ) ; do
-      grep -qE "^  sha:\s+$( basename $BLOBSHA )" config/blobs.yml || rm "$BLOBSHA"
-    done
-
-Even though the blobs are now effectively gone, their references still exist in repository history. For example, if you wanted to rebuild your `.blobs` cache directory you could loop through changes to `blobs.yml` and rerun `bosh sync blobs` to restore local copies...
-
-    # walk every commit that touched blobs.yml (ending back at HEAD), checking out that version and syncing its blobs
-    for COMMIT in $( git rev-list --parents HEAD -- config/blobs.yml | cut -d" " -f1 ; git rev-parse HEAD ) ; do
-      git checkout "$COMMIT" config/blobs.yml
-      bosh sync blobs
-    done
-
-As an example, here's a before and after of cleaning up blobs in my long-running [logsearch-boshrelease][2] workspace...
-
-    $ du -sh .blobs/ | cut -f1
-    904M
-    ...cleanup...
-    $ du -sh .blobs/ | cut -f1
-    168M
-
-
- [1]: http://bosh.io/
- [2]: https://github.com/logsearch/logsearch-boshrelease
diff --git a/blog/_posts/2015-11-12-tempore-limites-bosh-veneer.md b/blog/_posts/2015-11-12-tempore-limites-bosh-veneer.md
deleted file mode 100644
index ac5c1ee..0000000
--- a/blog/_posts/2015-11-12-tempore-limites-bosh-veneer.md
+++ /dev/null
@@ -1,158 +0,0 @@
----
-title: "Tempore limites: BOSH Veneer"
-layout: "post"
-keywords: [ "bosh", "browser", "frontend", "user interface" ]
-description: "Experimenting with a browser frontend to working with BOSH."
----
-
-For all its low-level handling of things, BOSH is a good tool for system administration. But when it comes to
-configuring everything, I think it leaves something to be desired for the average Joe. Opening my text editor, making
-changes to the YAML, copying and pasting security groups from AWS Console, `git diff`ing to make sure I did what I
-think I did, `git commit`ing in case things go bad, `bosh deploy`ing to make it so... it can become quite the process.
-I'm much more of a visual person and prefer a browser-based tool. Since I've had a bit of extra free time lately, I've
-spent some time experimenting with ideas to help improve my BOSH quality of life.
-
-
-## BOSH
-
-
-
-The `bosh` CLI can work with multiple directors and uses the `target` command to switch between instances. With a
-browser-based tool, I just need to browse to the director or whatever dedicated instance I've deployed the release to.
-From there, I log in with my credentials as I would with `bosh login`.
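-
-In other words, it replaces the usual two-step CLI dance (the director address here is just an example):
-
-    # point the CLI at a director, then authenticate
-    $ bosh target https://192.0.2.10:25555
-    $ bosh login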
-
-While working with the project, I've been referring to it as "veneer", as in "a thin decorative covering of fine wood
-applied to a coarser wood or other material."
-
-
-
-One of the core features is to simply provide browser-based pages to view BOSH resources. For example, it's easy to see
-the list of releases and details about specific release versions. This makes the release and configuration process much
-more discoverable to end users. The screenshot shows details about the logsearch release, something which I deploy
-alongside all deployments to collect logs and metrics.
-
-
-
-Of course, the most common BOSH resource is deployments. I can quickly pull up a specific VM to see what's installed and
-how it is configured in the cloud. Since I'm using the AWS CPI, an extra link is shown on the side which links directly
-to the instance in AWS Console. Further down on that page is a section which describes the persistent disk on the VM.
-
-
-
-The AWS component of veneer knows the various CloudWatch metrics which are available for instances and disks. Here the
-persistent disk metrics are shown, including timing, queue length, and idle time below. This allows me to quickly pull
-up graphs if I'm trying to investigate an issue. If I do need to diagnose further in AWS Console, the sidebar link will
-take me straight to the EBS Volume there.
-
-
-
-I mentioned I included logsearch alongside all my deployments. Similar to veneer's AWS component, I also have a
-logsearch component which advertises many internal metrics for the BOSH resources. Here, on a specific job, I can
-quickly see load and memory usage over the past few hours. I can hover over the chart for specific values, or click
-into the graph to change the time span, granularity, and statistical method used.
-
-
-## Marketplace
-
-
-
-One of the reasons I like BOSH is that I can use releases both from the open-source community and from my own
-internally built releases. The marketplace component provides that central view into the various sources I can pull
-my releases and stemcells from. For example, the `theloopyewe` marketplace enumerates a private S3 bucket, using a
-regex to identify artifacts and their release name/version. Of course, the `boshio` one scrapes and uses the API to
-pull down the public [bosh.io][4] resources.
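-
-Conceptually, the S3 enumeration is nothing more than listing bucket keys and pattern-matching them. A rough sketch of
-the idea (the bucket name and key naming convention here are made up; veneer does this internally rather than
-shelling out):
-
-    # list artifact keys in the bucket and split the file names into release name/version pairs
-    # (bucket name and naming convention are illustrative)
-    aws s3 ls "s3://example-bosh-artifacts/" \
-      | awk '{ print $4 }' \
-      | sed -nE 's/^(.+)-([0-9][0-9.]*)\.tgz$/\1 \2/p'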
-
-
-
-From bosh.io, I can easily view the list of stemcells which are available. There are many more stemcells than I actually
-use from a single director, so the checkmark helps me identify which one(s) I have already uploaded to the director. If
-I want to see the full list of versions, I click on the name for a similar view. Versions which follow [semver][5] are
-parsed to provide intelligent advice about whether deployments are up to date in their release and stemcell usage.
-
-
-
-When viewing a specific stemcell version, I get a quick summary and, if it's not already installed, I have the option
-to upload it to the director right on-screen. Assuming the director has internet access, clicking "Upload" starts the
-task and redirects me to the task detail page. The release version page is similar.
-
-
-
-The task page automatically updates until the task completes successfully, at which point it redirects me back to the
-main stemcell summary page. If an error occurs, it shows me the full error and waits for me to diagnose and resolve
-the problem myself.
-
-
-## Operations
-
-
-
-I've mentioned the BOSH, AWS, Logsearch, and Marketplace components, but the most intriguing component is Operations.
-This component handles more of the management tasks, most notably editing deployment configuration. It provides the
-core forms for deployment manifests, but it also imports the forms that the CPI-specific component provides.
-
-
-
-For example, the Cloud Properties section of the resource pool uses the AWS-specific form, which includes Instance
-Type as well as properties like Availability Zone and ELB Names below the fold. You can also see the stemcell field is
-intelligently populated based on the stemcell names and versions which are installed on the director.
-
-
-
-Editing a job is also straightforward - it references the resource and disk pools already configured in the manifest,
-so they're easy to select. Templates are enumerated based on which releases are already configured in the manifest and
-installed on the director. The forms also clearly indicate which properties are required versus optional (since there
-are often more properties available than are typically needed).
-
-
-
-Properties are another frequently changed piece of deployments. Here, properties are enumerated based on which
-releases and templates are referenced in the deployment manifest. A green plus on the right indicates a property is
-not currently set, while a blue pencil button indicates it already has a value.
-
-
-
-If I do want to change a property, a simple form comes up where I can input my YAML-friendly value. If the release's
-job spec provides the metadata, the help text includes a description and example values.
-
-
-
-When changes have been saved, they are not immediately sent to the director. This allows multiple changes to be made
-and then deployed at a coordinated time. It's important not to forget pending changes though, so a banner provides a
-reminder and links to a comparison of the changes before applying them.
-
-
-## Core
-
-
-
-I mentioned changes are not immediately applied, and this is because they are actually written to a new branch in the
-git repository where everything is maintained. The git repository provides the support for versioning and merging -
-clicking the Review button actually just shows an intelligent `diff` between `master` and the drafted branch.
-
-
-
-Similarly, since it is a git repository, it can be cloned over HTTPS from veneer for backup purposes or advanced
-editing, and then even pushed back. This makes veneer more of a tool which can function alongside other infrastructure
-tools that also commit their configurations. For example, in the earlier screenshot you'll see `cloudformation.json`
-templates - something which I currently manage externally yet can still reference from my deployment manifests using
-pre-processing capabilities that veneer provides.
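-
-From a workstation, that round trip might look something like this (the clone URL and branch name are hypothetical):
-
-    # clone the configuration repository that veneer maintains
-    git clone https://veneer.example.com/config.git && cd config
-    # the same comparison the Review button shows: master versus the drafted branch
-    git diff master...draft/update-properties
-    # after advanced edits, push the draft back for review and deployment
-    git push origin draft/update-properties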
-
-
-## Summary
-
-For enterprisey-types, I've heard there's such a thing as [Ops Manager][1] which helps provide a bit of a frontend for
-deploying [certain software][2] (like [Cloud Foundry][3]). I'm not quite an enterprisey-type and don't have an
-enterprisey budget, but I still appreciate having shiny tools where I can point my browser to manage, monitor, and
-cross-reference my technical resources.
-
-Since my extra free time is coming to a close as I move on to another chapter in my life, this project will sit on my
-back burner. I still like the features and ideas though, so I figured I could write a post summarizing some of them.
-At the very least, if you encounter a project with similar features, leave a comment - I'd love to use it myself!
-
-
-
- [1]: http://docs.pivotal.io/pivotalcf/customizing/
- [2]: https://network.pivotal.io/
- [3]: https://www.cloudfoundry.org/
- [4]: https://bosh.io/
- [5]: http://semver.org/