---
title: A Generic Storage Interface
layout: post
tags: asset php storage
description: Abstracting file storage, whether it's local or cloud.
---
Websites often have a lot of different assets and files for their various areas - content management systems, photo
galleries, e-commerce product photos, etc. As a site grows, so do its storage and backup requirements, and it typically
becomes necessary to distribute those files across multiple servers or services.

One method for managing disparate file systems is to use custom PHP [stream wrappers][4] and configurable paths, but
some extensions don't yet support custom wrappers for file access. An alternative that I've been using is an object-
and service-oriented approach that keeps my application code independent from the storage configuration.

### Interface

At the core of my design is the asset storage interface, which looks something like:

{% highlight php %}
<?php
interface StorageEngineInterface {
    // store a file and return a token that can be used to retrieve it later
    function store(SplFileInfo $file);

    // retrieve a locally-accessible SplFileInfo based on the token
    function retrieve($token);

    // remove data from storage based on the token
    function purge($token);
}
{% endhighlight %}
The storage engine is responsible for generating a reusable token that can be used for later retrieval. Generally I
simply have it generate a UUID as the token, though tokens could also carry storage-specific meaning.
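
For illustration, a bare-bones local-filesystem engine might look something like this (a simplified sketch: no error
handling, and `uniqid`/`md5` stands in for a proper UUID library):

{% highlight php %}
<?php
// simplified sketch of a local-filesystem engine; real code would need
// error handling and a proper UUID generator
class LocalStorageEngine implements StorageEngineInterface {
    private $basePath;

    public function __construct($basePath) {
        $this->basePath = rtrim($basePath, '/');
    }

    public function store(SplFileInfo $file) {
        // generate a unique token and copy the file under that name
        $token = md5(uniqid('', true));
        copy($file->getPathname(), $this->basePath . '/' . $token);

        return $token;
    }

    public function retrieve($token) {
        return new SplFileInfo($this->basePath . '/' . $token);
    }

    public function purge($token) {
        unlink($this->basePath . '/' . $token);
    }
}
{% endhighlight %}
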
### Sample Storage Engines

I've used several base implementations:

* `LocalStorageEngine` - the simplest storage using a local/NFS filesystem
* `AWSS3StorageEngine` - using [AWS S3][1] for storage
* `SftpStorageEngine` - using PHP's [ssh2][2] module to access files on servers via SFTP
* `AtlassianConfluenceStorageEngine` - managing documents within [Confluence][3] wikis

Remote services like AWS S3 and SFTP can add significant latency to file access. To help with that, I use a
`CachedStorageEngine` implementation. It accepts two `StorageEngineInterface` arguments: one as the upstream engine and
one as the local cache. For example:

{% highlight php %}
<?php
new CachedStorageEngine(
new AWSS3StorageEngine(new Aws\S3\S3Client(...), 'bucket.example.com', 'my-prefix'),
new LocalStorageEngine('/tmp/s3-bucket.example.com-cache')
);
{% endhighlight %}
And since `CachedStorageEngine` is just another implementation of `StorageEngineInterface`, it can be used
interchangeably within the application - the only difference is performance.
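
Internally the caching engine just delegates between the two. A simplified sketch might look like the following - the
in-memory token map is the big simplification here; a real implementation would persist it, or derive cache locations
from the upstream token directly:

{% highlight php %}
<?php
// simplified sketch: the token map is in-memory only and would need to be
// persisted (or replaced) in a real implementation
class CachedStorageEngine implements StorageEngineInterface {
    private $upstream;
    private $cache;
    private $map = array();

    public function __construct(StorageEngineInterface $upstream, StorageEngineInterface $cache) {
        $this->upstream = $upstream;
        $this->cache = $cache;
    }

    public function store(SplFileInfo $file) {
        // writes go to the upstream engine, which owns the canonical token
        return $this->upstream->store($file);
    }

    public function retrieve($token) {
        if (!isset($this->map[$token])) {
            // cache miss: pull the file from upstream and keep a local copy
            $this->map[$token] = $this->cache->store($this->upstream->retrieve($token));
        }

        return $this->cache->retrieve($this->map[$token]);
    }

    public function purge($token) {
        if (isset($this->map[$token])) {
            $this->cache->purge($this->map[$token]);
            unset($this->map[$token]);
        }
        $this->upstream->purge($token);
    }
}
{% endhighlight %}
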
### Application Usage

Using dependency injection, each of the storage backends becomes an independent service, configured according to the
application's requirements. The application then contains no storage-specific calls like `copy`, `file_get_contents`,
`fopen`, etc., and the code looks something like:

{% highlight php %}
<?php
// storage service for photos
$storage = $dic->get('photo_storage');

// save a new photo
$photo = new PhotoRecord();
$photo->setAssetToken(
    $storage->store($request->files->get('upload'))
);

// use the photo
$image = (new Imagine\Gd\Imagine())->open(
    $storage->retrieve($photo->getAssetToken())
);

// delete the photo
$storage->purge($photo->getAssetToken());
$photo->delete();
{% endhighlight %}
Since `retrieve` will always return an [`SplFileInfo`][5] instance, it can be referenced and handled like a local file
(as demonstrated by the `open` call in the example).
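
The `$dic` container above is whatever dependency injection mechanism the application already uses; the exact wiring
isn't important. As a purely illustrative sketch (the `set()` call, service name, bucket, and paths are hypothetical),
the `photo_storage` service might be defined as:

{% highlight php %}
<?php
// hypothetical container wiring; only the engine classes matter here

// development: keep photos on the local filesystem
$dic->set('photo_storage', function () {
    return new LocalStorageEngine('/var/data/photos');
});

// production: push photos to S3, caching downloads locally
$dic->set('photo_storage', function () {
    return new CachedStorageEngine(
        new AWSS3StorageEngine(new Aws\S3\S3Client(/* ... */), 'bucket.example.com', 'photos'),
        new LocalStorageEngine('/tmp/photo-cache')
    );
});
{% endhighlight %}

Either definition satisfies the same `$dic->get('photo_storage')` call shown above, so switching backends is purely a
configuration change.
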
### Complicating Things

The asset storage interface itself is fairly primitive, but it allows for some more complex configurations:

* by using dependency injection, it becomes extremely easy to switch storage engines, since application code doesn't
  need to change
* complex storage rules can be combined with meaningful tokens - for example, storing very large files on different
  disks and using a token prefix to identify which storage class holds them
* a fallback storage class can go through a chain of storage engines until one is able to store or retrieve a token
  (a sketch of this follows the list)
* operations can be deferred internally via a queue manager (e.g. instead of storing a file to S3 immediately and
  waiting for the upload, write it locally and create a background job to upload it later)
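
Here's a rough sketch of that fallback idea, assuming each engine signals failure by throwing an exception; the class
name and details are illustrative rather than production code:

{% highlight php %}
<?php
// illustrative fallback chain; a real version would need smarter error handling
// and a way to remember which engine a token ended up in
class FallbackStorageEngine implements StorageEngineInterface {
    private $engines;

    public function __construct(array $engines) {
        // ordered list of StorageEngineInterface implementations to try
        $this->engines = $engines;
    }

    public function store(SplFileInfo $file) {
        foreach ($this->engines as $engine) {
            try {
                return $engine->store($file);
            } catch (Exception $e) {
                // try the next engine in the chain
            }
        }

        throw new RuntimeException('No storage engine was able to store the file');
    }

    public function retrieve($token) {
        foreach ($this->engines as $engine) {
            try {
                return $engine->retrieve($token);
            } catch (Exception $e) {
                // not found here; keep looking
            }
        }

        throw new RuntimeException('Token not found in any storage engine: ' . $token);
    }

    public function purge($token) {
        // remove the token wherever it may exist
        foreach ($this->engines as $engine) {
            try {
                $engine->purge($token);
            } catch (Exception $e) {
                // ignore engines that don't have it
            }
        }
    }
}
{% endhighlight %}
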
### Summary

By abstracting storage logic out of my application code, my life as a developer and as a systems administrator becomes
much easier when managing where files are located and relocating them as necessary.

[1]: http://aws.amazon.com/s3/
[2]: http://www.php.net/manual/en/book.ssh2.php
[3]: http://atlassian.com/software/confluence/overview/team-collaboration-software
[4]: http://www.php.net/manual/en/class.streamwrapper.php
[5]: http://us.php.net/manual/en/class.splfileinfo.php