S.H.I.E.L.D. Backup Solution

Project Goal

The goal of this project is to build a standalone system that can perform backup and restore functions for a wide variety of pluggable data systems (like Redis, PostgreSQL, MySQL, RabbitMQ, etc.), storing backup data in pluggable storage solutions (such as local files or an S3 blobstore).

The system should enable self-service for end users to perform ad hoc backup / restore operations, review backup schedules, retention policies and backup job runs, etc.

Engineers should be able to integrate support for new data systems and storage solutions without having to modify core code.

Architecture

Target Plugins

The system interfaces with data systems that hold the data to back up via Target Plugins. These plugins are bits of code that are compiled and linked into the Core Daemon, and implement a standard interface for the following operations:

backup

Retrieves data from the data system (via native means like pg_dump or the Redis SAVE command) and sends it to a Storage Plugin.

restore

Retrieves the data from a Storage Plugin and overwrites the data in the data system accordingly, using native means like pg_restore.

For data systems that permit full backups across a network (as most RDBMS do), nothing more is needed. Some data systems, however, make assumptions about the environment in which they operate. Redis, for example, always dumps its backups to local disk. To support these data systems, we can implement the Agent Target Plugin, and a corresponding Agent Daemon that will run on the target system. The Agent Daemon will be responsible for implementing the backup / restore operations, and the Agent Target Plugin will forward the requests to it and relay responses back to the caller.

Storage Plugins

The system interfaces with storage systems for uploading and retrieving backed up data files. These plugins are bits of code that are compiled and linked into the Core Daemon, and implement a standard interface for the following operations:

store

Store a single data blob (usually a file) in the remote storage system. Returns a key that can be used for later retrieval.

retrieve

Given a key returned from the store operation, retrieve the data blob.

purge

Given a key returned from the store operation, delete the stored data.
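The three storage operations above can be sketched as a Go interface. This is only an illustration: the interface name, method signatures, and the in-memory `memStore` are assumptions made for this example, not the project's actual plugin API.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// StoragePlugin is a hypothetical rendering of store / retrieve / purge.
type StoragePlugin interface {
	Store(blob io.Reader) (key string, err error)
	Retrieve(key string) (io.Reader, error)
	Purge(key string) error
}

// memStore keeps blobs in a map; it stands in for S3, a file system, etc.
type memStore struct {
	blobs map[string][]byte
	next  int
}

func newMemStore() *memStore {
	return &memStore{blobs: make(map[string][]byte)}
}

// Store reads the whole blob and returns a key for later retrieval.
func (m *memStore) Store(blob io.Reader) (string, error) {
	data, err := io.ReadAll(blob)
	if err != nil {
		return "", err
	}
	m.next++
	key := "blob-" + strconv.Itoa(m.next)
	m.blobs[key] = data
	return key, nil
}

// Retrieve returns the blob previously stored under key.
func (m *memStore) Retrieve(key string) (io.Reader, error) {
	data, ok := m.blobs[key]
	if !ok {
		return nil, errors.New("no such key: " + key)
	}
	return bytes.NewReader(data), nil
}

// Purge deletes the blob stored under key.
func (m *memStore) Purge(key string) error {
	if _, ok := m.blobs[key]; !ok {
		return errors.New("no such key: " + key)
	}
	delete(m.blobs, key)
	return nil
}

// roundTrip stores data, reads it back, purges it, and returns what was read.
func roundTrip(s StoragePlugin, data string) (string, error) {
	key, err := s.Store(strings.NewReader(data))
	if err != nil {
		return "", err
	}
	r, err := s.Retrieve(key)
	if err != nil {
		return "", err
	}
	got, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(got), s.Purge(key)
}

func main() {
	got, err := roundTrip(newMemStore(), "backup data")
	fmt.Println(got, err)
}
```

The key is treated as opaque by the caller; only the plugin that issued it needs to understand it.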

Core Daemon

The Core Daemon is the coordinating component that handles:

Metadata Management

What targets and stores exist, what schedules and retention policies are defined, what jobs are specified, what backups have taken place, and what tasks are in-flight.

Scheduling Backups

Kicks off backup tasks (owned by SYSTEM) for all jobs per their configured schedule.

Expiring Backups

Finds all expired entries in the archives and purges them from the remote storage system.

Ad hoc Backups

Kicks off backup tasks (owned by users) per end-user or operator request (via the HTTP API, detailed later).

Restores

Handles retrieval of stored backup data and replay / restoration of that data to a given target.

Monitoring

Exposes metrics and statistics about backup jobs, allows searching of archives to ensure that backups are completing successfully, etc.

HTTP API

The HTTP API is a component of the Core Daemon that exposes management interfaces via REST endpoints. It underlies the Web UI and CLI components (described later).

Catalog Database

A dedicated data store that keeps track of schedules, retention policies, backup configurations, targets and stores, and running tasks. This database is private to the Core Daemon; there should be no need to query it directly, outside of maintenance tasks.

Web UI and the CLI

The Web UI provides a rich user interface for operators and end-users to view configuration (schedules, policies, jobs, etc.), review archives, and monitor tasks in progress. It also provides self-service functionality by allowing users to request ad hoc backup and restore operations.

The Web UI relies exclusively on the HTTP API.

The CLI provides similar functionality, in a scriptable, command-line interface. It also relies exclusively on the HTTP API.

Catalog Database Schema Definition

TARGETS stores the information about the remote data systems that should be backed up. Each record identifies the method by which the target is backed up ('plugin') and the specific connection information required ('endpoint').

CREATE TABLE targets (
  uuid      UUID PRIMARY KEY,
  name      TEXT,  -- a human-friendly name for this target
  summary   TEXT,  -- annotation for operator use, to describe the target
                   --   i.e.: "Production PostgreSQL database"
  plugin    TEXT NOT NULL,  -- short name of the target plugin, like 'postgres'
  endpoint  TEXT NOT NULL,  -- opaque blob used by target plugin to connect to
                            --   the remote data system.  Could be JSON, YAML, etc.
  agent     TEXT NOT NULL   -- IP address and port (in ip:port format) of the
                            -- Shield agent that can backup/restore this target
);

STORES stores the destination of backup data, e.g. an S3 bucket or a local file system directory. Each record identifies a destination, the method by which to store and retrieve backup data to/from it ('plugin') and the specific connection information required ('endpoint').

CREATE TABLE stores (
  uuid      UUID PRIMARY KEY,
  name      TEXT,  -- a human-friendly name for this store
  summary   TEXT,  -- annotation for operator use, to describe the store
  plugin    TEXT NOT NULL,  -- short name of the storage plugin, like 's3' or 'fs'
  endpoint  TEXT NOT NULL   -- opaque blob used by storage plugin to connect to
                            -- the storage backend.  Could be JSON, YAML, etc.
);

SCHEDULES contains the timing information that informs the core daemon when it should run which backup jobs (or JOBS, see later).

CREATE TABLE schedules (
  uuid      UUID PRIMARY KEY,
  name      TEXT, -- a human-friendly name for this schedule
  summary   TEXT, -- annotation for operator use, to describe schedule
  timespec  TEXT NOT NULL  -- code in a DSL for specifying when to run backups,
                           --   i.e. 'sundays 8am' or 'daily 1am'
                           --   (note: may want to eval use of cron here)
);

RETENTION policies govern how long data is kept. For now, this is just a simple expiration time, with 'name' and 'summary' fields for annotation.

All backups taken MUST have a retention policy; no backups are kept indefinitely.

CREATE TABLE retention (
  uuid     UUID PRIMARY KEY,
  name     TEXT, -- a human-friendly name for this retention policy
  summary  TEXT, -- annotation for operator use, to describe policy
  expiry   INTEGER NOT NULL  -- how long (in seconds) before a given backup expires
);

JOBS keeps track of desired backup behavior, by marrying a target (the data to backup) with a store (where to send that data), according to a schedule (when to do the backups) and a retention policy (how long to keep the data for).

JOBS can be annotated by operators to provide context and justification for each job. For example, tickets can be called out in the notes field to direct people to more information about when the backup job was requested, and why.

CREATE TABLE jobs (
  uuid            UUID PRIMARY KEY,
  target_uuid     UUID NOT NULL, -- the target
  store_uuid      UUID NOT NULL, -- the store
  schedule_uuid   UUID NOT NULL, -- what schedule to use
  retention_uuid  UUID NOT NULL, -- what retention policy to use
  priority        INTEGER DEFAULT 50, -- priority, scale from 0 to 100 (0 = highest)
  paused          BOOLEAN, -- if true, this job is not run when scheduled.
  name            TEXT,    -- a human-friendly name for this job
  summary         TEXT     -- annotation for operator use, to describe
                           --   the purpose of the job ('weekly orders db')
);

ARCHIVES records all archives as they are created, and keeps track of where the data came from, where it went, when the backed-up data expires, etc.

ARCHIVES can be annotated by operators, so that they can keep track of specifically important backups, like dumps of databases taken before potentially risky changes are attempted.

CREATE TABLE archives (
  uuid         UUID PRIMARY KEY,
  target_uuid  UUID NOT NULL, -- the target (from jobs)
  store_uuid   UUID NOT NULL, -- the store (from jobs)
  store_key    TEXT NOT NULL, -- opaque data returned from the storage plugin,
                              --   for use in restore ops / download / etc.
  taken_at     INTEGER NOT NULL,
  expires_at   INTEGER NOT NULL, -- based on retention policy
  notes        TEXT DEFAULT ''  -- annotation for operator use, to describe this
                                --   specific backup, i.e. 'before change #422 backup'
                                --   (mostly, this will be empty)
);
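As a small illustration of how `expires_at` relates to the retention policy, the sketch below adds the policy's `expiry` (in seconds) to `taken_at`. It assumes both timestamps are UNIX epoch seconds, which the schema's INTEGER columns suggest but do not state, and it assumes an archive counts as expired once the current time reaches `expires_at`. The function names are made up for this example.

```go
package main

import "fmt"

// expiresAt derives archives.expires_at from archives.taken_at and
// retention.expiry. Both are assumed to be in seconds.
func expiresAt(takenAt, expirySeconds int64) int64 {
	return takenAt + expirySeconds
}

// expired reports whether an archive with the given deadline should be
// purged at time now. Treating now == deadline as expired is an assumption.
func expired(deadline, now int64) bool {
	return now >= deadline
}

func main() {
	takenAt := int64(1445772720) // hypothetical epoch timestamp
	deadline := expiresAt(takenAt, 86400)
	fmt.Println(deadline, expired(deadline, takenAt+3600), expired(deadline, takenAt+86400))
}
```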

TASKS keep track of non-custodial jobs being performed by the system. This includes scheduled backups, ad-hoc backups, data restoration and downloads, etc.

The core daemon interprets the 'op' field, and calls on the appropriate plugins, based on the associated JOB or ARCHIVE / TARGET entry.

Each TASK should be associated with either a JOB or an ARCHIVE.

Here are the defined operations:

Operation	Description
backup	Perform a backup of the associated JOB. The target and store are pulled directly from the JOB entry. Note: the `backup` operation is used for both ad hoc and scheduled backups.
restore	Perform a restore of the associated ARCHIVE. The storage channel is pulled directly from the ARCHIVE. The target can be specified explicitly. If it is not, the values from the ARCHIVE will be used. This allows restores to go to a different host (for migration / scale-out purposes).

CREATE TYPE status AS ENUM ('pending', 'running', 'canceled', 'failed', 'done');
CREATE TABLE tasks (
  uuid      UUID PRIMARY KEY,
  owner     TEXT, -- who owns / started this task?
  op        TEXT NOT NULL, -- name of the operation to run, i.e. 'backup' or 'restore'

  job_uuid      UUID,
  archive_uuid  UUID,
  target_uuid   UUID,

  status       status, -- current status of the task
  requested_at INTEGER NOT NULL, -- when the task was _created_
  started_at   INTEGER, -- when the task actually started
  stopped_at   INTEGER, -- when the task completed (or was cancelled)

  log       TEXT -- log of task activity
);
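The association rules for tasks can be sketched as a validation step: a `backup` task needs a JOB, and a `restore` task needs an ARCHIVE (with the target optional). The `Task` struct and `Validate` method below are hypothetical and only mirror the columns of the table above; they are not taken from the project's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Task mirrors a subset of the tasks table. Empty strings stand for NULL.
type Task struct {
	Op          string
	JobUUID     string
	ArchiveUUID string
	TargetUUID  string // optional override for restores
}

// Validate checks that the task references what its operation needs.
func (t Task) Validate() error {
	switch t.Op {
	case "backup":
		if t.JobUUID == "" {
			return errors.New("backup task requires a job")
		}
	case "restore":
		if t.ArchiveUUID == "" {
			return errors.New("restore task requires an archive")
		}
	default:
		return fmt.Errorf("unknown operation %q", t.Op)
	}
	return nil
}

func main() {
	fmt.Println(Task{Op: "backup", JobUUID: "274ddd91-6c17-4e5a-b5cd-6d53925d48b4"}.Validate())
	fmt.Println(Task{Op: "restore"}.Validate())
}
```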

HTTP API

Schedules API

Purpose: allows the Web UI and CLI to find out what schedules are defined, and provides CRUD operations for schedule management. Allowing queries to filter to unused=t or unused=f enables the frontends to show schedules that can be deleted safely.

Method	Path	Arguments	Request Body
GET	/v1/schedules	?unused=[tf]	-
POST	/v1/schedules	-	see below
DELETE	/v1/schedule/:uuid	-	-
GET	/v1/schedule/:uuid	-	-
PUT	/v1/schedule/:uuid	-	see below

GET /v1/schedules

Response Body:

[
  {
    "uuid"    : "36f50f26-b007-433a-a67a-bdffbd0746c8",
    "name"    : "Schedule Name",
    "summary" : "a short description",
    "when"    : "daily at 4am"
  },

  "..."
]
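A client might decode this response along the following lines. The `Schedule` type and `parseSchedules` helper are assumptions for illustration; the `"..."` entry in the example above stands for further records and is left out of the sample input here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Schedule matches the fields shown in the GET /v1/schedules response.
type Schedule struct {
	UUID    string `json:"uuid"`
	Name    string `json:"name"`
	Summary string `json:"summary"`
	When    string `json:"when"`
}

// parseSchedules decodes a response body into a slice of schedules.
func parseSchedules(body string) ([]Schedule, error) {
	var out []Schedule
	if err := json.Unmarshal([]byte(body), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	body := `[{"uuid":"36f50f26-b007-433a-a67a-bdffbd0746c8",` +
		`"name":"Schedule Name","summary":"a short description","when":"daily at 4am"}]`
	schedules, err := parseSchedules(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range schedules {
		fmt.Printf("%s runs %s\n", s.Name, s.When)
	}
}
```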

POST /v1/schedules

Request Body:

{
  "name"    : "Schedule Name",
  "summary" : "a short description",
  "when"    : "daily at 4am"
}

Field	Required?	Meaning
name	Y	The name of the new schedule
summary	N	A short summary of what the schedule is for, when it should be used
when	Y	The schedule, in the Timespec Language

Response Body:

{
  "ok"   : "created",
  "uuid" : "6b8398be-fdc0-424a-8532-e812e5dfc116"
}

Field	Meaning
ok	The new schedule was created
uuid	The UUID of the newly-created schedule

PUT /v1/schedule/:uuid

Request Body:

{
  "name"    : "Schedule Name",
  "summary" : "a short description",
  "when"    : "daily at 4am"
}

Field	Required?	Meaning
name	Y	The name of the new schedule
summary	Y	A short summary of what the schedule is for, when it should be used
when	Y	The schedule, in the Timespec Language

NOTE: summary is required for update requests, whereas it is optional on creation.

Response Body:

{
  "ok" : "updated"
}

Field	Meaning
ok	The schedule was updated

Retention Policies API

Purpose: allows the Web UI and CLI to find out what retention policies are defined, and provides CRUD operations for policy management. Allowing queries to filter to unused=t or unused=f enables the frontends to show retention policies that can be deleted safely.

Method	Path	Arguments	Request Body
GET	/v1/retention	?unused=[tf]	-
POST	/v1/retention	-	see below
DELETE	/v1/retention/:uuid	-	-
GET	/v1/retention/:uuid	-	-
PUT	/v1/retention/:uuid	-	see below

GET /v1/retention

[
  {
    "uuid"    : "c5aed303-a6fc-4b68-b0e9-81431cc07a4e",
    "name"    : "Retention Policy Name",
    "summary" : "a short description",
    "expires" : 86400
  },

  "..."
]

POST /v1/retention

Request Body:

{
  "name"    : "Policy Name",
  "summary" : "a short description",
  "expires" : 86400
}

Field	Required?	Meaning
name	Y	The name of the new retention policy
summary	N	A short summary of the new retention policy
expires	Y	How long, in seconds, to keep archives made against this policy. This value must be at least 3600 (1h)
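The constraints in the table above could be checked on the client before sending the request, roughly as follows. The `RetentionRequest` type and its `Validate` method are hypothetical; only the field names and the 3600-second minimum come from this document.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// minExpiry is the documented lower bound for "expires" (1 hour).
const minExpiry = 3600

// RetentionRequest is the body of POST /v1/retention.
type RetentionRequest struct {
	Name    string `json:"name"`
	Summary string `json:"summary,omitempty"` // optional on creation
	Expires int    `json:"expires"`
}

// Validate enforces the required fields and the minimum expiry.
func (r RetentionRequest) Validate() error {
	if r.Name == "" {
		return errors.New("name is required")
	}
	if r.Expires < minExpiry {
		return fmt.Errorf("expires must be at least %d seconds, got %d", minExpiry, r.Expires)
	}
	return nil
}

func main() {
	req := RetentionRequest{Name: "Policy Name", Summary: "a short description", Expires: 86400}
	if err := req.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))
}
```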

Response Body:

{
  "ok"   : "created",
  "uuid" : "6b8398be-fdc0-424a-8532-e812e5dfc116"
}

Field	Meaning
ok	The new retention policy was created
uuid	The UUID of the newly-created retention policy

PUT /v1/retention/:uuid

Request Body:

{
  "name"    : "Policy Name",
  "summary" : "a short description",
  "expires" : 86400
}

Field	Required?	Meaning
name	Y	The name of the new retention policy
summary	Y	A short summary of the new retention policy
expires	Y	How long, in seconds, to keep archives made against this policy. This value must be at least 3600 (1h)

NOTE: summary is required for update requests, whereas it is optional on creation.

Response Body:

{
  "ok" : "updated"
}

Field	Meaning
ok	The retention policy was updated

Targets API

Purpose: allows the Web UI and CLI to review what targets have been defined, and allows updates to existing targets (to change endpoints or plugins, for example) and remove unused targets (i.e. retired / decommissioned services).

Method	Path	Arguments	Request Body
GET	/v1/targets	?plugin=:name ?unused=[tf]	-
POST	/v1/targets	-	see below
DELETE	/v1/target/:uuid	-	-
GET	/v1/target/:uuid	-	-
PUT	/v1/target/:uuid	-	see below

GET /v1/targets

[
  {
    "uuid"     : "2f42d0b3-449a-4d0e-8576-a40cc552d7e5",
    "name"     : "Target Name",
    "summary"  : "a short description",
    "plugin"   : "plugin-name",
    "endpoint" : "{\"encoded\":\"json\"}",
    "agent"    : "10.17.66.54:5544"
  },

  "..."
]

POST /v1/targets

Request Body:

{
  "name"     : "Target Name",
  "summary"  : "a short description",
  "plugin"   : "plugin-name",
  "endpoint" : "{\"encoded\":\"json\"}",
  "agent"    : "10.17.66.54:5544"
}

Field	Required?	Meaning
name	Y	The name of the new target
summary	N	A short description of the target
plugin	Y	The name of the plugin to use when backing up this target
endpoint	Y	The endpoint configuration required to access this target's data
agent	Y	The host:port of a Shield agent that can backup/restore this target
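Note that `endpoint` in the example is a JSON document encoded as a string, not a nested object. The sketch below builds such a request body and checks that `agent` has host:port form. The helper name and the `map[string]string` endpoint shape are assumptions; real endpoint contents depend on the target plugin.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// targetRequest is the body of POST /v1/targets.
type targetRequest struct {
	Name     string `json:"name"`
	Summary  string `json:"summary,omitempty"` // optional on creation
	Plugin   string `json:"plugin"`
	Endpoint string `json:"endpoint"` // JSON document, encoded as a string
	Agent    string `json:"agent"`
}

// newTargetBody returns the JSON request body for a new target.
func newTargetBody(name, summary, plugin, agent string, endpoint map[string]string) (string, error) {
	if _, _, err := net.SplitHostPort(agent); err != nil {
		return "", fmt.Errorf("agent must be host:port: %v", err)
	}
	ep, err := json.Marshal(endpoint)
	if err != nil {
		return "", err
	}
	body, err := json.Marshal(targetRequest{
		Name:     name,
		Summary:  summary,
		Plugin:   plugin,
		Endpoint: string(ep),
		Agent:    agent,
	})
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := newTargetBody("Target Name", "a short description", "plugin-name",
		"10.17.66.54:5544", map[string]string{"encoded": "json"})
	fmt.Println(body, err)
}
```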

Response Body:

{
  "ok"   : "created",
  "uuid" : "6b8398be-fdc0-424a-8532-e812e5dfc116"
}

Field	Meaning
ok	The new target was created
uuid	The UUID of the newly-created target

PUT /v1/target/:uuid

Request Body:

{
  "name"     : "Target Name",
  "summary"  : "a short description",
  "plugin"   : "plugin-name",
  "endpoint" : "{\"encoded\":\"json\"}",
  "agent"    : "10.17.66.54:5544"
}

Field	Required?	Meaning
name	Y	The name of the new target
summary	Y	A short description of the target
plugin	Y	The name of the plugin to use when backing up this target
endpoint	Y	The endpoint configuration required to access this target's data
agent	Y	The host:port of a Shield agent that can backup/restore this target

NOTE: summary is required for update requests, whereas it is optional on creation.

Response Body:

{
  "ok" : "updated"
}

Field	Meaning
ok	The target was updated

Stores API

Purpose: allows operators (via the Web UI and CLI components) to view what storage systems are available for configuring backups, provision new ones, update existing ones and delete unused ones.

Method	Path	Arguments	Request Body
GET	/v1/stores	?plugin=:name ?unused=[tf]	-
POST	/v1/stores	-	see below
DELETE	/v1/store/:uuid	-	-
GET	/v1/store/:uuid	-	-
PUT	/v1/store/:uuid	-	see below

GET /v1/stores

[
  {
    "uuid"     : "5bcde12a-8b3f-4663-bbe3-9fe0fd6a093d",
    "name"     : "Store Name",
    "summary"  : "a short description",
    "plugin"   : "plugin-name",
    "endpoint" : "{\"encoded\":\"json\"}"
  },

  "..."
]

POST /v1/stores

Request Body:

{
  "name"     : "Store Name",
  "summary"  : "a short description",
  "plugin"   : "plugin-name",
  "endpoint" : "{\"encoded\":\"json\"}"
}

Field	Required?	Meaning
name	Y	The name of the new store
summary	N	A short description of the store
plugin	Y	The name of the plugin to use when storing backups in this store
endpoint	Y	The endpoint configuration required to access this store's data

Response Body:

{
  "ok"   : "created",
  "uuid" : "6b8398be-fdc0-424a-8532-e812e5dfc116"
}

Field	Meaning
ok	The new store was created
uuid	The UUID of the newly-created store

PUT /v1/store/:uuid

Request Body:

{
  "name"     : "Store Name",
  "summary"  : "a short description",
  "plugin"   : "plugin-name",
  "endpoint" : "{\"encoded\":\"json\"}"
}

Field	Required?	Meaning
name	Y	The name of the new store
summary	Y	A short description of the store
plugin	Y	The name of the plugin to use when storing backups in this store
endpoint	Y	The endpoint configuration required to access this store's data

NOTE: summary is required for update requests, whereas it is optional on creation.

Response Body:

{
  "ok" : "updated"
}

Field	Meaning
ok	The store was updated

Jobs API

Purpose: allows end-users and operators to see what jobs have been configured, and the details of those configurations. The filtering on the main listing / search endpoint (/v1/jobs) allows the frontends to show only jobs for specific schedules (what weekly backups are we running?), retention policies (what backups are we keeping for 90d or more?), and specific targets / stores.

Method	Path	Arguments	Request Body
GET	/v1/jobs	?target=:uuid ?store=:uuid ?schedule=:uuid ?retention=:uuid ?paused=[tf]	-
POST	/v1/jobs	-	see below
DELETE	/v1/job/:uuid	-	-
GET	/v1/job/:uuid	-	-
PUT	/v1/job/:uuid	-	see below
POST	/v1/job/:uuid/pause	-	-
POST	/v1/job/:uuid/unpause	-	-
POST	/v1/job/:uuid/run	-	see below
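The filter arguments on `/v1/jobs` can be combined in a single query string. The following sketch builds such a URL; the `JobFilter` type, the helper name, and the base URL used in `main` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// JobFilter holds the optional query arguments for GET /v1/jobs.
// Empty fields are omitted from the query.
type JobFilter struct {
	Target    string
	Store     string
	Schedule  string
	Retention string
	Paused    string // "t", "f", or "" for no filter
}

// jobsURL returns the listing URL for the given base and filter.
func jobsURL(base string, f JobFilter) string {
	q := url.Values{}
	add := func(key, value string) {
		if value != "" {
			q.Set(key, value)
		}
	}
	add("target", f.Target)
	add("store", f.Store)
	add("schedule", f.Schedule)
	add("retention", f.Retention)
	add("paused", f.Paused)

	u := base + "/v1/jobs"
	if enc := q.Encode(); enc != "" {
		u += "?" + enc // Encode sorts parameters by key
	}
	return u
}

func main() {
	// shield.example is a placeholder host.
	fmt.Println(jobsURL("http://shield.example", JobFilter{
		Schedule: "e390934b-fc43-4343-a51b-22bd69a8894f",
		Paused:   "f",
	}))
}
```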

GET /v1/jobs

[
  {
    "uuid"            : "af0b40b2-8f7b-46e4-b425-9730c677e625",
    "name"            : "A Backup Job",
    "summary"         : "a short description",

    "retention_name"  : "100d Retention Policy",
    "retention_uuid"  : "7eb2131c-c2ad-40b1-916f-7e162be89465",
    "expiry"          : 8640000,

    "schedule_name"   : "Daily Backups Schedule",
    "schedule_uuid"   : "e390934b-fc43-4343-a51b-22bd69a8894f",
    "schedule"        : "daily at 4am",

    "paused"          : false,

    "store_uuid"      : "994e991f-112d-496d-a1df-bbdc67c79332",
    "store_plugin"    : "store-plugin",
    "store_endpoint"  : "{\"encoded\":\"json\"}",

    "target_uuid"     : "443e2ce1-de2e-4369-a497-add3dd970d4d",
    "target_plugin"   : "target-plugin",
    "target_endpoint" : "{\"encoded\":\"json\"}"
  },

  "..."
]

POST /v1/jobs

Request Body:

{
  "name"      : "Job Name",
  "summary"   : "a short description",

  "store"     : "uuid-of-store-to-use",
  "target"    : "uuid-of-target-to-use",
  "retention" : "uuid-of-retention-policy-to-use",
  "schedule"  : "uuid-of-schedule-to-use",

  "paused"    : false
}

Field	Required?	Meaning
name	Y	The name of the new job
summary	N	A short description of the job
store	Y	The UUID of the store to back data up to
target	Y	The UUID of the target to back up
retention	Y	The UUID of the retention policy to apply to backup archives
schedule	Y	The UUID of the backup schedule to use when determining when this job should run
paused	Y	Whether or not this job should be paused, initially

Response Body:

{
  "ok"   : "created",
  "uuid" : "6b8398be-fdc0-424a-8532-e812e5dfc116"
}

Field	Meaning
ok	The new job was created
uuid	The UUID of the newly-created job

GET /v1/job/:uuid

{
  "uuid"            : "af0b40b2-8f7b-46e4-b425-9730c677e625",
  "name"            : "A Backup Job",
  "summary"         : "a short description",

  "retention_name"  : "100d Retention Policy",
  "retention_uuid"  : "7eb2131c-c2ad-40b1-916f-7e162be89465",
  "expiry"          : 8640000,

  "schedule_name"   : "Daily Backups Schedule",
  "schedule_uuid"   : "e390934b-fc43-4343-a51b-22bd69a8894f",
  "schedule"        : "daily at 4am",

  "paused"          : false,

  "store_uuid"      : "994e991f-112d-496d-a1df-bbdc67c79332",
  "store_plugin"    : "store-plugin",
  "store_endpoint"  : "{\"encoded\":\"json\"}",

  "target_uuid"     : "443e2ce1-de2e-4369-a497-add3dd970d4d",
  "target_plugin"   : "target-plugin",
  "target_endpoint" : "{\"encoded\":\"json\"}"
}

PUT /v1/job/:uuid

Request Body:

{
  "name"      : "Job Name",
  "summary"   : "a short description",

  "store"     : "uuid-of-store-to-use",
  "target"    : "uuid-of-target-to-use",
  "retention" : "uuid-of-retention-policy-to-use",
  "schedule"  : "uuid-of-schedule-to-use"
}

Field	Required?	Meaning
name	Y	The name of the new job
summary	Y	A short description of the job
store	Y	The UUID of the store to back data up to
target	Y	The UUID of the target to back up
retention	Y	The UUID of the retention policy to apply to backup archives
schedule	Y	The UUID of the backup schedule to use when determining when this job should run

NOTE: summary is required for update requests, whereas it is optional on creation.

ALSO NOTE: The paused boolean parameter available on creation is not available for jobs that already exist. Use the other POST URLs for pausing / unpausing existent jobs.

Response Body:

{
  "ok" : "updated"
}

Field	Meaning
ok	The job was updated

POST /v1/job/:uuid/run

Request Body:

{
  "owner" : "Username"
}

Field	Required?	Meaning
owner	N	Name of the user requesting the job re-run; defaults to "anon"

Response Body:

{
  "ok" : "scheduled"
}

Field	Meaning
ok	The task was scheduled

Archive API

Purpose: allows end-users and operators to see what backups have been performed, optionally filtering them to specific targets (just the Cloud Foundry postgres database please), stores (what’s in S3?) and time windows (only show me backups before that data corruption incident). It also facilitates restoration of data, and purging of backups ahead of schedule.

Note: the PUT /v1/archive/:uuid endpoint is only able to update the annotations (notes) for an archive.

Method	Path	Arguments	Request Body
GET	/v1/archives	?target=:uuid ?store=:uuid ?after=YYYYMMDD ?before=YYYYMMDD	-
POST	/v1/archive/:uuid/restore	{ target: $target_uuid }	see below
DELETE	/v1/archive/:uuid	-	-
GET	/v1/archive/:uuid	-	-
PUT	/v1/archive/:uuid	-	see below
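The `after` and `before` arguments use the YYYYMMDD format. In Go, that corresponds to the reference layout `20060102`; the helper names below are made up for this sketch, and the time zone handling (UTC, the default for `time.Parse`) is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// dayLayout is Go's reference-time spelling of YYYYMMDD.
const dayLayout = "20060102"

// parseDay converts a YYYYMMDD argument into a time.Time (midnight UTC).
func parseDay(s string) (time.Time, error) {
	return time.Parse(dayLayout, s)
}

// formatDay renders a time as a YYYYMMDD argument.
func formatDay(t time.Time) string {
	return t.Format(dayLayout)
}

func main() {
	before, err := parseDay("20151225")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Ask for archives taken in the 30 days before that date.
	after := before.AddDate(0, 0, -30)
	fmt.Printf("/v1/archives?after=%s&before=%s\n", formatDay(after), formatDay(before))
}
```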

GET /v1/archives

[
  {
    "uuid"            : "9ee4b579-19ba-4fa5-94e1-e5b2a4d8e85a",
    "store_key"       : "BKP-1234-56789",

    "taken_at"        : "2015-10-25 11:32:00",
    "expires_at"      : "2015-12-25 11:32:00",
    "notes"           : "a few notes about this archive",

    "store_uuid"      : "b7b5743f-adfa-4ceb-abde-2c2085149b12",
    "store_plugin"    : "store-plugin",
    "store_endpoint"  : "{\"encoded\":\"json\"}",

    "target_uuid"     : "5c7b8b50-ff11-4d67-9624-fd8214bc8629",
    "target_plugin"   : "target-plugin",
    "target_endpoint" : "{\"encoded\":\"json\"}"
  },

  "..."
]

GET /v1/archive/:uuid

not yet implemented, apparently

POST /v1/archive/:uuid/restore

Request Body:

{
  "target" : "dd322f14-763d-4659-bc49-c2f1f2352341",
  "owner"  : "Username"
}

Field	Required?	Meaning
target	N	UUID of the target to restore this archive to. Defaults to the target from the original backup job
owner	N	Username of the user requesting the restoration. Defaults to "anon"

Response Body:

{
  "ok" : "scheduled"
}

Field	Meaning
ok	The restore task was scheduled

PUT /v1/archive/:uuid

Request Body:

{
  "notes" : "Some notes about this archive"
}

Field	Required?	Meaning
notes	Y	Notes about the archive

Response Body:

{
  "ok" : "updated"
}

Field	Meaning
ok	The archive was updated

Tasks API

Purpose: allows the Web UI and the CLI to show running tasks, query a specific task, submit new tasks, cancel tasks, etc.

Method	Path	Arguments	Request Body
GET	/v1/tasks	?status=:status ?debug	-
GET	/v1/task/:uuid	-	-
DELETE	/v1/task/:uuid	-	-

GET /v1/tasks

[
  {
    "uuid"         : "5e2c416d-36f7-484a-8a2a-3d3d567d55d6",
    "owner"        : "system",
    "type"         : "backup",

    "job_uuid"     : "274ddd91-6c17-4e5a-b5cd-6d53925d48b4",
    "archive_uuid" : "286102fe-c0fd-4e45-a357-743436a19602",
    "status"       : "done",
    "started_at"   : "2015-11-25 11:30:00",
    "stopped_at"   : "2015-11-25 11:32:00",
    "log"          : "this is the log of the job"
  },

  "..."
]

Meta API

Purpose: provides public (non-sensitive) information about the Shield daemon.

Method	Path	Arguments	Request Body
GET	/v1/meta/pubkey	-	-

GET /v1/meta/pubkey

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5X75B52xHxfDeujUiKNk9t2jZTR6FIb02t9pUcE6yfwItKGEM8wEad5TVtAqrqdiOaZoosYzcXzzcM2JXsGaCqhVyf2oNaQHiPuyLufPdPW3ZE6omKfHlwL32PkdK4XtZQIwwLEK4NScp1Gvi8GMF90JSaPOQuKgpXCiDXQWFuQkPUzu6yIQIkhPCthtLRn31Td/zF92vBdr5VXyjQ1j8lFTO0jrw9nqwnrW3SA6b1FToSaLvXJJvV8De1Vlkl030tzVdYA4KPIZFX7IPPueVBJcqCaXxEMSzceknGTXP7r64oJDJw4vE39pYqCYtllhzOKKYVaDTHoUUBsZQu+e5 core@shield

This can be used by agents to auto-authorize the core daemon for remote operations, rather than having to specify the key out-of-band. There are security risks involved in using this feature, so consider the potential for man-in-the-middle (MitM) attacks and act accordingly.
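One way to reduce the MitM risk is to compare the fingerprint of the fetched key with a value obtained through a trusted channel. The sketch below computes an OpenSSH-style SHA256 fingerprint from the response text using only the standard library. The `fingerprint` helper is an assumption for illustration, and the key used in `main` is a short stand-in, not a real key.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// fingerprint takes a line of the form "<type> <base64-key> [comment]"
// and returns "SHA256:" followed by the unpadded base64 SHA-256 digest
// of the decoded key blob.
func fingerprint(authorizedKey string) (string, error) {
	fields := strings.Fields(authorizedKey)
	if len(fields) < 2 {
		return "", errors.New("expected \"<type> <base64-key> [comment]\"")
	}
	blob, err := base64.StdEncoding.DecodeString(fields[1])
	if err != nil {
		return "", fmt.Errorf("decoding key: %v", err)
	}
	sum := sha256.Sum256(blob)
	return "SHA256:" + base64.RawStdEncoding.EncodeToString(sum[:]), nil
}

func main() {
	// Stand-in value; a real caller would pass the body of GET /v1/meta/pubkey.
	fp, err := fingerprint("ssh-rsa AAAAB3NzaC1yc2E= core@shield")
	fmt.Println(fp, err)
}
```

An operator could then compare the printed value against a fingerprint published by whoever runs the core daemon before authorizing the key.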

Plugin Calling Protocol

Store and Target Plugins are implemented as external programs, either scripts or compiled binaries, that follow the Plugin Calling Protocol, which stipulates how file descriptors are to be used, and what arguments are going to be passed to the external program to perform what functions.

$ redis-plugin info
{
  "name": "My Redis Plugin",
  "author": "Joe Random Hacker",
  "version": "1.0.0",
  "features": {
    "target": "yes",
    "store": "no"
  }
}

$ s3-plugin info
{
  "name": "My S3 Storage Plugin",
  "author": "Joe Random Hacker",
  "version": "2.1.4",
  "features": {
    "target": "no",
    "store": "yes"
  }
}

$ redis-plugin backup --endpoint '{"username":"redis","password":"secret"}' | s3-plugin store --endpoint '{"bucket":"test","key":"AKI123098123091"}'
{
  "key": "BA670360-DE9D-46D0-AEAB-55E72BD416C4"
}

$ s3-plugin retrieve --key decaf-bad --endpoint '{"bucket":"test","key":"AKI123098123091"}' | redis-plugin restore --endpoint '{"username":"redis","password":"secret"}'
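A caller of the `store` action has to pull the `key` out of the JSON it prints, so that the key can be passed to `retrieve` or `purge` later. A minimal sketch of that step follows; the type and function names are hypothetical, and unknown keys in the output are simply ignored.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// storeResult holds the part of the store action's output that the
// caller needs. Other keys in the JSON are ignored by json.Unmarshal.
type storeResult struct {
	Key string `json:"key"`
}

// parseStoreKey extracts the key from the store action's standard output.
func parseStoreKey(output string) (string, error) {
	var r storeResult
	if err := json.Unmarshal([]byte(output), &r); err != nil {
		return "", fmt.Errorf("store output is not valid JSON: %v", err)
	}
	if r.Key == "" {
		return "", errors.New("store output has no key")
	}
	return r.Key, nil
}

func main() {
	out := `{
  "key": "BA670360-DE9D-46D0-AEAB-55E72BD416C4"
}`
	key, err := parseStoreKey(out)
	fmt.Println(key, err)
}
```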

Each plugin program must implement the following actions, which will be passed as the first argument:

info - Dump a JSON-encoded map containing the following keys, to standard output:
1. name - The name of the plugin (human-readable)
2. author - The name of the person or team who maintains the plugin. May include email, at author discretion.
3. version - The version of the plugin
4. features - A map of the features of this plugin. Currently supports two boolean keys ("yes" for true, "no" for false, both lower case) named "target" and "store", that indicate whether or not the plugin can support target and/or store operations.
Other keys are allowed, but ignored, and all keys are reserved for future expansion. Keys starting with an underscore ('_') will never be used by shield, and are free for your own use.

Exits 0 to signify success, or non-zero to signify an error, in which case diagnostic information is printed to standard error.
backup - Stream a backup blob of arbitrary binary data (per plugin semantics) to standard output, based on the endpoint given via the --endpoint command line argument. For example, a database target plugin may require the DSN and username/password in a JSON structure, and will run a platform-specific backup tool (like pg_dump or mysqldump), hooking its output to standard output.

Error messages and diagnostics should be printed to standard error.

Exits 0 on success, or non-zero on failure.
restore - Read a backup blob of arbitrary binary data (per plugin semantics) from standard input, and perform a restore based on the endpoint given via the --endpoint command line argument.

Error messages and diagnostics should be printed to standard error.

Exits 0 on success, or non-zero on failure.
store - Read a backup blob of arbitrary binary data from standard input, and store it in the remote storage system, based on the endpoint given via the --endpoint command line argument. For example, an S3 plugin might require keys and a bucket name to perform storage operations.

Error messages and diagnostics should be printed to standard error.

Exits 0 on success, or non-zero on failure.

On success, write the JSON representation of a map containing a summary of the stored object, including the following keys:
1. key - An opaque identifier that means something to the plugin for purposes of restore. This will be logged in the database by shield.
Other keys are allowed, but ignored, and all keys are reserved for future expansion. Keys starting with an underscore ('_') will never be used by shield, and are free for your own use.
retrieve - Stream a backup blob of arbitrary binary data to standard output, based on the endpoint configuration given in the --endpoint command line argument, and a key, as given by the --key command line argument. (This will be the key that was returned from the store operation.)

Error messages and diagnostics should be printed to standard error.

Exits 0 on success, or non-zero on failure.
purge - Remove a backup blob of arbitrary data from the remote storage system, based on the endpoint configuration given in the --endpoint command line argument. The blob to be removed is identified via the --key command line argument.

Error messages and diagnostics should be printed to standard error.

Exits 0 on success, or non-zero on failure.
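To make the protocol concrete, here is a rough skeleton of a target-only plugin written in Go. It dispatches on the first argument, writes results to standard output and diagnostics to standard error, and returns the exit code. Everything here (the plugin name, the `run` function, the placeholder backup data, the usage exit code) is an assumption for illustration; endpoint parsing is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

type features struct {
	Target string `json:"target"` // "yes" or "no"
	Store  string `json:"store"`  // "yes" or "no"
}

type pluginInfo struct {
	Name     string   `json:"name"`
	Author   string   `json:"author"`
	Version  string   `json:"version"`
	Features features `json:"features"`
}

// demo describes this hypothetical target-only plugin.
var demo = pluginInfo{
	Name:     "Demo Plugin",
	Author:   "Example Author",
	Version:  "0.0.1",
	Features: features{Target: "yes", Store: "no"},
}

// infoJSON renders the document printed by the info action.
func infoJSON(p pluginInfo) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

// run performs the action named by args[0] and returns the exit code.
func run(args []string, stdin io.Reader, stdout, stderr io.Writer) int {
	if len(args) == 0 {
		fmt.Fprintln(stderr, "usage: demo-plugin <action> [--endpoint JSON] [--key KEY]")
		return 2
	}
	switch args[0] {
	case "info":
		s, err := infoJSON(demo)
		if err != nil {
			fmt.Fprintln(stderr, err)
			return 1
		}
		fmt.Fprintln(stdout, s)
	case "backup":
		// A real plugin would stream the data system's dump here.
		if _, err := io.WriteString(stdout, "demo backup blob\n"); err != nil {
			fmt.Fprintln(stderr, err)
			return 1
		}
	case "restore":
		// A real plugin would feed stdin to the data system's restore tool.
		if _, err := io.Copy(io.Discard, stdin); err != nil {
			fmt.Fprintln(stderr, err)
			return 1
		}
	default:
		// store, retrieve and purge are not offered by a target-only plugin.
		fmt.Fprintf(stderr, "unsupported action %q\n", args[0])
		return 1
	}
	return 0
}

func main() {
	// A real plugin would call os.Exit(run(os.Args[1:], os.Stdin, os.Stdout, os.Stderr)).
	code := run([]string{"info"}, os.Stdin, os.Stdout, os.Stderr)
	fmt.Fprintln(os.Stderr, "exit code:", code)
}
```

Passing the streams into `run` rather than using `os.Stdin` and `os.Stdout` directly keeps the dispatch logic easy to exercise in tests.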

Notes on Development

Setting the environment variable SHIELD_MODE to the value DEV will cause all scheduling information to revert to "every minute" regardless of the actual schedule. This is to assist developers.

The Makefile

The Makefile is used to assist with development. The available targets are:

test | tests : runs all the tests with no additional parameters
coverage : runs tests with coverage information
report : makes report in (temporary) HTML page for a particular package, e.g. db. See examples.
race : runs ginkgo -race * to test for race conditions
plugin | plugins : builds all the plugin binaries
shield : builds the shieldd, shield-agent and shield-schema binaries
all-the-things : runs all the tests (except the race test) and builds all the binaries.
fixme | fixmes : finds all FIXMEs in the project

all-the-things is also the default behavior, so running make with no targets is the same as make all-the-things.

Examples:

$ make shield
go build ./cmd/shieldd
go build ./cmd/shield-agent
go build ./cmd/shield-schema

$ make tests
ginkgo *
[1447777189] Agent Test Suite - 39/39 specs •••••••••••••••••••••••••••••••••••••• SUCCESS! 354.651397ms PASS
[1447777189] Database Layer Test Suite - 21/21 specs ••••••••••••••••••••• SUCCESS! 2.115501107s PASS
[1447777189] Plugin Framework Test Suite - 45/45 specs ••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 19.791121ms PASS
[1447777189] Supervisor Test Suite - 121/121 specs ••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 135.689325ms PASS
[1447777189] Timespec Test Suite - 34/34 specs •••••••••••••••••••••••••••••••••• SUCCESS! 20.880477ms PASS

Ginkgo ran 5 suites in 5.373477117s
Test Suite Passed
go vet ./...

$ make report FOR=db
go tool cover -html=coverage/db.cov

CLI Usage Examples

This section is exploratory.

# targets
$ shield create target
$ shield list targets [--[un]used] [--plugin $NAME]
$ shield show target $UUID
$ shield edit target $UUID
$ shield delete target $UUID

# schedule management
$ shield list schedules [--[un]used]
$ shield show schedule $UUID
$ shield delete schedule $UUID
$ shield update schedule $UUID

# retention policies
$ shield list retention policies [--[un]used]
$ shield show retention policy $UUID
$ shield delete retention policy $UUID
$ shield update retention policy $UUID

# "managing" plugins
$ shield list plugins
$ shield show plugin $NAME

# stores
$ shield list stores [--[un]used] [--plugin $NAME]
$ shield show store $UUID
$ shield edit store $UUID
$ shield delete store $UUID

# jobs
$ shield list jobs [--[un]paused] [--target $UUID] [--store $UUID]
                [--schedule $UUID] [--retention-policy $UUID]
$ shield show job $UUID
$ shield pause job $UUID
$ shield unpause job $UUID
$ shield paused job $UUID
$ shield run job $UUID
$ shield edit job $UUID
$ shield delete job $UUID

# archives
$ shield list archives [--target $UUID] [--store $UUID]
                    [--after YYYYMMDD] [--before YYYYMMDD]
$ shield show archive $UUID
$ shield edit archive $UUID
$ shield delete archive $UUID
$ shield restore archive $UUID [--to $TARGET_UUID]

# task management
$ shield list tasks [--all]
$ shield show task $UUID
$ shield cancel task $UUID

Proof of Concept (Where Do We Go From Here?)

Research

We need to identify all of the data systems we wish to support with this system. For each one, we need to determine whether it fits one of the two collection / restore models designed so far, and flag any that do not:

Direct over-the-network backup/restore a la pg_dump / pg_restore
Instrumentation of local backup/restore + file shipping via Agent Daemon / Plugin

Stage 1 Proof-of-Concept

To get this project off the ground, I think we need to do some research and experimental implementation into the following areas:

Implement the postgres target plugin using pg_dump / pg_restore tools
Implement the fs storage plugin to store blobs in the local file system
Implement the Core Daemon with limited functionality:
- Task execution
- backup operation
- restore operation
Implement the HTTP API with limited functionality:
- /v1/jobs/*
- /v1/archive/*
Implement the CLI with limited functionality:
- shield * job
- shield * backup
- shield * task

This will let us flush out any inconsistencies in the architecture, and find any problematic aspects of the problem domain not presently considered.

Stage 2 Proof-of-Concept

Next, we extend the proof-of-concept implementation to test out the Agent Target Plugin design, using Redis as the data system. This entails the following:

Implement the Agent Daemon (in general)
Extend the Agent Daemon to handle Redis’ BGSAVE command
Implement the Agent Target Plugin
