This repository has been archived by the owner on Jun 29, 2023. It is now read-only.

Update backend developer docs
igboyes committed May 2, 2022
1 parent a06a448 commit 8f8a8c2
Showing 4 changed files with 70 additions and 200 deletions.
9 changes: 0 additions & 9 deletions content/docs/developer/backend/building.md

This file was deleted.

165 changes: 5 additions & 160 deletions content/docs/developer/backend/guide.md
@@ -1,59 +1,16 @@
---
title: "Backend Guide"
title: "Guide"
menu:
developer:
parent: "Backend"
weight: 10
---

The Virtool backend is written in Python and requires Python 3.6.4 or later to run.

The backend provides the following functionalities:

# Organization

The Virtool server source code is organized by concern into sub-packages.

For example, code related to samples is located at `virtool/samples` and its modules are imported like `import virtool.samples.db` or `import virtool.samples.api`. These sub-packages usually contain these core module files:

| Name | Purpose |
| ------------ | ------------------------------------------------------------------------------------------------ |
| `api.py` | Definition of API handler functions and creation of `RouteTableDef` |
| `db.py` | Definition of database-related module attributes and functions that perform database operations |
| `utils.py` | Utility functions, classes, and constants related to the concern |
| `migrate.py` | Functions for applying general "schema" updates to the documents associated with the sub-package |

More files will be present in complex sub-packages.

# Dependencies

This table describes key backend dependencies and their use in Virtool.

| Package | Use |
| ------------------ | ---------------------------------------------------------------------------------------------------- |
| `aiofiles` | asynchronous reading and writing of files on the filesystem |
| `aiohttp` | asynchronous HTTP server |
| `aiojobs` | long-running background asynchronous jobs that integrate with the `aiohttp` lifecycle |
| `aionotify` | asynchronous watching for inotify filesystem events on the Linux operating system |
| `arrow` | date-time library with improved ease-of-use compared to standard `datetime` library |
| `bcrypt` | cryptographic library for hashing and salting user passwords |
| `Cerberus` | checks `dict` objects against defined schemas; used for validating user input |
| `cx-Freeze` | used for bundling Virtool server application into executable; will be deprecated in favour of Docker |
| `dictdiffer` | utilities for diffing dicts; used for keeping incremental history of OTUs |
| `Mako` | HTML templating for non-React portions of application |
| `motor` | asynchronous MongoDB driver |
| `pytest` | Python testing library |
| `pytest-aiohttp` | pytest plugin for testing `aiohttp` applications |
| `pytest-cov` | plugin for using `coverage` with `pytest`; generates test coverage reports |
| `pytest-mock` | pytest plugin for using `unittest.mock` with pytest |
| `raven` | client library for [Sentry.io](https://sentry.io) |
| `raven-aiohttp` | make asynchronous API requests to [Sentry.io](https://sentry.io) using `aiohttp` |
| `uvloop` | fast drop-in replacement for standard `asyncio` event loop |
| `visvalingamwyatt` | library for smoothing coverage curves |
The Virtool backend is written in Python and requires Python 3.8 or later to run.

# Startup

A number of services are set up during the `aiohttp` start-up sequence. See `virtool.app` for the relevant code.
A number of services are set up during the `aiohttp` start-up sequence.

## HTTP Client

@@ -70,22 +27,6 @@ await virtool.app.init_http_client(app)
app["client"]

```

## Refreshers

This term refers to long-running async tasks that refresh Virtool data based on an internet resource. Currently
these tasks update the following:

- HMM data releases
- remote references
- software releases

These tasks are run as `aiojobs` jobs. They are initialized on startup and cancelled on shutdown.

## Version

The backend version is detected either using `git` to find the current tag name or finding a local `VERSION` file.

## Executors

Read about [executing code in thread or process pools](https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools).
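
As a minimal sketch of the pattern described at that link (not Virtool's actual start-up code), the following uses only the standard library to offload a blocking function to a thread pool with `loop.run_in_executor()`. The function name `count_gc` and the pool size are assumptions made for illustration.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def count_gc(sequence: str) -> int:
    """Blocking, synchronous work that should not run on the event loop."""
    return sum(1 for base in sequence.upper() if base in "GC")


async def main() -> int:
    loop = asyncio.get_running_loop()

    # Hypothetical pool; a real application would likely create it once at startup.
    with ThreadPoolExecutor(max_workers=2) as executor:
        # run_in_executor() returns an awaitable, so the event loop stays free
        # to serve other coroutines while the function runs in a worker thread.
        return await loop.run_in_executor(executor, count_gc, "ATGCGC")


result = asyncio.run(main())
```

A `ProcessPoolExecutor` can be passed in the same way when the work is CPU-bound enough that the GIL becomes the bottleneck.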
@@ -143,16 +84,8 @@ operations a list of removed document IDs is sent.
| update | an existing document in the collection was updated |
| delete | an existing document(s) was removed from the collection |

```python3
# Initializing the dispatcher during startup.
await virtool.app.init_dispatcher(app)

# The dispatcher is accessible in the app state. Dispatch a message by calling dispatch().
await app["dispatcher"].dispatch("samples", "update", data)

```

# Database
# MongoDB

Virtool connects to MongoDB using the `motor` asynchronous driver.

Expand Down Expand Up @@ -205,92 +138,4 @@ synchronously that would be better done using an asynchronous library**.
Examples:

- reading and writing files in a thread or process using the built-in `open()` instead of using the `aiofiles` package
- making database calls using `pymongo` or `redis-py` in a thread instead of using native asynchronous drivers like `motor` and `aioredis`

# Jobs

Key properties of jobs:

- long-running
- computationally demanding
- run in separate processes
- closely tracked with progress reported to users
- have configurable host resource limits

## Types of Jobs

Jobs are currently used for the following:

- Creating samples
- Updating legacy samples
- Creating subtractions
- Building reference indexes
- Pathoscope-based analysis for known viruses
- NuVs analysis for novel viruses

## In the database

A job document is created in response to a user action (_eg_. creating a new sample).

| Field | Description |
| ------ | ----------------------------------------------------------------------------------------------------- |
| task | the type of task being performed by the job |
| args | args that will be accessible in the job instance and define what data and parameters the job will use |
| proc | the maximum number of cores/threads the job will use |
| mem | the maximum amount of RAM the job will use |
| user | the user that started the job |
| status | a list of status messages that track the progress of the job |

The status field of the job document is updated as the job starts and proceeds through its steps. These changes are
dispatched to users as they happen.

## In the manager

An instance of `virtool.jobs.manager.IntegratedManager` is created when the application starts. The job manager's `run()` method continuously works to queue, start, and monitor jobs.

A new `virtool.jobs.job.Job` object is created when a user action leads to a call to the manager's `enqueue()` method. Jobs that are waiting to run or running are kept in a `dict` at `IntegratedManager._jobs`. Once jobs finish, either by error, cancellation, or successful completion, their `Job` objects are deleted from `IntegratedManager._jobs`.

The `Job` object is a subclass of `multiprocessing.Process`.

## Cancellation

Jobs can be cancelled by users through the Virtool client or an API request.

On cancellation, the job process is interrupted and the job object's `cleanup()` method is called. The job document is updated with a final cancellation `status` entry. The client will display the jobs in a cancelled state when this database change is dispatched.

## Errors

Jobs can encounter errors when calling external programs (_eg_. bowtie2) or when running Python code used for handling
results or running statistical analysis.

When an error is detected, execution of job steps is interrupted. Then the job's cleanup method is run to remove any partial
files or database documents created by the job.

An error subdocument is added to the `status` field in the job document. This is used in the Virtool client to display
an error message to the user.

## Resources

Administrators can [configure job resource limits](/docs/manual/start/configuration/).

The Virtool instance can have global process and memory limits set on it. Individual job limits are categorized into _large_ and _small_ types. _Large_ jobs currently comprise the analysis workflows that have heavier resource requirements.

The available and used resources are tracked in the job manager. When jobs start the used resources are increased based on the requirements of the job. The reserved resources are then released when the job finishes.

# File Manager

The file manager (`virtool.files.manager.Manager`) deals with files sourced from outside Virtool.

## Files Directory

The directory at `<data_path>/files` is the storage location for externally sourced files. Each file in this location has a corresponding database document in the `files` collection.

The `files` database collection and the files in `<data_path>/files` are automatically kept in sync. If a file is removed from the directory manually, its document will also be removed. Removing a file's database document will result in removal of the file itself.

When initially created, file documents can have an expiration time set. The file document and consequently the file itself will be removed when this time is reached.

## Watching

Virtool supports configuration of a directory from which FASTQ files will be automatically imported.

We use the library [`aionotify`](https://github.com/rbarrois/aionotify) to detect changes in the watch directory. When a file is fully written, a document is created in the `files` collection and the file is copied to the `<data_path>/files` directory.
- making database calls using `pymongo` or `redis-py` in a thread instead of using native asynchronous drivers like `motor` and `aioredis`
22 changes: 16 additions & 6 deletions content/docs/developer/backend/resources.md
@@ -1,5 +1,5 @@
---
title: "Backend Resources"
title: "Learning Resources"
menu:
developer:
parent: "Backend"
@@ -39,16 +39,13 @@ The `aiohttp` client has minimal use in Virtool. It is used for making simple re

## [MongoDB](https://docs.mongodb.com/v3.6/)

Virtool currently uses MongoDB >=3.6.
Virtool currently uses MongoDB >= 4.4.

The official MongoDB documentation includes detailed information about the following:

- [CRUD Operations](https://docs.mongodb.com/v3.6/crud/)
- [Aggregation](https://docs.mongodb.com/v3.6/aggregation/)
- [Indexes](https://docs.mongodb.com/v3.6/indexes/)

If you are working on Virtool-MongoDB security, see the following:

- [Security](https://docs.mongodb.com/v3.6/security/)

### `motor`
@@ -62,4 +59,17 @@ You have to refer to the [`asyncio`-focussed API documentation for `motor`](http
- [`AsyncIOMotorCollection`](https://motor.readthedocs.io/en/stable/api-asyncio/asyncio_motor_collection.html)
- [`AsyncIOMotorCursor`](https://motor.readthedocs.io/en/stable/api-asyncio/cursors.html)

##
## PostgreSQL

We use [SQLAlchemy](https://www.sqlalchemy.org/) as a Postgres database toolkit and object-relational mapping (ORM) library.

**Always use SQLAlchemy's asyncio support when executing Postgres queries**.

## Redis

We use [aioredis==1.3.1](https://aioredis.readthedocs.io/en/v1.3.1/) for connecting to Redis with asyncio support.

We use a limited set of Redis features:

* PubSub for telling jobs to cancel themselves. Job IDs published to `channel:cancel` should be cancelled.
* Redis lists with names like `jobs_pathoscope` and `jobs_create_sample` to queue up job IDs that should be taken up by job runners.
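
The two uses above can be sketched as a pair of small helpers. This is an illustration, not Virtool's implementation: the helper names, the choice of `RPUSH` for queueing, and the `RecordingClient` stand-in are all assumptions. The `redis` argument is expected to expose awaitable `rpush()` and `publish()` methods, as an `aioredis` 1.3.1 client does.

```python
import asyncio

CANCEL_CHANNEL = "channel:cancel"


def queue_name(workflow: str) -> str:
    """Build a list name such as ``jobs_pathoscope`` from a workflow name."""
    return f"jobs_{workflow}"


async def enqueue_job(redis, workflow: str, job_id: str) -> None:
    """Append a job ID to the list for its workflow (RPUSH is an assumption)."""
    await redis.rpush(queue_name(workflow), job_id)


async def cancel_job(redis, job_id: str) -> None:
    """Publish a job ID on the cancellation channel."""
    await redis.publish(CANCEL_CHANNEL, job_id)


class RecordingClient:
    """Hypothetical in-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.calls = []

    async def rpush(self, key, value):
        self.calls.append(("rpush", key, value))

    async def publish(self, channel, message):
        self.calls.append(("publish", channel, message))


async def demo() -> RecordingClient:
    client = RecordingClient()
    await enqueue_job(client, "pathoscope", "abc123")
    await cancel_job(client, "abc123")
    return client


client = asyncio.run(demo())
```

Passing the client in as an argument keeps the helpers easy to test with a stand-in like the one shown here.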
74 changes: 49 additions & 25 deletions content/docs/developer/backend/testing.md
@@ -1,5 +1,5 @@
---
title: "Testing the Backend"
title: "Testing"
menu:
developer:
parent: "Backend"
@@ -8,41 +8,65 @@ menu:

Tests are implemented using the [pytest](https://docs.pytest.org/en/latest/) framework.

Tests can be quickly run by installing all dependencies and executing:
# Running Tests

```bash
## Services

The following services must be running first:

* `postgresql >= 16`
* `mongodb == 4.4`
* `redis == 6.0`

You can start these services using Docker Compose:

1. Ensure `docker` and the `compose` plugin are installed.

2. Clone the Virtool [compose](https://github.com/virtool/compose) repository:

```sh
git clone https://github.com/virtool/compose.git
```

3. Start up containers using the `test` profile:

```sh
docker-compose -p virtool --profile test up -d
```



## Testing

Run tests from the source directory root:

```sh
pytest
```

# API Tests
# Snapshots

As much logic as possible should happen outside of API handler functions. Functions called in API handlers can be mocked. A server instance and backing MongoDB database is created for each test so writing a lot of API tests or creating large test matrices for API handlers can greatly increase testing time.
Snapshots are used for tests where large outputs or API responses are validated.

## Order of funcargs
Snapshots are data files saved to the repository that can be automatically:
* written the first time a test runs
* loaded to validate output in future runs of the same test

For easy readability the order of funcargs passed to test functions follows the order:
We use [syrupy](https://tophat.github.io/syrupy/) for snapshot testing in Python. Be familiar with its API and features.

- values passed in from parametrization
- fixtures from `pytest` itself and plugin libraries
- the `spawn_client` fixture if necessary
- all Virtool fixtures in alphabetical order
## Rules

**_Good_**
1. Never blindly update a snapshot.

```python3
@pytest.mark.parametrize("not_found", [False, True])
async def test_get(not_found, mocker, spawn_client, resp_is, static_time):
client = await spawn_client(authorize=True)
```
Read through every snapshot diff to see why it is failing. You should rarely have a total mismatch between your test output and the stored snapshot.

**_Bad_**
Blindly accepting snapshot updates can lead to insidious bugs that will not be picked up in test runs for other commits.

```python3
@pytest.mark.parametrize("not_found", [False, True])
async def test_get(resp_is, not_found, static_time, spawn_client, mocker):
client = await spawn_client(authorize=True)
```
2. Test and update snapshots as you go.

Whenever you make a change in code, run the corresponding tests and update the snapshots accordingly. It is no fun making a lot of code changes and then combing through reams of confusing snapshot diffs.

## Error responses
3. Test and update one module or subpackage at a time.

All potential error responses for an API endpoint should be tested.
You can narrow the focus of `pytest` by passing it a path to the tests you want to run. Do this instead of running the entire suite and
trying to update all snapshots.
