Huge circuit streaming (quantumlib#15)
- Add main.perf.cc checking overall performance of simulators
- Manage memory for jagged data directly instead of delegating to std::vector, removing a pointer indirection
    - Add MonotonicBuffer<T>
    - Add PointerRange<T>
    - Remove JaggedDataArena<T>
    - Remove VectorView<T>
- Polish development documentation
    - Split `dev/README.md` out of `README.md`
    - Reference `dev/README.md` and `glue/python/README.md` from main readme (instead of repeating their contents)
    - Include `doctest` test commands in testing instructions
- Tweak text representations
    - Drop the `# Circuit` context line from `circuit.str()`
    - Switch from having `__str__` to having `__repr__` for compiled python binding samplers
    - Give empty circuits a simpler repr
- Refactor Circuit to support streaming
    - Add `blocks` property for storing sub-circuits.
    - Tweak `REPEAT` instructions to reference a block instead of flattening them into raw instructions.
    - Add `Circuit::max_lookback()` method for determining when measurement results can be discarded.
    - Simplify invariants by refactoring `num_qubits` and `num_measurements` fields into `count_qubits` and `count_measurements` methods.
    - Add `Circuit::count_detectors_and_observables`.
    - `Circuit::operator*` now wraps the circuit's body into a `REPEAT` block, instead of flat repeating
    - Add `Circuit::for_each_operation[_reverse]` for iterating over operations while recursing into `REPEAT` blocks
- When the number of simulated measurements is huge, switch to streaming results instead of holding everything in memory
    - Add `#define SWITCH_TO_STREAMING_MEASUREMENT_THRESHOLD 100000000`
    - Ensure this works for `--sample` (tableau), `--sample=1000` (frame), and `--detect=1000` (detectors).
    - Allow `prepend_observable` to continue existing despite it preventing streaming from occurring
    - Refactor `FrameSimulator` to work with a `max_lookback` instead of a `num_measurements`.
    - Extract `MeasureRecord` from `TableauSimulator`
    - Extract `MeasureRecordBatch` from `FrameSimulator`
    - Add `MeasureRecordWriter` and `MeasureRecordWriterBatch` classes for consuming results
- Add mask assignment operators `&=` and `|=` to `simd_bits`
- Loosen version of cirq required by stimcirq
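
To illustrate why keeping `REPEAT` as a reference to a block matters for streaming, here is a toy Python model. It is not Stim's C++ implementation: `Op`, `Repeat`, `count_measurements`, and `should_stream` are hypothetical names, and the strict `>` comparison is an assumption. Only the threshold value comes from this commit. The sketch counts measurements by multiplying through repeat blocks rather than flattening them, then compares the total with the threshold.

```python
from dataclasses import dataclass
from typing import List, Union

# Value taken from the commit message; everything else here is illustrative.
SWITCH_TO_STREAMING_MEASUREMENT_THRESHOLD = 100_000_000


@dataclass
class Op:
    """A single instruction, e.g. Op("M", [0, 1])."""
    name: str
    targets: List[int]


@dataclass
class Repeat:
    """A REPEAT instruction that references a block instead of copying it."""
    repetitions: int
    body: List[Union[Op, "Repeat"]]


def count_measurements(ops):
    """Count measurement results without expanding any REPEAT block."""
    total = 0
    for op in ops:
        if isinstance(op, Repeat):
            # One recursive call per block, however many repetitions it has.
            total += op.repetitions * count_measurements(op.body)
        elif op.name == "M":
            total += len(op.targets)
    return total


def should_stream(ops):
    """Assumed rule: stream once the count exceeds the threshold."""
    return count_measurements(ops) > SWITCH_TO_STREAMING_MEASUREMENT_THRESHOLD


# A billion repetitions of a two-measurement body stays a three-object circuit.
huge = [Repeat(10**9, [Op("H", [0]), Op("M", [0, 1])])]
small = [Op("H", [0]), Op("M", [0])]
```

In this model `count_measurements(huge)` is `2 * 10**9`, so `should_stream(huge)` is true, while `should_stream(small)` is false. Flattening `huge` would instead require materializing two billion instructions before anything could be counted.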
Strilanc authored Mar 18, 2021
1 parent 2e6735e commit 7f32e72
Showing 68 changed files with 3,548 additions and 1,339 deletions.
10 changes: 10 additions & 0 deletions CMakeLists.txt
@@ -45,6 +45,10 @@ set(SOURCE_FILES_NO_MAIN
src/simulators/detection_simulator.cc
src/simulators/error_fuser.cc
src/simulators/frame_simulator.cc
src/simulators/measure_record_batch.cc
src/simulators/measure_record_batch_writer.cc
src/simulators/measure_record.cc
src/simulators/measure_record_writer.cc
src/simulators/tableau_simulator.cc
src/simulators/vector_simulator.cc
src/stabilizers/pauli_string.cc
@@ -60,6 +64,7 @@ set(TEST_FILES
src/main_helper.test.cc
src/probability_util.test.cc
src/simd/bit_ref.test.cc
src/simd/monotonic_buffer.test.cc
src/simd/simd_bit_table.test.cc
src/simd/simd_bits.test.cc
src/simd/simd_bits_range_ref.test.cc
@@ -69,6 +74,10 @@ set(TEST_FILES
src/simulators/detection_simulator.test.cc
src/simulators/error_fuser.test.cc
src/simulators/frame_simulator.test.cc
src/simulators/measure_record.test.cc
src/simulators/measure_record_batch.test.cc
src/simulators/measure_record_batch_writer.test.cc
src/simulators/measure_record_writer.test.cc
src/simulators/tableau_simulator.test.cc
src/simulators/vector_simulator.test.cc
src/stabilizers/pauli_string.test.cc
@@ -80,6 +89,7 @@ set(BENCHMARK_FILES
src/benchmark_util.perf.cc
src/circuit/circuit.perf.cc
src/circuit/gate_data.perf.cc
src/main.perf.cc
src/probability_util.perf.cc
src/simd/simd_bit_table.perf.cc
src/simd/simd_bits.perf.cc
231 changes: 6 additions & 225 deletions README.md
@@ -1,6 +1,6 @@
# Stim

Stim is a fast simulator for non-adaptive quantum stabilizer circuits.
Stim is a fast simulator for quantum stabilizer circuits.
Stim is based on the stabilizer tableau representation introduced in
[Scott Aaronson et al's CHP simulator](https://arxiv.org/abs/quant-ph/0406196).
Stim makes three key improvements over CHP.
@@ -28,90 +28,13 @@ Pauli string multiplication is a key bottleneck operation when updating a stabil
Tracking Pauli frames can also benefit from vectorization, by combining them into batches and computing thousands of
samples at a time.

# Usage (python)

Stim can be installed into a python 3 environment using pip:

```bash
pip install stim
```

Once stim is installed, you can `import stim` and use it.
There are two supported use cases: interactive usage and high speed sampling.

You can use the Tableau simulator in an interactive fashion:

```python
import stim

s = stim.TableauSimulator()

# Create a GHZ state.
s.h(0)
s.cnot(0, 1)
s.cnot(0, 2)

# Measure the GHZ state.
print(s.measure_many(0, 1, 2)) # [False, False, False] or [True, True, True]
```

Alternatively, you can compile a circuit and then begin generating samples from it:

```python
import stim

# Create a circuit that measures a large GHZ state.
c = stim.Circuit()
c.append_operation("H", [0])
for k in range(1, 30):
    c.append_operation("CNOT", [0, k])
c.append_operation("M", range(30))

# Compile the circuit into a high performance sampler.
sampler = c.compile_sampler()

# Collect a batch of samples.
# Note: the ideal batch size, in terms of speed per sample, is roughly 1024.
# Smaller batches are slower because they are not sufficiently vectorized.
# Bigger batches are slower because they use more memory.
batch = sampler.sample(1024)
print(type(batch)) # numpy.ndarray
print(batch.dtype) # numpy.uint8
print(batch.shape) # (1024, 30)
print(batch)
# Prints something like:
# [[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
# ...
# [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
# [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
```
# Building

The circuit can also include noise:

```python
import stim
import numpy as np

c = stim.Circuit("""
X_ERROR(0.1) 0
Y_ERROR(0.2) 1
Z_ERROR(0.3) 2
DEPOLARIZE1(0.4) 3
DEPOLARIZE2(0.5) 4 5
M 0 1 2 3 4 5
""")
batch = c.compile_sampler().sample(2**20)
print(np.mean(batch, axis=0).round(3))
# Prints something like:
# [0.1 0.2 0. 0.267 0.267 0.266]
```
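
The printed means can be checked by hand. The sketch below assumes every qubit starts in `|0>` and is measured in the Z basis, so a result flips only when the applied Pauli has an X or Y component on that qubit. It also assumes each depolarizing channel picks uniformly among its non-identity Paulis. The helper names are illustrative, not part of the stim API.

```python
from fractions import Fraction


def flip_probability_depolarize1(p):
    # 3 non-identity single-qubit Paulis (X, Y, Z); X and Y flip a Z-basis result.
    return p * Fraction(2, 3)


def flip_probability_depolarize2(p):
    # 15 non-identity two-qubit Paulis; 2 * 4 = 8 of them have X or Y on a given qubit.
    return p * Fraction(8, 15)


expected = [
    Fraction(1, 10),                                # X_ERROR(0.1): X flips the result
    Fraction(2, 10),                                # Y_ERROR(0.2): Y flips the result
    Fraction(0),                                    # Z_ERROR(0.3): Z leaves |0> unchanged
    flip_probability_depolarize1(Fraction(4, 10)),  # DEPOLARIZE1(0.4)
    flip_probability_depolarize2(Fraction(5, 10)),  # DEPOLARIZE2(0.5), first qubit
    flip_probability_depolarize2(Fraction(5, 10)),  # DEPOLARIZE2(0.5), second qubit
]
rounded = [round(float(x), 3) for x in expected]
print(rounded)
```

The exact value for the last three entries is 4/15, roughly 0.2667, so a sampled 0.266 on one qubit is ordinary statistical noise at this sample size.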
See the [developer documentation](dev/README.md).

You can also sample detection events using `stim.Circuit.compile_detector_sampler`.
# Usage (python)

See the [python documentation](glue/python/README.md).

# Usage (command line)

@@ -281,7 +204,7 @@ Only one mode can be specified.
Detection event sampling mode.
Outputs whether or not measurement sets specified by `DETECTOR` instructions have been flipped by noise.
Assumes (does not verify) that all `DETECTOR` instructions correspond to measurement sets with deterministic parity.
See also `--prepend_observables`, `--append_observables`.
See also `--append_observables`.
If an integer argument is specified, run that many shots of the circuit.
- `--detector_hypergraph`:
Detector graph creation mode.
@@ -307,11 +230,6 @@ Not all modifiers apply to all modes.
In addition to outputting the values of detectors, output the values of logical observables
built up using `OBSERVABLE_INCLUDE` instructions.
Put these observables' values into the detection event output as if they were additional detectors at the end of the circuit.
- `--prepend_observables`:
Requires detection event sampling mode.
In addition to outputting the values of detectors, output the values of logical observables
built up using `OBSERVABLE_INCLUDE` instructions.
Put these observables' values into the detection event output as if they were additional detectors at the start of the circuit
- `--in=FILEPATH`:
Specifies a file to read a circuit from.
If not specified, the `stdin` pipe is used.
@@ -579,140 +497,3 @@ Not all modifiers apply to all modes.
- `TICK`: Optional command indicating the end of a layer of gates.
May be ignored, may force processing of internally queued operations and flushing of queued measurement results.
- `REPEAT N { ... }`: Repeats the instructions in its body N times.
# Building
### CMake Build
```bash
cmake .
make stim
# ./out/stim
```

To control the vectorization (e.g. this is done for testing),
use `cmake . -DSIMD_WIDTH=256` (implying `-mavx2`)
or `cmake . -DSIMD_WIDTH=128` (implying `-msse2`)
or `cmake . -DSIMD_WIDTH=64` (implying no machine architecture flag).
If `SIMD_WIDTH` is not specified, `-march=native` is used.

### Bazel Build

```bash
bazel build stim
# bazel run stim
```

### Manual Build

```bash
find src | grep "\\.cc" | grep -v "\\.\(test\|perf\|pybind\)\\.cc" | xargs g++ -pthread -std=c++11 -O3 -march=native
# ./a.out
```

### Python Package Build

Environment requirements:

```bash
pip install pybind11 cibuildwheel
```

Build source distribution (fallback for missing binary wheels):

```bash
python setup.py sdist
```

Output in `dist` directory.

Build manylinux binary distributions (takes 30+ minutes):

```bash
python -m cibuildwheel --output-dir wheelhouse --platform=linux
```

Output in `wheelhouse` directory.

Build `stimcirq` package:

```bash
cd glue/cirq
python setup.py sdist
```

Output in `glue/cirq/dist` directory.

# Testing

### Run tests using CMAKE

Unit testing with CMAKE requires GTest to be installed on your system and discoverable by CMake.
Follow the ["Standalone CMake Project" from the GTest README](https://github.com/google/googletest/tree/master/googletest).

Run tests with address and memory sanitization, but without optimizations:

```bash
cmake .
make stim_test
./out/stim_test
```

To force AVX vectorization, SSE vectorization, or no vectorization,
pass `-DSIMD_WIDTH=256` or `-DSIMD_WIDTH=128` or `-DSIMD_WIDTH=64` to the `cmake` command.

Run tests with optimizations without sanitization:

```bash
cmake .
make stim_test_o3
./out/stim_test_o3
```

### Run tests using Bazel

Run tests with whatever settings Bazel feels like using:

```bash
bazel test :stim_test
```

### Run python binding tests

In a fresh virtual environment:

```bash
pip install -e .
pip install numpy pytest
python -m pytest src
```

# Benchmarking

```bash
cmake .
make stim_benchmark
./out/stim_benchmark
```

This will output results like:

```
[....................*....................] 460 ns (vs 450 ns) ( 21 GBits/s) simd_bits_randomize_10K
[...................*|....................] 24 ns (vs 20 ns) (400 GBits/s) simd_bits_xor_10K
[....................|>>>>*...............] 3.6 ns (vs 4.0 ns) (270 GBits/s) simd_bits_not_zero_100K
[....................*....................] 5.8 ms (vs 6.0 ms) ( 17 GBits/s) simd_bit_table_inplace_square_transpose_diam10K
[...............*<<<<|....................] 8.1 ms (vs 5.0 ms) ( 12 GOpQubits/s) FrameSimulator_depolarize1_100Kqubits_1Ksamples_per1000
[....................*....................] 5.3 ms (vs 5.0 ms) ( 18 GOpQubits/s) FrameSimulator_depolarize2_100Kqubits_1Ksamples_per1000
```

The bars on the left show how fast each task is running compared to baseline expectations (on my dev machine).
Each tick away from the center `|` is 1 decibel slower or faster (i.e. each `<` or `>` represents a factor of `1.26`).

Basically, if you see `[......*<<<<<<<<<<<<<|....................]` then something is *seriously* wrong, because the
code is running 25x slower than expected.
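
The 25x figure follows from the decibel scale: the marker in that bar sits 14 ticks left of center, and 14 decibels is a factor of `10 ** 1.4`. A small sketch of the arithmetic (the helper names are made up for illustration and are not part of the benchmark binary):

```python
def ticks_left_of_center(bar):
    """Number of ticks the `*` marker sits to the left of the bar's center."""
    inner = bar.strip("[]")
    center = len(inner) // 2  # position of the `|` (or of `*` when on target)
    return center - inner.index("*")


def slowdown_factor(ticks):
    """Each tick is 1 decibel, i.e. a factor of 10 ** 0.1 (about 1.26)."""
    return 10 ** (ticks / 10)


bar = "[......*<<<<<<<<<<<<<|....................]"
ticks = ticks_left_of_center(bar)  # 14
factor = slowdown_factor(ticks)    # about 25.1
print(ticks, round(factor, 1))
```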

The benchmark binary supports a `--only=BENCHMARK_NAME` filter flag.
Multiple filters can be specified by separating them with commas `--only=A,B`.
Ending a filter with a `*` turns it into a prefix filter `--only=sim_*`.
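
As a rough model of the filter rules described above (a Python sketch of the documented behavior, not the benchmark binary's actual C++ code; `matches_only_filter` is a hypothetical name):

```python
def matches_only_filter(benchmark_name, only):
    """Return True if `benchmark_name` is selected by an `--only=` value."""
    for pattern in only.split(","):
        if pattern.endswith("*"):
            # A trailing `*` turns the pattern into a prefix filter.
            if benchmark_name.startswith(pattern[:-1]):
                return True
        elif benchmark_name == pattern:
            # Otherwise the name must match exactly.
            return True
    return False
```

Under this model, `matches_only_filter("simd_bits_xor_10K", "simd_bits_*")` is true, while `matches_only_filter("simd_bits_xor_10K", "simd_bits_xor")` is false because a filter without `*` must match the whole name.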