NVIDIA CUDA-Q QEC 0.5.0: Advancing Real-Time Quantum Error Correction with GPU-Accelerated Decoders and AI Inference


NVIDIA CUDA-Q QEC 0.5.0 advances quantum error correction with real-time decoding, GPU-accelerated RelayBP, and AI decoder inference. It introduces sliding window decoding and Pythonic interfaces to reduce latency and enhance operational fault-tolerant quantum computing.

Real-time decoding is a critical component for achieving fault-tolerant quantum computing. By enabling decoders to operate with low latency concurrently with a Quantum Processing Unit (QPU), errors can be corrected within the coherence time, preventing their accumulation and preserving the integrity of quantum results. This can be performed both online with a real quantum device and offline with a simulated quantum processor.

NVIDIA CUDA-Q QEC version 0.5.0 introduces significant advancements to address these challenges and foster research into more effective solutions. Key improvements include support for online real-time decoding, new GPU-accelerated algorithmic decoders, high-performance infrastructure for AI decoder inference, sliding window decoder capabilities, and enhanced Pythonic interfaces.

This post explores these new features, demonstrating how they can accelerate your quantum error correction research or facilitate the operationalization of real-time decoding with your quantum computer.

Real-Time Decoding with CUDA-Q QEC

Real-time decoding is now fully supported within CUDA-Q QEC, implemented through a comprehensive four-stage workflow:

  1. DEM Generation: The initial step involves characterizing device error behavior during operation. A helper function generates the Detector Error Model (DEM) from a quantum code, noise model, and circuit parameters. This DEM maps error mechanisms to syndrome patterns.

    # Step 1: Generate detector error model
    import cudaq
    import cudaq_qec as qec

    print("Step 1: Generating DEM...")
    cudaq.set_target("stim")

    # e.g., a distance-3 surface code
    code = qec.get_code("surface_code", distance=3)

    noise = cudaq.NoiseModel()
    noise.add_all_qubit_channel("x", cudaq.Depolarization2(0.01), 1)

    # 3 rounds of syndrome extraction
    dem = qec.z_dem_from_memory_circuit(code, qec.operation.prep0, 3, noise)
    
  2. Decoder Configuration: Users select and configure a decoder. The configuration captures everything needed to interpret syndrome measurements correctly.

    # Create decoder config
    config = qec.decoder_config()
    config.id = 0
    config.type = "nv-qldpc-decoder"
    config.block_size = dem.detector_error_matrix.shape[1]
    # ... additional configuration parameters ...
    
  3. Decoder Serialization and Loading: The configuration is saved to a YAML file. Before quantum circuits execute, CUDA-Q QEC loads this file, sets up the appropriate decoder implementation, and registers the decoder with the CUDA-Q runtime.

    # Save decoder config
    with open("config.yaml", 'w') as f:
        f.write(config.to_yaml_str(200))
    
  4. Real-time Decoding Execution: With the decoder configured, users can execute quantum circuits. Inside CUDA-Q kernels, the decoding API interacts with the decoders. As logical qubit stabilizers are measured, syndromes are enqueued to the corresponding decoder, which processes them. When corrections are needed, the decoder suggests operations to apply to the logical qubits.

    # Load config and run circuit
    qec.configure_decoders_from_file("config.yaml")
    run_result = cudaq.run(qec_circuit, shots_count=10)
    

GPU-Accelerated RelayBP

RelayBP is a novel algorithmic decoder designed to overcome the limitations of traditional belief propagation (BP) decoders, a common class of quantum low-density parity-check (QLDPC) decoders. BP+OSD (belief propagation with ordered statistics decoding) pairs a GPU-accelerated BP stage with CPU-based OSD post-processing, but this hybrid approach can hinder optimization and parallelization, making it harder to meet the low-latency requirements of real-time error decoding.

RelayBP enhances BP methods by incorporating the concept of 'memory strengths' at each graph node. This mechanism controls how much a node retains or discards past messages, effectively dampening or breaking harmful symmetries that can trap BP and prevent convergence. This innovation leads to improved latency and convergence for quantum LDPC codes.
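The memory-strength idea can be sketched as a damped message update (a schematic in NumPy; the actual RelayBP update operates on BP log-likelihood messages and is more involved):

```python
import numpy as np

# Schematic illustration only (not the CUDA-Q QEC implementation): each
# node blends its new BP message with the previous one, weighted by a
# memory strength gamma in [0, 1).
def relay_update(prev_msg, new_msg, gamma):
    # gamma = 0 recovers plain BP; larger gamma retains more history,
    # damping the oscillations that can trap BP away from convergence.
    return gamma * prev_msg + (1.0 - gamma) * new_msg

prev = np.array([0.8, -0.8])  # messages from the previous iteration
new = np.array([-0.8, 0.8])   # plain BP would simply flip them again
damped = relay_update(prev, new, 0.5)  # oscillation damped toward zero
```

Varying gamma per node and per relay leg (as `gamma0` and `gamma_dist` do below) is what lets the decoder escape symmetric trapping configurations.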

Instantiating a RelayBP decoder is straightforward:

import numpy as np
import cudaq_qec as qec

# Simple 3x7 parity check matrix for demonstration
H_list = [[1, 0, 0, 1, 0, 1, 1], [0, 1, 0, 1, 1, 0, 1],
          [0, 0, 1, 0, 1, 1, 1]]
H = np.array(H_list, dtype=np.uint8)

# Configure relay parameters
srelay_config = {
   'pre_iter': 5,  # Run 5 iterations with gamma0 before relay legs
   'num_sets': 3,  # Use 3 relay legs
   'stopping_criterion': 'FirstConv'  # Stop after first convergence
}

# Create a decoder with Relay-BP
decoder_relay = qec.get_decoder("nv-qldpc-decoder",
                               H,
                               use_sparsity=True,
                               bp_method=3,
                               composition=1,
                               max_iterations=50,
                               gamma0=0.3,
                               gamma_dist=[0.1, 0.5],
                               srelay_config=srelay_config,
                               bp_seed=42)
print("   Created decoder with Relay-BP (gamma_dist, FirstConv stopping)")

# Decode a syndrome
syndrome = np.array([1, 0, 1], dtype=np.uint8)
decoded_result = decoder_relay.decode(syndrome)
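Whatever decoder backend is used, a decoded error can be sanity-checked by recomputing its syndrome. With the example H above, a weight-1 error on bit 5 reproduces the syndrome [1, 0, 1] (pure NumPy, independent of CUDA-Q QEC; the decoded error shown is a hypothetical output):

```python
import numpy as np

H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)
syndrome = np.array([1, 0, 1], dtype=np.uint8)

# Hypothetical decoder output: a weight-1 error on bit 5
decoded_error = np.array([0, 0, 0, 0, 0, 1, 0], dtype=np.uint8)

# A valid correction must reproduce the observed syndrome mod 2
recomputed = (H @ decoded_error) % 2
assert np.array_equal(recomputed, syndrome)
```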

AI Decoder Inference

AI decoders are gaining prominence due to their potential to offer superior accuracy or lower latency compared to traditional algorithmic decoders, especially for specific error models. The development process typically involves generating training data, training a model, and exporting it to the ONNX format.
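The data-generation step of that workflow can be sketched in NumPy: sample independent bit-flip errors at rate p and compute the corresponding syndromes from a parity-check matrix (a schematic only; realistic training data would come from circuit-level noise sampling, e.g. via a DEM):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3x7 parity-check matrix for illustration
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)

def make_training_data(H, num_samples, p=0.05):
    """Inputs are syndromes s = H e mod 2; labels are the errors e."""
    errors = (rng.random((num_samples, H.shape[1])) < p).astype(np.uint8)
    syndromes = (errors @ H.T) % 2
    return syndromes, errors

X, y = make_training_data(H, 1000)  # X: (1000, 3), y: (1000, 7)
```

A model trained on such (syndrome, error) pairs is then exported to ONNX for inference.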

CUDA-Q QEC now features robust infrastructure for integrated AI decoder inference, leveraging NVIDIA TensorRT. This enables low-latency operation of AI decoders loaded from ONNX files, both for offline decoding and in real time with an emulated quantum computer.

Example of using a TensorRT decoder:

import cudaq_qec as qec
import numpy as np

# Note: The AI decoder doesn't use the parity check matrix.
# A placeholder matrix is provided here to satisfy the API.
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)

# Create TensorRT decoder from ONNX model
decoder = qec.get_decoder("trt_decoder", H,
                          onnx_load_path="ai_decoder.onnx")

# Decode a syndrome
syndrome = np.array([1.0, 0.0, 1.0], dtype=np.float32)
result = decoder.decode(syndrome)
print(f"Predicted error: {result}")

To further streamline AI decoder deployment, CUDA-Q QEC provides recommendations for reducing initialization time by pre-building TensorRT engines. Because ONNX models support a range of precisions (int8, fp8, fp16, bf16, and tf32), users can explore diverse model and hardware combinations to achieve optimal performance.
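For example, a serialized engine can be pre-built ahead of time with NVIDIA's trtexec command-line tool (a sketch; the file names are placeholders, and the available precision flags depend on your TensorRT version and hardware):

```shell
# Build a serialized TensorRT engine once, ahead of time, so decoder
# initialization does not pay the ONNX parsing and engine-build cost.
trtexec --onnx=ai_decoder.onnx --saveEngine=ai_decoder.engine --fp16
```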

Sliding Window Decoding

Sliding window decoders offer a mechanism to process circuit-level noise across multiple syndrome extraction rounds. By processing syndromes before the complete measurement sequence is received, these decoders can reduce overall latency. However, this approach may introduce a trade-off with increased logical error rates.

Whether a sliding window decoder is worthwhile depends on factors such as the noise model, the error-correcting code parameters, and the latency budget of the quantum processor. Introduced in version 0.5.0, the CUDA-Q QEC sliding window decoder lets users experiment with any other CUDA-Q decoder as the 'inner' decoder, and the window size can be adjusted through a simple parameter change.

import cudaq
import cudaq_qec as qec
import numpy as np

cudaq.set_target('stim')
num_rounds = 5
code = qec.get_code('surface_code', distance=num_rounds)
noise = cudaq.NoiseModel()
noise.add_all_qubit_channel("x", cudaq.Depolarization2(0.001), 1)
statePrep = qec.operation.prep0
dem = qec.z_dem_from_memory_circuit(code, statePrep, num_rounds, noise)
inner_decoder_params = {'use_osd': True, 'max_iterations': 50, 'use_sparsity': True}
opts = {
    'error_rate_vec': np.array(dem.error_rates),
    'window_size': 1,
    'num_syndromes_per_round': dem.detector_error_matrix.shape[0] // num_rounds,
    'inner_decoder_name': 'nv-qldpc-decoder',
    'inner_decoder_params': inner_decoder_params,
}
swdec = qec.get_decoder('sliding_window', dem.detector_error_matrix, **opts)

It is important to note that each syndrome extraction round must produce a constant number of measurements. The decoder operates without making assumptions about temporal correlations or periodicity in the underlying noise, providing maximum flexibility for investigating round-specific noise variations.
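The windowing itself can be pictured as a simple partition of the syndrome rounds (a schematic in plain Python; the commit and overlap bookkeeping of the actual decoder is more involved):

```python
# Schematic: which syndrome rounds each sliding-window position covers.
def window_slices(num_rounds, window_size, step=1):
    """Return (start_round, end_round) pairs, one per window position."""
    slices = []
    start = 0
    while start + window_size <= num_rounds:
        slices.append((start, start + window_size))
        start += step
    return slices

# 5 rounds with window_size=1, as in the example above:
print(window_slices(5, 1))  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

Each window hands its slice of the syndrome to the inner decoder, so a larger `window_size` trades latency for more context per decode.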

Getting Started with CUDA-Q QEC

CUDA-Q QEC 0.5.0 provides a powerful suite of tools for quantum error correction researchers and QPU operators, accelerating the path towards operationalizing fault-tolerant quantum computers.

To begin using CUDA-Q QEC, simply install it via pip:

pip install cudaq-qec

For comprehensive guidance and further details, refer to the official CUDA-Q QEC documentation.