The following are some examples of how to use the qad package to train and test quantum models and to reproduce results from the paper.

Unsupervised quantum kernel machine

The dataset used for training and testing all the quantum machine learning models is published on Zenodo. The unsupervised kernel machine is trained and tested with the train.py and test.py scripts in scripts/kernel_machines/, respectively (see the repo). The configuration parameters of the model, e.g., quantum or classical version, feature map, number of training samples, and the backend used for the quantum computation, are set through the arguments of the train.py and test.py scripts. For instance, to train the model:

python train.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --unsup --nqubits 8 --feature_map u_dense_encoding --run_type ideal --output_folder quantum_test --nu_param 0.01 --ntrain 600 --quantum

To test the saved model:

python test.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --model trained_qsvms/quantum_test_nu\=0.01_ideal/

For a small-scale demo that runs on a normal personal computer in a reasonable amount of time (5-10 minutes), consider using ntrain on the order of 50 to 200 data points for train.py, and ntest of around 1000 to 10000 data points for test.py.

For details on the arguments of the train.py and test.py scripts, see below.

Train

main(args: dict)

Trains and saves a QSVM model. The following parameters are given through argparse and passed to the function as a dictionary (args).

Parameters:
  • args (dict) – Configuration dictionary, containing the following arguments.

  • sig_path (str) – Path to the signal/anomaly dataset (.h5 format).

  • bkg_path (str) – Path to the QCD background dataset (.h5 format).

  • test_bkg_path (str) – Path to the background testing dataset (.h5 format).

  • unsup (bool) – Flag to choose between unsupervised and supervised models.

  • nqubits (int) – Number of qubits for quantum feature map circuit.

  • feature_map (str) – Name of the feature map circuit for the QSVM, or of the kernel for the classical model.

  • backend_name (str) – Name of the IBMQ quantum computer when running on hardware or in a noisy simulation.

  • run_type (str) – How to run the QSVM: ideal computation, noisy simulation, or real quantum hardware. choices=[“ideal”, “noisy”, “hardware”].

  • output_folder (str) – The name of the model to be saved.

  • c_param (float) – The C parameter of the SVM.

  • nu_param (float) – The nu parameter of the unsupervised kernel machine.

  • gamma (float) – Gamma parameter of the classical SVM with RBF kernel.

  • ntrain (int) – The number of training events.

  • ntest (int) – The number of testing events required for a cross-check.
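The parameters above reach main() as a dictionary built from argparse. A minimal sketch of such a parser follows; the function name get_train_arguments is hypothetical, the flags and types are taken from the list above (plus --quantum from the example command), and the None/False defaults are assumptions rather than the values used by train.py.

```python
import argparse


def get_train_arguments(argv=None) -> dict:
    """Hypothetical parser mirroring the documented train.py options.

    Types follow the documented parameter list; defaults (None/False) are assumed.
    """
    parser = argparse.ArgumentParser(description="Train and save a kernel machine.")
    parser.add_argument("--sig_path", type=str, help="signal/anomaly dataset (.h5)")
    parser.add_argument("--bkg_path", type=str, help="QCD background dataset (.h5)")
    parser.add_argument("--test_bkg_path", type=str, help="background test dataset (.h5)")
    parser.add_argument("--unsup", action="store_true", help="unsupervised model")
    parser.add_argument("--nqubits", type=int)
    parser.add_argument("--feature_map", type=str)
    parser.add_argument("--backend_name", type=str)
    parser.add_argument("--run_type", type=str, choices=["ideal", "noisy", "hardware"])
    parser.add_argument("--output_folder", type=str)
    parser.add_argument("--c_param", type=float)
    parser.add_argument("--nu_param", type=float)
    parser.add_argument("--gamma", type=float)
    parser.add_argument("--ntrain", type=int)
    parser.add_argument("--ntest", type=int)
    # --quantum appears in the example command; assumed to select the quantum kernel
    parser.add_argument("--quantum", action="store_true")
    # vars() converts the argparse Namespace into the dict handed to main(args)
    return vars(parser.parse_args(argv))


args = get_train_arguments(
    ["--unsup", "--nqubits", "8", "--feature_map", "u_dense_encoding",
     "--run_type", "ideal", "--nu_param", "0.01", "--ntrain", "600", "--quantum"]
)
print(args["nqubits"], args["nu_param"], args["unsup"])
```

Passing an explicit argv list makes the parser testable; on the command line, argv=None lets argparse read sys.argv.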

Test

main(args: dict)

Assesses the performance of the trained models using k-fold testing. The test dataset consists of background (QCD) data and anomalous (new-physics) data that was not seen during training.

If the chosen model is tested on hardware, then only 1 fold is computed.

Parameters:
  • args (dict) – Configuration dictionary, with the following arguments.

  • sig_path (str) – Path to the signal/anomaly dataset (.h5 format).

  • bkg_path (str) – Path to the QCD background dataset (.h5 format).

  • test_bkg_path (str) – Path to the background testing dataset (.h5 format).

  • model (str) – The folder path of the QSVM model.

  • ntest (int) – The number of testing events required for a cross-check.

  • kfolds (int) – Number of k-fold validation/test folds.

  • mod_quantum_instance (bool) – Reconfigure the quantum instance and backend.
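How test.py builds its folds is not spelled out here. The sketch below shows one plausible k-fold split under stated assumptions: the first ntest events of each sample are divided into kfolds disjoint, equally sized folds, and only one fold is used when run_type is "hardware". The function name make_test_folds is hypothetical.

```python
import numpy as np


def make_test_folds(bkg, sig, ntest, kfolds, run_type="ideal"):
    """Hypothetical k-fold split of the background and signal test sets.

    Assumption: the first ntest events of each sample are split into
    kfolds disjoint folds; on hardware only a single fold is evaluated.
    """
    if run_type == "hardware":
        kfolds = 1
    bkg_folds = np.array_split(np.asarray(bkg)[:ntest], kfolds)
    sig_folds = np.array_split(np.asarray(sig)[:ntest], kfolds)
    # each entry pairs the background and signal events of one fold
    return list(zip(bkg_folds, sig_folds))


bkg = np.arange(20)          # stand-in for background (QCD) test events
sig = np.arange(20) + 100    # stand-in for anomalous (new-physics) test events
folds = make_test_folds(bkg, sig, ntest=12, kfolds=3)
print(len(folds), folds[1][0])   # 3 [4 5 6 7]
```

Each fold is then scored separately, which is what later allows a mean and a standard deviation to be quoted for the ROC curves and AUCs.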

Producing figures

After the unsupervised quantum and classical kernel machines have been trained and their test scores saved, their performance can be summarised with a ROC curve plot. First, following our convention, the test scores are prepared for plotting with scripts/kernel_machines/scripts/prepare_plot_scores.py by running:

python prepare_plot_scores.py --classical_folder trained_qsvms/c_test_nu\=0.01/ --quantum_folder trained_qsvms/q_test_nu\=0.01_ideal/ --out_path test_plot --name_suffix n<n_test>_k<k_folds>

Then, we load the score values from the saved files using our convention, e.g. for the case of three different signals, eight latent dimensions, 600 training data points, 100k testing data points, and k=5 folds:

import h5py

read_dir = '/path/to/data'
n_folds = 5
latent_dim = '8'
n_samples_train = 600
mass = ['35', '35', '15']  # resonance mass in units of 100 GeV ('35' -> 3.5 TeV)
br_na = ['NA', '', 'BR']  # narrow (NA) or broad (BR); empty for the A -> HZ -> ZZZ signal
signal_name = ['RSGraviton_WW', 'AtoHZ_to_ZZZ', 'RSGraviton_WW']
ntest = ['100', '100', '100']  # number of test events, in thousands

q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig = [], [], [], []
for i in range(len(signal_name)):
    # Both string pieces need the f prefix; otherwise the second half is not interpolated
    file_name = (f'{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name[i]}'
                 f'{mass[i]}{br_na[i]}_n{ntest[i]}k_kfold{n_folds}.h5')
    with h5py.File(file_name, 'r') as file:
        q_loss_qcd.append(file['quantum_loss_qcd'][:])
        q_loss_sig.append(file['quantum_loss_sig'][:])
        c_loss_qcd.append(file['classic_loss_qcd'][:])
        c_loss_sig.append(file['classic_loss_sig'][:])

The final ROC plot, as it appears in Fig. 3 of the paper, can then be obtained with:

# pl: the module that provides plot_ROC_kfold_mean (documented below); its import is not shown here
colors = ['forestgreen', '#EC4E20', 'darkorchid']
legend_signal_names = [r'Narrow G $\to$ WW 3.5 TeV', r'A $\to$ HZ $\to$ ZZZ 3.5 TeV', r'Broad G $\to$ WW 1.5 TeV']
pl.plot_ROC_kfold_mean(q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig, legend_signal_names, n_folds,
                       legend_title='Anomaly signature', save_dir='../jupyter_plots', pic_id='test',
                       palette=colors, xlabel=r'$TPR$', ylabel=r'$FPR^{-1}$')

Figure: Example of the unsupervised kernel machine performance on different anomalies.

get_roc_data(qcd: ndarray, bsm: ndarray, fix_tpr: bool = False) → Tuple[ndarray]

Computes the ROC curve given the background and anomaly datasets.

Parameters:
  • qcd (np.ndarray) – Background QCD dataset.

  • bsm (np.ndarray) – Anomaly, Beyond the Standard Model (BSM) dataset.

  • fix_tpr (bool, optional) – Constant threshold selection for ROC curve calculation, by default False

Returns:

np.ndarray – False Positive Rate (FPR) array.

np.ndarray – True Positive Rate (TPR) array.

Return type:

Tuple
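To make the FPR/TPR arrays concrete, here is a self-contained sketch of an ROC computation from anomaly scores. It is not the qad implementation: it assumes that higher scores mean "more anomalous", treats background as the negative class and the anomaly as the positive class, and ignores the fix_tpr option. The names roc_from_scores and auc are hypothetical.

```python
import numpy as np


def roc_from_scores(qcd, bsm):
    """Sketch of an ROC computation from anomaly scores; returns (fpr, tpr).

    Assumes higher score = more anomalous; background (QCD) is the negative
    class and the BSM anomaly is the positive class.
    """
    qcd, bsm = np.asarray(qcd, float), np.asarray(bsm, float)
    scores = np.concatenate([qcd, bsm])
    labels = np.concatenate([np.zeros(qcd.size), np.ones(bsm.size)])
    order = np.argsort(-scores, kind="mergesort")  # thresholds from high to low
    scores, labels = scores[order], labels[order]
    tps = np.cumsum(labels)          # anomalies above each threshold
    fps = np.cumsum(1.0 - labels)    # background events above each threshold
    # keep one ROC point per distinct threshold so tied scores are not split
    last_of_run = np.r_[np.nonzero(np.diff(scores))[0], scores.size - 1]
    tpr = np.r_[0.0, tps[last_of_run] / bsm.size]
    fpr = np.r_[0.0, fps[last_of_run] / qcd.size]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


fpr, tpr = roc_from_scores(qcd=[0.1, 0.2, 0.3], bsm=[0.4, 0.5])
print(auc(fpr, tpr))  # close to 1.0: every anomaly scores above every background event
```

For plots in the style of the paper, TPR goes on the x-axis and 1/FPR on the y-axis, matching the defaults of plot_ROC_kfold_mean.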

get_FPR_for_fixed_TPR(tpr_window: float, fpr_loss: ndarray, tpr_loss: ndarray, tolerance: float) → float

Get FPR for a fixed value of TPR.

The ROC curve is computed in discrete steps, so the exact TPR working point may not be sampled. Therefore, a tolerance window is defined around the desired TPR working point and the mean FPR within that window is returned.

Parameters:
  • tpr_window (float) – TPR working point, typically 0.6 or 0.8

  • fpr_loss (np.ndarray) – FPR array of the ROC curve.

  • tpr_loss (np.ndarray) – TPR array of the ROC curve.

  • tolerance (float) – Tolerance around the working point, e.g. a 0.1-1% window.

Returns:

Mean FPR at the tolerance window around the TPR working point

Return type:

float
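The windowed average can be sketched in a few lines. This is an illustration, not the qad code: it assumes tolerance is an absolute half-width in TPR units (so 0.01 means TPR within ±0.01 of the working point), and the name fpr_at_fixed_tpr is hypothetical.

```python
import numpy as np


def fpr_at_fixed_tpr(tpr_window, fpr_loss, tpr_loss, tolerance):
    """Mean FPR over the ROC points whose TPR lies within tpr_window ± tolerance.

    Assumes `tolerance` is an absolute width in TPR units (e.g. 0.01).
    """
    fpr_loss = np.asarray(fpr_loss, float)
    tpr_loss = np.asarray(tpr_loss, float)
    in_window = np.abs(tpr_loss - tpr_window) < tolerance
    if not np.any(in_window):
        raise ValueError("no ROC point inside the tolerance window; increase tolerance")
    return float(np.mean(fpr_loss[in_window]))


tpr = np.array([0.0, 0.55, 0.6, 0.605, 0.8, 1.0])
fpr = np.array([0.0, 0.01, 0.02, 0.03, 0.1, 1.0])
print(fpr_at_fixed_tpr(0.6, fpr, tpr, tolerance=0.01))  # averages the points at TPR 0.6 and 0.605
```

Raising an error for an empty window (instead of silently returning NaN) is a design choice of this sketch; a too-narrow tolerance on a coarse ROC curve is then easy to spot.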

get_mean_and_error(data: ndarray) → Tuple[float]

Compute the mean and std of an array.

Parameters:

data (np.ndarray) – The input array.

Returns:

float:

The mean.

float:

The standard deviation.

Return type:

Tuple

plot_ROC_kfold_mean(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, pic_id: str | None = None, xlabel: str = 'TPR', ylabel: str = '1/FPR', legend_title: str = '$ROC$', save_dir: str | None = None, palette: List[str] = ['#3E96A1', '#EC4E20', '#FF9505'])

Calculate the mean ROC curve and its std uncertainty band.

Using the scores of the classical and quantum models, the ROC curves are computed for each of the k folds. The mean and std are then computed, and the mean ROC curve is plotted with its error band. The mean and std of the AUC are also calculated and shown in the legend of the figure.

Parameters:
  • quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.

  • quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.

  • classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.

  • classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.

  • ids (List[str]) – Identifiers of the score sets in the lists above, e.g. 3 different anomalies, 3 different latent dimensions, or 3 different training sizes.

  • n_folds (int) – Number of k-folds.

  • pic_id (str, optional) – Name of the output figure, by default None

  • xlabel (str, optional) – Label for the x-axis of the figure, by default “TPR”

  • ylabel (str, optional) – Label for the y-axis of the figure, by default “1/FPR”

  • legend_title (str, optional) – Title of the main legend, by default “$ROC$”

  • save_dir (str, optional) – Output directory for the produced figure, by default None

  • palette (List[str], optional) – Colors for the 3 ROC curves per plot based on the ids, by default [“#3E96A1”, “#EC4E20”, “#FF9505”]
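The AUC entry in the legend is a mean and standard deviation over the k folds. A minimal sketch of that summary follows, assuming each fold's ROC curve is already available as an (fpr, tpr) pair. The name kfold_auc_summary is hypothetical, and whether qad uses the population std (ddof=0, the NumPy default used here) or the sample std is not stated.

```python
import numpy as np


def auc(fpr, tpr):
    """Trapezoidal area under one ROC curve (fpr sorted in non-decreasing order)."""
    fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def kfold_auc_summary(rocs):
    """Mean and std of the AUC over k folds; `rocs` is a list of (fpr, tpr) pairs.

    Uses np.std with its default ddof=0 (population std); this is an assumption.
    """
    aucs = np.array([auc(fpr, tpr) for fpr, tpr in rocs])
    return float(aucs.mean()), float(aucs.std())


rocs = [([0.0, 1.0], [0.0, 1.0]),            # random-guess fold, AUC = 0.5
        ([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])]  # perfect fold, AUC = 1.0
mean, std = kfold_auc_summary(rocs)
print(f"AUC = {mean:.2f} ± {std:.2f}")       # AUC = 0.75 ± 0.25
```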

create_table_for_fixed_TPR(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, tpr_windows: List[float] = [0.4, 0.6, 0.8], tolerance: float = 0.01) → DataFrame

Computes the mean and std of the FPR at fixed TPR working points.

Parameters:
  • quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.

  • quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.

  • classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.

  • classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.

  • ids (List[str]) – Identifiers of the score sets in the lists above, e.g. 3 different anomalies, 3 different latent dimensions, or 3 different training sizes.

  • n_folds (int) – Number of k-folds.

  • tpr_windows (List[float]) – TPR working points, by default [0.4, 0.6, 0.8]

  • tolerance (float) – Tolerance around working point, by default 0.01

Returns:

LaTeX table of the results.

Return type:

pd.DataFrame
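The table combines the fixed-TPR lookup with the k-fold statistics. The sketch below produces plain rows of (TPR working point, mean FPR, std FPR) from per-fold (fpr, tpr) arrays; the name fixed_tpr_table is hypothetical, and the tolerance is assumed to be an absolute width in TPR units. It covers one model and one identifier only, whereas the documented function handles quantum and classical scores for several ids.

```python
import numpy as np


def fixed_tpr_table(rocs, tpr_windows=(0.4, 0.6, 0.8), tolerance=0.01):
    """Rows of (TPR working point, mean FPR, std FPR) over the k folds.

    `rocs` is a list of per-fold (fpr, tpr) arrays; `tolerance` is assumed
    to be an absolute width in TPR units.
    """
    rows = []
    for window in tpr_windows:
        fold_fprs = []
        for fpr, tpr in rocs:
            fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
            in_window = np.abs(tpr - window) < tolerance
            if not np.any(in_window):
                raise ValueError(f"no ROC point within {tolerance} of TPR={window}")
            fold_fprs.append(fpr[in_window].mean())  # windowed FPR of this fold
        rows.append((window, float(np.mean(fold_fprs)), float(np.std(fold_fprs))))
    return rows


tpr = [0.0, 0.4, 0.6, 0.8, 1.0]
rocs = [(np.array([0.0, 0.01, 0.05, 0.2, 1.0]), tpr),
        (np.array([0.0, 0.03, 0.07, 0.2, 1.0]), tpr)]
table = fixed_tpr_table(rocs)
for window, mean, std in table:
    print(f"TPR={window}: FPR = {mean:.3f} ± {std:.3f}")
```

To obtain a DataFrame like the documented return type, the rows can be passed to pandas.DataFrame(rows, columns=[...]) and exported with DataFrame.to_latex().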

Expressibility and entanglement capability analysis

Figure: The metrics are calculated by sampling the circuit parameters from three different distributions, as depicted in the legends: the uniform distribution in [0,2π], the QCD background data distribution, and the signal (anomaly) scalar boson data distribution. (a) The expressibility (Expr) as a function of the different circuit architectures. (b) The entanglement capability of the data encoding circuit (\(\langle \mathrm{Q} \rangle\)) as a function of the different circuit architectures. (c) The expressibility of the data encoding circuit as a function of the number of qubits \((\mathrm{n_q})\). (d) The variance of the kernel \(\mathrm{Var}_{z, z'}k(z,z')\) as a function of the number of qubits, where \(k(z,z')\) is the kernel corresponding to the data encoding circuit, and \(z\) and \(z'\) are data feature vectors sampled from the signal or background distributions.

Given a data encoding quantum circuit, we can compute its expressibility and entanglement capability. Additionally, we can compute the variance of the quantum kernel constructed from the given circuit as a function of the number of qubits. These properties of the quantum feature map and of the corresponding quantum kernel can be computed with the script compute_expr_ent.py. The desired computation is selected with the argparse argument compute.
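Entanglement capability is commonly quantified with the Meyer-Wallach measure Q = 2(1 - (1/n) Σ_k Tr ρ_k²), where ρ_k is the reduced density matrix of qubit k, averaged over states produced with sampled circuit parameters. The NumPy sketch below computes Q for a single pure state vector; it is an illustration of the measure, not the triple_e implementation used by compute_expr_ent.py, and the function names are hypothetical.

```python
import numpy as np


def meyer_wallach(state):
    """Meyer-Wallach entanglement Q of a pure n-qubit state (0 for product states)."""
    state = np.asarray(state, dtype=complex)
    n = state.size.bit_length() - 1          # state.size == 2**n
    psi = state.reshape([2] * n)
    purity_sum = 0.0
    for k in range(n):
        # rows of m index qubit k, columns index the remaining n-1 qubits
        m = np.moveaxis(psi, k, 0).reshape(2, -1)
        rho_k = m @ m.conj().T                # reduced density matrix of qubit k
        purity_sum += np.real(np.trace(rho_k @ rho_k))
    return float(2.0 * (1.0 - purity_sum / n))


def entanglement_capability(states):
    """<Q>: the Meyer-Wallach measure averaged over states from sampled parameters."""
    return float(np.mean([meyer_wallach(s) for s in states]))


product = np.array([1, 0, 0, 0])             # |00>
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
print(meyer_wallach(product), meyer_wallach(bell))
```

A product state gives Q = 0 and a Bell or GHZ state gives Q = 1, which bounds the values of ⟨Q⟩ shown in panel (b).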

For instance, to compute the expressibility and entanglement capability of the circuits discussed in the paper run:

python compute_expr_ent.py --n_shots 10000 --n_exp 20 --out_path test --compute expr_ent_vs_circ

where n_shots defines the number of fidelity samples generated per expressibility and entanglement capability evaluation, and n_exp is the number of evaluations (‘experiments’) of the expressibility and entanglement capability used to estimate their mean and std. For more details, see the repository of the triple_e package.
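Expressibility is usually defined as the Kullback-Leibler divergence between the histogram of state fidelities produced by the circuit and the Haar-random fidelity distribution P_Haar(F) = (N-1)(1-F)^(N-2), with N = 2^n; lower values mean a more expressive circuit. The sketch below implements that definition with NumPy on a hypothetical toy encoding (one RY rotation per qubit, no entangling gates), not on the paper's circuits and not with triple_e. Parameters are drawn uniformly from [0, 2π], or as random rows of a dataset for the data-dependent variant, and the estimate is repeated n_exp times to obtain a mean and std.

```python
import numpy as np
from functools import reduce


def ry_product_state(theta):
    """Hypothetical toy encoding: RY(theta_i)|0> on each qubit, no entangling gates."""
    return reduce(np.kron, [np.array([np.cos(t / 2), np.sin(t / 2)]) for t in theta])


def expressibility(circuit, n_qubits, n_shots, n_bins=75, data=None, rng=None):
    """KL divergence between the circuit's fidelity histogram and the Haar one.

    Parameters are uniform in [0, 2*pi], or random rows of `data` if given.
    """
    rng = np.random.default_rng(rng)
    data = None if data is None else np.asarray(data, float)
    fidelities = np.empty(n_shots)
    for s in range(n_shots):
        if data is None:
            x, y = rng.uniform(0.0, 2 * np.pi, size=(2, n_qubits))
        else:
            x, y = data[rng.integers(len(data), size=2)]
        fidelities[s] = abs(np.vdot(circuit(x), circuit(y))) ** 2
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    p_circ = np.histogram(fidelities, bins=edges)[0] / n_shots
    dim = 2 ** n_qubits
    # Haar density (dim - 1)(1 - F)^(dim - 2), integrated over each bin
    p_haar = (1 - edges[:-1]) ** (dim - 1) - (1 - edges[1:]) ** (dim - 1)
    p_haar = np.maximum(p_haar, np.finfo(float).tiny)  # guard against underflow
    used = p_circ > 0
    return float(np.sum(p_circ[used] * np.log(p_circ[used] / p_haar[used])))


def expressibility_mean_std(circuit, n_qubits, n_shots, n_exp, seed=0):
    """Repeat the estimate n_exp times (the role of the n_exp option) and summarise."""
    rng = np.random.default_rng(seed)
    values = [expressibility(circuit, n_qubits, n_shots, rng=rng) for _ in range(n_exp)]
    return float(np.mean(values)), float(np.std(values))


mean, std = expressibility_mean_std(ry_product_state, n_qubits=1, n_shots=2000, n_exp=5)
print(f"Expr = {mean:.3f} ± {std:.3f}")
```

The histogram estimate carries a small positive bias that shrinks as n_shots grows, which is one reason for using many fidelity samples per evaluation.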

To compute the expressibility as a function of the number of qubits in a data-dependent setting (i.e. sampling the circuit parameters from a data distribution instead of the uniform distribution in [0,2π]), run:

python compute_expr_ent.py --n_qubits 8 --n_shots 100000 --n_exp 20 --out_path test --compute expr_vs_nqubits --data_path dataset1_path dataset2_path dataset3_path --data_dependent

main(args: dict)

Computes the different metrics of the given encoding quantum circuit based on the argparse options below.

Parameters:
  • args (dict) – Configuration dictionary with the following parameters.

  • n_qubits (int) – Number of qubits for the feature map circuit.

  • n_shots (int) – How many fidelity samples to generate per expressibility and entanglement capability evaluation.

  • n_exp (int) – Number of evaluations (‘experiments’) of the expressibility and entanglement capability, used to estimate their mean and std.

  • out_path (str) – Output path of the dataframe to be used for plotting.

  • data_path (str) – Path to the dataset (background or signal .h5 file) to be used in the expressibility calculation. Multiple datasets can be given for the data-dependent expr_vs_nqubits computation.

  • compute (str) – Selects the calculation to run: the expressibility and entanglement capability of different circuits, the expressibility as a function of the number of qubits, or the variance of the kernel as a function of the number of qubits. choices=[“expr_ent_vs_circ”, “expr_vs_nqubits”, “var_kernel_vs_nqubits”]

  • data_dependent (bool) – Compute the expressibility as a data-dependent quantity.

Raises:

TypeError – If the given computation type is not one from: [“expr_ent_vs_circ”, “expr_vs_nqubits”, “var_kernel_vs_nqubits”].
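The selection logic described above amounts to a dispatch on args["compute"]. The sketch below shows one way to write it; dispatch_computation and the placeholder handlers are hypothetical stand-ins for the real computations. TypeError is kept to match the documented behaviour, although ValueError is the more conventional choice for an invalid value.

```python
from typing import Callable, Dict

VALID = ("expr_ent_vs_circ", "expr_vs_nqubits", "var_kernel_vs_nqubits")


def dispatch_computation(args: dict, handlers: Dict[str, Callable[[dict], object]]):
    """Hypothetical dispatch on args["compute"], raising TypeError as documented."""
    compute = args["compute"]
    if compute not in handlers:
        raise TypeError(f"compute must be one of {sorted(handlers)}, got {compute!r}")
    return handlers[compute](args)


# placeholder handlers; the real ones would compute and save the dataframes
handlers = {name: (lambda a, name=name: f"ran {name} with n_qubits={a['n_qubits']}")
            for name in VALID}
print(dispatch_computation({"compute": "expr_vs_nqubits", "n_qubits": 8}, handlers))
```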

prepare_circs(args: dict) → Tuple[List, List]

Prepares the list of circuits needed for evaluation, along with their names, following the convention of the paper:

circuit_names = [“NE_0”, “NE_1”, “L=1”, “L=2”, “L=3”, “L=4”, “L=5”, “L=6”, “FE”]

Parameters:

args (dict) – Argument dictionary, here used for the number of qubits.

Returns:

circuit_list_expr_ent: List

Circuit lambda-function callables.

circuit_labels: List

Corresponding labels as defined in the paper.

Return type:

Tuple

compute_expr_ent_vs_circuit(args: dict, circuits: List[callable], circuit_labels: List[str], data: ndarray | None = None) → DataFrame

Computes the expressibility and entanglement capability of a list of circuits, in the conventional (uniformly sampled parameters from [0, 2pi]) and data-dependent manner.

Parameters:
  • args (dict) – Argparse configuration arguments

  • circuits (List[Callable]) – List of circuits to compute.

  • circuit_labels (List[str]) – Corresponding list of circuit names.

  • data (numpy.ndarray, optional) – Data distribution. If None the circuit parameters are sampled from the uniform distribution, by default None.

Returns:

Pandas Dataframe containing the circuit name and its computed expressibility and entanglement capability, along with their uncertainty.

Return type:

pandas.DataFrame

expr_vs_nqubits(args: dict, rep: int = 3, n_exp: str = 20, data: ndarray | List[ndarray] | None = None) → DataFrame

Computes the (data-dependent) expressibility of a data embedding circuit as a function of the qubit number. Saves the output in a dataframe (.h5) with the computed mean values and uncertainties.

Parameters:
  • args (dict) – Argparse configuration arguments

  • rep (int, optional) – Number of repetitions of the data encoding circuit, by default 3

  • n_exp (str, optional) – Number of repetitions of the computation (experiments) to assess the uncertainty of the stochastically calculated metrics, by default 20

  • data (Union[numpy.ndarray, List[numpy.ndarray]], optional) – List of the datasets for n_qubits = 4, 8, 16, or None for a computation with circuit parameters sampled uniformly in [0, 2pi], by default None

Returns:

Dataframe containing the computed expressibility as a function of n_qubits.

Return type:

pandas.DataFrame

var_kernel_vs_nqubits()

Computes the variance of the quantum kernel matrix as a function of the number of qubits.

Parameters:
  • args (dict) – Argparse configuration arguments.

  • data (numpy.ndarray) – Dataset from which to sample.

  • rep (int, optional) – Number of repetitions of the data encoding circuit, by default 3.

Returns:

Variance of the kernel and its corresponding qubit number.

Return type:

pandas.DataFrame
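The quantity in panel (d) is Var_{z,z'} k(z,z') for the fidelity kernel k(z,z') = |⟨ψ(z)|ψ(z')⟩|². The sketch below estimates it by Monte Carlo for a hypothetical toy encoding (one RY rotation per qubit, no entanglement), not for the paper's u_dense_encoding circuits. For this toy encoding with uniform parameters the exact value is (3/8)^n - (1/4)^n, since each qubit contributes an independent factor cos²(Δ/2) with mean 1/2 and second moment 3/8, which makes the shrinking variance with qubit number easy to check.

```python
import numpy as np
from functools import reduce


def ry_product_state(theta):
    """Hypothetical toy encoding: RY(theta_i)|0> on each qubit, no entangling gates."""
    return reduce(np.kron, [np.array([np.cos(t / 2), np.sin(t / 2)]) for t in theta])


def kernel_variance(n_qubits, n_pairs=5000, data=None, seed=0):
    """Monte Carlo estimate of Var_{z,z'} k(z, z') for k = |<psi(z)|psi(z')>|^2.

    z, z' are uniform in [0, 2*pi]^n_qubits, or random rows of `data`
    (shape: events x n_qubits) when it is given.
    """
    rng = np.random.default_rng(seed)
    data = None if data is None else np.asarray(data, float)
    values = np.empty(n_pairs)
    for s in range(n_pairs):
        if data is None:
            z, z_prime = rng.uniform(0.0, 2 * np.pi, size=(2, n_qubits))
        else:
            z, z_prime = data[rng.integers(len(data), size=2)]
        values[s] = abs(np.vdot(ry_product_state(z), ry_product_state(z_prime))) ** 2
    return float(np.var(values))


for n in (2, 4, 8):
    # estimate next to the exact value for this toy encoding
    print(n, kernel_variance(n), (3 / 8) ** n - (1 / 4) ** n)
```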

get_data(data_path: str | List[str], mult_qubits: bool = False) → Tuple[ndarray]

Loads the data (signal or background) from the given path and returns the numpy arrays scaled to (0, 2pi).

Parameters:
  • data_path (Union[str, List[str]]) – Path to the .h5 dataset, or list of paths for multiple dataset loading.

  • mult_qubits (bool, optional) – If True, the specified dataset (in data_path) is loaded for different qubit numbers, i.e., latent dimensions (4, 8, 16) for the kernel machine training/testing, by default False

Returns:

The loaded dataset or list of the numpy datasets.

Return type:

Tuple[numpy.ndarray]
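The scaling step is described only as "scaled to (0, 2pi)". A common way to do this is per-feature min-max scaling, sketched below; the name scale_to_2pi is hypothetical, and whether get_data uses min-max scaling, or shares the minimum and maximum between signal and background samples, is an assumption not confirmed by the docs. Note that min-max scaling maps onto the closed interval [0, 2π].

```python
import numpy as np


def scale_to_2pi(x):
    """Min-max scale each feature (column) of x to the interval [0, 2*pi].

    Assumption: per-feature min-max scaling computed on x itself.
    """
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_range = x.max(axis=0) - x_min
    x_range = np.where(x_range == 0, 1.0, x_range)  # constant features map to 0
    return 2 * np.pi * (x - x_min) / x_range


latent = np.array([[0.0, 5.0], [10.0, 5.0], [5.0, 5.0]])  # 3 events, 2 latent features
print(scale_to_2pi(latent))
```

Scaling to a 2π range matches the period of rotation-gate angles, so each latent feature spans a full rotation when used as a circuit parameter.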

u_dense_encoding_no_ent(x: ndarray, nqubits: int = 8, reps: int = 3, type: int = 0) → Statevector

Constructs a feature map based on u_dense_encoding but without entanglement. The ‘type’ argument corresponds to the two No-Entanglement (NE) circuits in the paper.

Parameters:
  • x (numpy.ndarray) – Values of the circuit parameters.

  • nqubits (int, optional) – Number of qubits, by default 8

  • reps (int, optional) – Repetition of the data encoding circuit, by default 3

  • type (int, optional) – Flag to differentiate which type of “Non-entanglement” circuit is used, by default 0

Returns:

State vector qiskit object corresponding to the state generated by the circuit.

Return type:

qiskit.quantum_info.Statevector

u_dense_encoding(x: ndarray, nqubits=8, reps=1) → Statevector

Feature map circuit designed for the paper.

Parameters:
  • x (numpy.ndarray) – Values of the circuit parameters.

  • nqubits (int, optional) – Number of qubits for the circuit, by default 8

  • reps (int, optional) – Repetitions of the data encoding ansatz, by default 1

Returns:

State vector qiskit object corresponding to the state generated by the circuit.

Return type:

qiskit.quantum_info.Statevector

u_dense_encoding_all(x: ndarray, nqubits: int = 8, reps: int = 3) → Statevector

Data encoding circuit with all-to-all entanglement gates.

Parameters:
  • x (numpy.ndarray) – Values of the circuit parameters.

  • nqubits (int, optional) – Number of qubits for the circuit, by default 8

  • reps (int, optional) – Repetitions of the data encoding circuit, by default 3

Returns:

State vector qiskit object corresponding to the state generated by the circuit.

Return type:

qiskit.quantum_info.Statevector

get_arguments() → dict

Parses command line arguments and gives back a dictionary.

Returns:

Dictionary with the argparse arguments.

Return type:

dict