The following are some examples of how to use the qad package to
train and test quantum models, and to reproduce results from the paper.
Unsupervised quantum kernel machine
The dataset used for training and testing all the quantum machine learning models is
published on Zenodo.
The training and testing of the unsupervised kernel machine is accomplished using the
train.py and test.py in scripts/kernel_machines/,
respectively (see in the repo). The configuration parameters of the model, e.g., quantum
or classical version, feature map, number of training samples, backend
used for the quantum computation, etc, are defined through the arguments
of the train.py and test.py scripts. For instance, to train the
model:
python train.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --unsup --nqubits 8 --feature_map u_dense_encoding --run_type ideal --output_folder quantum_test --nu_param 0.01 --ntrain 600 --quantum
To test the saved model:
python test.py --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --model trained_qsvms/quantum_test_nu\=0.01_ideal/
For a small-scale demo that runs on a normal personal computer in a reasonable amount of time (5-10 minutes), consider setting ntrain on the order of 50 to 200 data points for the train.py script, and ntest to around 1000 to 10000 data points for the test.py script.
For details regarding different arguments of the train.py and test.py scripts
see below.
Train
- main(args: dict)
Trains and saves the QSVM model. The following parameters are given through argparse and passed in a dictionary format (args) to the function.
- Parameters:
args (dict) – Configuration dictionary, containing the following arguments.
sig_path (str) – Path to the signal/anomaly dataset (.h5 format).
bkg_path (str) – Path to the QCD background dataset (.h5 format).
test_bkg_path (str) – Path to the background testing dataset (.h5 format).
unsup (bool) – Flag to choose between unsupervised and supervised models.
nqubits (int) – Number of qubits for quantum feature map circuit.
feature_map (str) – Feature map circuit for the QSVM or classical kernel name.
backend_name (str) – Name of the IBMQ quantum computer if running on hardware or noisy simulation.
run_type (str) – Choose way to run the QSVM: Ideal computation, noisy simulation or on real quantum hardware. choices=[“ideal”, “noisy”, “hardware”].
output_folder (str) – The name of the model to be saved.
c_param (float) – The C parameter of the SVM.
nu_param (float) – The nu parameter of the unsupervised kernel machine.
gamma (float) – Gamma parameter of the classical SVM with RBF kernel.
ntrain (int) – The number of training events.
ntest (int) – The number of the testing events required for a crosscheck.
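As a rough illustration of how the arguments listed above could reach main as a dictionary, the following sketch builds an argparse parser with a subset of the documented names and converts the parsed namespace with vars. The build_parser helper and all default values are assumptions made for illustration; they are not taken from train.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented arguments."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py options")
    parser.add_argument("--sig_path", type=str, required=True)
    parser.add_argument("--bkg_path", type=str, required=True)
    parser.add_argument("--test_bkg_path", type=str, required=True)
    parser.add_argument("--unsup", action="store_true")
    parser.add_argument("--quantum", action="store_true")
    # All defaults below are assumed values, chosen to match the example command
    parser.add_argument("--nqubits", type=int, default=8)
    parser.add_argument("--feature_map", type=str, default="u_dense_encoding")
    parser.add_argument(
        "--run_type", choices=["ideal", "noisy", "hardware"], default="ideal"
    )
    parser.add_argument("--output_folder", type=str, default="quantum_test")
    parser.add_argument("--nu_param", type=float, default=0.01)
    parser.add_argument("--ntrain", type=int, default=600)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
demo_argv = [
    "--sig_path", "/path/to/signal/data",
    "--bkg_path", "/path/to/background/data",
    "--test_bkg_path", "/path/to/test_background/data",
    "--unsup", "--quantum", "--ntrain", "100",
]
args = vars(build_parser().parse_args(demo_argv))
print(args["ntrain"], args["run_type"])
```

Converting the namespace with vars gives a plain dictionary, which matches the args: dict signature documented for main.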
Test
- main(args: dict)
Assesses the performance of the trained models using k-fold testing. The test dataset comprises background (QCD) data and anomalous (new-physics) data that was not seen during training.
If the chosen model is tested on hardware, then only 1 fold is computed.
- Parameters:
args (dict) – Configuration dictionary, with the following arguments.
sig_path (str) – Path to the signal/anomaly dataset (.h5 format).
bkg_path (str) – Path to the QCD background dataset (.h5 format).
test_bkg_path (str) – Path to the background testing dataset (.h5 format).
model (str) – The folder path of the QSVM model.
ntest (int) – The number of the testing events required for a crosscheck.
kfolds (int) – Number of k-validation/test folds used.
mod_quantum_instance (bool) – Reconfigure the quantum instance and backend.
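The k-fold testing described above can be pictured as splitting the test events into kfolds equally sized, non-overlapping chunks and scoring each chunk separately. The sketch below shows one way to do this with NumPy; the split_into_folds helper is hypothetical, and the way test.py actually builds its folds may differ.

```python
import numpy as np


def split_into_folds(events: np.ndarray, kfolds: int) -> np.ndarray:
    """Split events into kfolds equal chunks, dropping any remainder."""
    fold_size = len(events) // kfolds
    trimmed = events[: fold_size * kfolds]
    return trimmed.reshape(kfolds, fold_size, *events.shape[1:])


rng = np.random.default_rng(0)
# Synthetic stand-in for a latent-space test set: 1003 events, 8 features
test_bkg = rng.normal(size=(1003, 8))
folds = split_into_folds(test_bkg, kfolds=5)
print(folds.shape)
```

Each entry of folds can then be scored independently, which is what later allows a mean and a standard deviation to be quoted for every metric.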
Producing figures
After the unsupervised quantum and classical kernel machines have been
trained and test scores have been saved, one can summarise their
performance with a ROC curve plot. First, following our convention, the
test scores are prepared for plotting using
scripts/kernel_machines/scripts/prepare_plot_scores.py (code),
by running
python prepare_plot_scores.py --classical_folder trained_qsvms/c_test_nu\=0.01/ --quantum_folder trained_qsvms/q_test_nu\=0.01_ideal/ --out_path test_plot --name_suffix n<n_test>_k<k_folds>
Then, we load the score values from the saved files using our convention, e.g., for the case of three different signals with eight latent dimensions, 600 training datapoints, 100k testing datapoints, and k=5 folds:
import h5py

read_dir = '/path/to/data'
n_folds = 5
latent_dim = '8'
n_samples_train = 600
mass = ['35', '35', '15']
br_na = ['NA', '', 'BR']  # narrow (NA) or broad (BR)
signal_name = ['RSGraviton_WW', 'AtoHZ_to_ZZZ', 'RSGraviton_WW']
ntest = ['100', '100', '100']
q_loss_qcd = []; q_loss_sig = []; c_loss_qcd = []; c_loss_sig = []
for i in range(len(signal_name)):
    # Both pieces of the file name are f-strings so that every placeholder is filled in
    with h5py.File(f'{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name[i]}'
                   f'{mass[i]}{br_na[i]}_n{ntest[i]}k_kfold{n_folds}.h5', 'r') as file:
        q_loss_qcd.append(file['quantum_loss_qcd'][:])
        q_loss_sig.append(file['quantum_loss_sig'][:])
        c_loss_qcd.append(file['classic_loss_qcd'][:])
        c_loss_sig.append(file['classic_loss_sig'][:])
The final ROC plot, as it appears in Fig. 3 of the paper, can be obtained with:
colors = ['forestgreen', '#EC4E20', 'darkorchid']
legend_signal_names=['Narrow 'r'G $\to$ WW 3.5 TeV', r'A $\to$ HZ $\to$ ZZZ 3.5 TeV', 'Broad 'r'G $\to$ WW 1.5 TeV']
pl.plot_ROC_kfold_mean(q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig, legend_signal_names, n_folds,
                       legend_title=r'Anomaly signature', save_dir='../jupyter_plots', pic_id='test',
                       palette=colors, xlabel=r'$TPR$', ylabel=r'$FPR^{-1}$')
Example for the unsupervised kernel machine performance on different anomalies:
- get_roc_data(qcd: ndarray, bsm: ndarray, fix_tpr: bool = False) → Tuple[ndarray]
Compute ROC curves given the background and anomaly datasets.
- Parameters:
qcd (np.ndarray) – Background QCD dataset.
bsm (np.ndarray) – Anomaly, Beyond the Standard Model (BSM) dataset.
fix_tpr (bool, optional) – Constant threshold selection for ROC curve calculation, by default False
- Returns:
- np.ndarray:
False Positive Rate array.
- np.ndarray:
True Positive Rate array.
- Return type:
Tuple
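To make the relation between anomaly scores and the ROC curve concrete, the sketch below computes FPR and TPR arrays by scanning a threshold over all observed scores, using only NumPy. It is an illustration and not the implementation of get_roc_data; the roc_from_scores helper is hypothetical, and it assumes that a higher score means a more anomalous event.

```python
import numpy as np


def roc_from_scores(qcd_scores: np.ndarray, bsm_scores: np.ndarray):
    """Return (fpr, tpr) obtained by thresholding at every observed score."""
    # Scan thresholds from the highest score down to the lowest
    thresholds = np.sort(np.concatenate([qcd_scores, bsm_scores]))[::-1]
    # Fraction of background / signal events at or above each threshold
    fpr = (qcd_scores[None, :] >= thresholds[:, None]).mean(axis=1)
    tpr = (bsm_scores[None, :] >= thresholds[:, None]).mean(axis=1)
    # Start the curve at the origin (threshold above every score)
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])


# Toy scores in which signal is perfectly separated from background
qcd = np.array([0.1, 0.2, 0.3])
bsm = np.array([0.7, 0.8, 0.9])
fpr, tpr = roc_from_scores(qcd, bsm)
# Area under the curve via the trapezoidal rule
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
print(auc)
```

The broadcasting step builds a thresholds-by-events matrix, so for large test sets a sorted cumulative count would be more memory efficient.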
- get_FPR_for_fixed_TPR(tpr_window: float, fpr_loss: ndarray, tpr_loss: ndarray, tolerance: float) → float
Get FPR for a fixed value of TPR.
Calculation of the ROC curve is in discrete steps. A window of tolerance is defined around the desired TPR working point and the mean of FPR is taken there.
- Parameters:
tpr_window (float) – TPR working point, typically 0.6 or 0.8
fpr_loss (np.ndarray) – FPR array of the ROC curve.
tpr_loss (np.ndarray) – TPR array of the ROC curve.
tolerance (float) – Tolerance around working point. 0.1-1% window.
- Returns:
Mean FPR at the tolerance window around the TPR working point
- Return type:
float
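The windowed average described above can be sketched in a few lines: select the ROC points whose TPR lies within the tolerance of the working point and average their FPR. The mean_fpr_in_tpr_window helper and the synthetic ROC curve are assumptions for illustration only; the package function may handle edge cases differently.

```python
import numpy as np


def mean_fpr_in_tpr_window(
    tpr_point: float, fpr: np.ndarray, tpr: np.ndarray, tolerance: float
) -> float:
    """Mean FPR over ROC points with |TPR - tpr_point| <= tolerance."""
    mask = np.abs(tpr - tpr_point) <= tolerance
    if not mask.any():
        raise ValueError("no ROC points fall inside the tolerance window")
    return float(fpr[mask].mean())


# Synthetic ROC curve sampled in discrete steps: FPR = TPR^3
tpr = np.linspace(0.0, 1.0, 1001)
fpr = tpr ** 3
fpr_at_wp = mean_fpr_in_tpr_window(0.8, fpr, tpr, tolerance=0.01)
print(fpr_at_wp, 1.0 / fpr_at_wp)
```

A wider tolerance averages over more points and is more stable for coarse ROC curves, at the cost of blurring the working point.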
- get_mean_and_error(data: ndarray) → Tuple[float]
Compute the mean and std of an array.
- Parameters:
data (np.ndarray) – The input array.
- Returns:
- float:
The mean.
- float:
The standard deviation.
- Return type:
Tuple
- plot_ROC_kfold_mean(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, pic_id: str | None = None, xlabel: str = 'TPR', ylabel: str = '1/FPR', legend_title: str = '$ROC$', save_dir: str | None = None, palette: List[str] = ['#3E96A1', '#EC4E20', '#FF9505'])
Calculate the mean ROC curve and its std uncertainty band.
Using the scores of the classical and quantum models, the ROC curves are computed for each one of the k-folds. Their mean and std are computed, and the mean ROC curve is plotted with its error band. The AUC mean and std are also calculated and presented in the legend of the figure.
- Parameters:
quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.
quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.
classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.
classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.
ids (List[str]) – Identifier of the different scores corresponding to the lists of scores. Namely, 3 different anomalies, 3 different latent dimensions or 3 different training sizes.
n_folds (int) – Number of k-folds.
pic_id (str, optional) – Name of the output figure, by default None
xlabel (str, optional) – Label for the x-axis of the figure, by default “TPR”
ylabel (str, optional) – Label for the y-axis of the figure, by default r”1/FPR”
legend_title (str, optional) – Title of the main legend, by default “$ROC$”
save_dir (str, optional) – Output directory for the produced figure, by default None
palette (List[str], optional) – Colors for the 3 ROC curves per plot based on the ids, by default [“#3E96A1”, “#EC4E20”, “#FF9505”]
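The AUC mean and std quoted in the legend can be illustrated without any plotting. The sketch below computes one AUC per fold and then summarises the folds; it uses the rank interpretation of the AUC (the probability that a signal event scores above a background event) rather than integrating a ROC curve, which is a substitution made here for brevity. The helper name and the Gaussian toy scores are assumptions.

```python
import numpy as np


def auc_from_scores(bkg_scores: np.ndarray, sig_scores: np.ndarray) -> float:
    """AUC as P(signal score > background score), counting ties as one half."""
    greater = (sig_scores[:, None] > bkg_scores[None, :]).mean()
    ties = (sig_scores[:, None] == bkg_scores[None, :]).mean()
    return float(greater + 0.5 * ties)


rng = np.random.default_rng(1)
n_folds, fold_size = 5, 400
# Toy scores with shape (n_folds, fold_size); signal is shifted to higher values
bkg_folds = rng.normal(0.0, 1.0, size=(n_folds, fold_size))
sig_folds = rng.normal(1.5, 1.0, size=(n_folds, fold_size))

aucs = np.array([auc_from_scores(b, s) for b, s in zip(bkg_folds, sig_folds)])
auc_mean, auc_std = aucs.mean(), aucs.std()
print(f"AUC = {auc_mean:.3f} +/- {auc_std:.3f}")
```

The spread across folds is what the error band in the figure conveys for the full curve.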
- create_table_for_fixed_TPR(quantum_loss_qcd: List[ndarray], quantum_loss_sig: List[ndarray], classic_loss_qcd: List[ndarray], classic_loss_sig: List[ndarray], ids: List[str], n_folds: int, tpr_windows: List[float] = [0.4, 0.6, 0.8], tolerance: float = 0.01) → DataFrame
Compute the mean and std of the FPR at a fixed TPR working point.
- Parameters:
quantum_loss_qcd (List[np.ndarray]) – List of scores of the quantum model on the background (QCD) data.
quantum_loss_sig (List[np.ndarray]) – List of scores of the quantum model on the signal (anomaly) data.
classic_loss_qcd (List[np.ndarray]) – List of scores of the classical model on the background (QCD) data.
classic_loss_sig (List[np.ndarray]) – List of scores of the classical model on the signal (anomaly) data.
ids (List[str]) – Identifier of the different scores corresponding to the lists of scores. Namely, 3 different anomalies, 3 different latent dimensions or 3 different training sizes.
n_folds (int) – Number of k-folds.
tpr_windows (List[float]) – TPR working point, by default [0.4, 0.6, 0.8]
tolerance (float) – Tolerance around working point, by default 0.01
- Returns:
Latex table of the results.
- Return type:
pd.DataFrame
Expressibility and entanglement capability analysis
The metrics are calculated by sampling the circuit parameters from three different distributions, as depicted in the legends: the uniform distribution in [0,2π], the QCD background data distribution, and the signal (anomaly) scalar boson data distribution. (a) The expressibility (Expr) as a function of the different circuit architectures. (b) The entanglement capability of the data encoding circuit (\(\langle \mathrm{Q} \rangle\)) as a function of the different circuit architectures. (c) The expressibility of the data encoding circuit as a function of the number of qubits \((\mathrm{n_q})\). (d) The variance of the kernel \(\mathrm{Var}_{z, z'}k(z,z')\) as a function of the number of qubits, where \(k(z,z')\) is the kernel corresponding to the data encoding circuit, and z and z’ are data feature vectors sampled from the signal or background distributions.
Given a data encoding quantum circuit, we can compute its expressibility and entanglement
capability. Additionally, we can compute, as a function of the
number of qubits, the variance of the quantum kernel that is
constructed from the given quantum circuit. These properties of the quantum feature map and the
corresponding quantum kernel can be computed using the script compute_expr_ent.py.
The desired computation can be chosen using the argparse argument compute.
For instance, to compute the expressibility and entanglement capability of the circuits discussed in the paper run:
python compute_expr_ent.py --n_shots 10000 --n_exp 20 --out_path test --compute expr_ent_vs_circ
where n_shots defines the number of fidelity samples to generate per expressibility
and entanglement capability evaluation, and n_exp is the number of evaluations (‘experiments’)
of the expressibility and entanglement capability needed to estimate the mean and std
around the true value. For more details please check the repo
of the triple_e package.
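To give a feeling for what a single expressibility evaluation involves, the sketch below uses the commonly used definition: the Kullback-Leibler divergence between the histogram of sampled state fidelities and the fidelity distribution of Haar-random states, whose density is (N-1)(1-F)^(N-2) with N = 2^n_qubits. This is a NumPy illustration under stated assumptions and not the triple_e implementation; the function names, the choice of 75 bins and the natural logarithm are all assumptions.

```python
import numpy as np


def haar_bin_probabilities(n_qubits: int, edges: np.ndarray) -> np.ndarray:
    """Probability of each fidelity bin for Haar-random states.

    Uses the closed-form CDF 1 - (1 - F)^(N - 1) with N = 2^n_qubits.
    """
    dim = 2 ** n_qubits
    cdf = 1.0 - (1.0 - edges) ** (dim - 1)
    return np.diff(cdf)


def expressibility(fidelities: np.ndarray, n_qubits: int, n_bins: int = 75) -> float:
    """KL divergence between the sampled fidelity histogram and the Haar one."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(fidelities, bins=edges)
    p = counts / counts.sum()
    q = haar_bin_probabilities(n_qubits, edges)
    mask = p > 0  # empty bins contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


rng = np.random.default_rng(0)
n_qubits, n_shots = 2, 10000
# Fidelities drawn from the Haar distribution itself via inverse-CDF sampling
haar_like = 1.0 - rng.random(n_shots) ** (1.0 / (2 ** n_qubits - 1))
# A circuit that always prepares the same state has fidelity 1 for every pair
idle = np.ones(n_shots)

expr_haar = expressibility(haar_like, n_qubits)
expr_idle = expressibility(idle, n_qubits)
print(expr_haar, expr_idle)
```

A value close to zero indicates a fidelity distribution close to the Haar one, while a circuit that barely moves in Hilbert space yields a large value. Repeating the evaluation n_exp times with fresh samples gives the mean and std mentioned above.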
To compute the expressibility as a function of the number of the qubits in a data dependent setting (i.e. sampling the circuit parameters from a data distribution instead of the uniform in [0,2π]) run:
python compute_expr_ent.py --n_qubits 8 --n_shots 100000 --n_exp 20 --out_path test --compute expr_vs_nqubits --data_path dataset1_path dataset2_path dataset3_path --data_dependent
- main(args: dict)
Computes the different metrics of the given encoding quantum circuit based on the argparse options below.
- Parameters:
args (dict) – Configuration dictionary with the following parameters.
n_qubits (int) – Number of qubits for the feature map circuit.
n_shots (int) – How many fidelity samples to generate per expressibility and entanglement capability evaluation.
n_exp (int) – Number of evaluations (‘experiments’) of the expressibility and entanglement capability, used to estimate the mean and std around the true value.
out_path (str) – Output dataframe to be used for plotting.
data_path (str) – Path to the dataset (background or signal .h5 file) to be used in the expressibility calculation. Multiple datasets can also be given for the expr_vs_nqubits data-dependent computation.
compute (str) – Run different calculations: compute expressibility and entanglement capability of different circuits, compute expressibility as a function of the number of qubits, and variance of the kernel as a function of qubits. choices=[“expr_ent_vs_circ”, “expr_vs_nqubits”, “var_kernel_vs_nqubits”]
data_dependent (bool) – Compute the expressibility as a data-dependent quantity.
- Raises:
TypeError – If the given computation type is not one from: [“expr_ent_vs_circ”, “expr_vs_nqubits”, “var_kernel_vs_nqubits”].
- prepare_circs(args: dict) → Tuple[List, List]
Prepares the list of circuits needed for evaluation along with their names. Following the convention of the paper:
circuit_names = [“NE_0”, “NE_1”, “L=1”, “L=2”, “L=3”, “L=4”, “L=5”, “L=6”, “FE”]
- Parameters:
args (dict) – Argument dictionary, here used for the number of qubits.
- Returns:
- circuit_list_expr_ent: List
Circuit lambda-function callables.
- circuit_labels: List
Corresponding labels as defined in the paper.
- Return type:
Tuple
- compute_expr_ent_vs_circuit(args: dict, circuits: List[callable], circuit_labels: List[str], data: ndarray | None = None) → DataFrame
Computes the expressibility and entanglement capability of a list of circuits, in the conventional (uniformly sampled parameters from [0, 2pi]) and data-dependent manner.
- Parameters:
args (dict) – Argparse configuration arguments
circuits (List[Callable]) – List of circuits to compute.
circuit_labels (List[str]) – Corresponding list of circuit names.
data (numpy.ndarray, optional) – Data distribution. If None, the circuit parameters are sampled from the uniform distribution, by default None.
- Returns:
Pandas Dataframe containing the circuit name and its computed expressibility and entanglement capability, along with their uncertainty.
- Return type:
pandas.DataFrame
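The entanglement capability is commonly defined as the Meyer-Wallach measure Q averaged over sampled circuit parameters, where Q = 2(1 - mean single-qubit purity) for an n-qubit pure state. The sketch below evaluates Q for a single state vector with NumPy; it is an illustration of the measure and not the code used by compute_expr_ent_vs_circuit, and the meyer_wallach helper is hypothetical.

```python
import numpy as np


def meyer_wallach(state: np.ndarray, n_qubits: int) -> float:
    """Meyer-Wallach entanglement measure Q of a pure n-qubit state."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n_qubits)
    purities = []
    for k in range(n_qubits):
        # Bring qubit k to the front and flatten the remaining qubits
        mat = np.moveaxis(psi, k, 0).reshape(2, -1)
        rho_k = mat @ mat.conj().T  # reduced density matrix of qubit k
        purities.append(np.real(np.trace(rho_k @ rho_k)))
    return float(2.0 * (1.0 - np.mean(purities)))


product_state = np.array([1.0, 0.0, 0.0, 0.0])            # |00>
bell_state = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
q_product = meyer_wallach(product_state, 2)
q_bell = meyer_wallach(bell_state, 2)
print(q_product, q_bell)
```

Q vanishes for product states and reaches 1 for maximally entangled states such as the Bell state, so averaging it over many parameter samples gives a number between 0 and 1 for each circuit.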
- expr_vs_nqubits(args: dict, rep: int = 3, n_exp: str = 20, data: ndarray | List[ndarray] | None = None) → DataFrame
Computes the (data-dependent) expressibility of a data embedding circuit as a function of the qubit number. Saves the output in a dataframe (.h5) with the computed mean values and uncertainties.
- Parameters:
args (dict) – Argparse configuration arguments
rep (int, optional) – Number of repetitions of the data encoding circuit, by default 3
n_exp (str, optional) – Number of repetitions of the computation (experiments) to assess the uncertainty of the stochastically calculated metrics, by default 20
data (Union[numpy.ndarray, List[numpy.ndarray]], optional) – List of the datasets available for n_qubits = 4, 8, 16, or None for computation with circuit parameters sampled uniformly from [0, 2pi], by default None
- Returns:
Dataframe containing the computed expressibility as a function of n_qubits.
- Return type:
pandas.DataFrame
- var_kernel_vs_nqubits()
Computes the variance of the quantum kernel matrix as a function of the number of qubits.
- Parameters:
args (dict) – Argparse configuration arguments.
data (numpy.ndarray) – Dataset from which to sample.
rep (int, optional) – Number of repetitions of the data encoding circuit, by default 3.
- Returns:
Variance of the kernel and its corresponding qubit number.
- Return type:
pandas.DataFrame
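The quantity computed here, the variance of k(z, z') over pairs of sampled feature vectors, can be illustrated with a toy fidelity kernel. The sketch below replaces the paper's data encoding circuit with a product of single-qubit RY rotations, for which the kernel has the closed form of a product of cos^2((z_i - z'_i)/2) terms. This stand-in feature map, the uniform sampling and the helper name are assumptions; only the definition of the variance follows the text.

```python
import numpy as np


def toy_kernel(z: np.ndarray, z_prime: np.ndarray) -> np.ndarray:
    """Fidelity kernel |<psi(z)|psi(z')>|^2 of a product RY encoding.

    Stand-in for the real feature map: each qubit holds RY(z_i)|0>.
    """
    return np.prod(np.cos((z - z_prime) / 2.0) ** 2, axis=-1)


rng = np.random.default_rng(0)
n_pairs = 5000
variances = {}
for n_qubits in (4, 8, 16):
    # Sample pairs of feature vectors, one feature per qubit
    z = rng.uniform(0.0, 2.0 * np.pi, size=(n_pairs, n_qubits))
    z_prime = rng.uniform(0.0, 2.0 * np.pi, size=(n_pairs, n_qubits))
    variances[n_qubits] = float(np.var(toy_kernel(z, z_prime)))

for n_qubits, var in variances.items():
    print(n_qubits, var)
```

For this toy kernel the variance shrinks quickly as qubits are added, which is the kind of trend that a variance-versus-qubits scan is meant to expose.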
- get_data(data_path: str | List[str], mult_qubits: bool = False) → Tuple[ndarray]
Loads the signal or background data from the given path and returns the numpy arrays scaled to (0, 2pi).
- Parameters:
data_path (Union[str, List[str]]) – Path to the .h5 dataset, or list of paths for multiple dataset loading.
mult_qubits (bool, optional) – If True, the specified dataset (in data_path) is loaded for different qubit numbers, i.e., latent dimensions (4, 8, 16) for the kernel machine training/testing, by default False
- Returns:
The loaded dataset or list of the numpy datasets.
- Return type:
Tuple[numpy.ndarray]
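Scaling features to the (0, 2pi) range makes them usable as rotation angles in the encoding circuit. One plausible way to do this is a per-feature min-max scaling, sketched below; whether get_data uses exactly this scheme is an assumption, and the scale_to_two_pi helper is hypothetical.

```python
import numpy as np


def scale_to_two_pi(data: np.ndarray) -> np.ndarray:
    """Min-max scale each feature (column) to the interval [0, 2*pi].

    Assumes no column is constant, since that would divide by zero.
    """
    lower = data.min(axis=0)
    upper = data.max(axis=0)
    return 2.0 * np.pi * (data - lower) / (upper - lower)


rng = np.random.default_rng(0)
# Synthetic stand-in for a latent-space dataset with 8 features
latent = rng.normal(size=(1000, 8))
angles = scale_to_two_pi(latent)
print(angles.min(), angles.max())
```

When several datasets are scaled, reusing the minimum and maximum of one reference dataset keeps signal and background on a common angle range.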
- u_dense_encoding_no_ent(x: ndarray, nqubits: int = 8, reps: int = 3, type: int = 0) → Statevector
Constructs a feature map based on u_dense_encoding but with the entanglement removed. The ‘type’ argument corresponds to the two No-Entanglement (NE) circuits in the paper.
- Parameters:
x (numpy.ndarray) – Values of the circuit parameters.
nqubits (int, optional) – Number of qubits, by default 8
reps (int, optional) – Repetition of the data encoding circuit, by default 3
type (int, optional) – Flag to differentiate which type of “No-Entanglement” circuit is used, by default 0
- Returns:
State vector qiskit object corresponding to the state generated by the circuit.
- Return type:
qiskit.quantum_info.Statevector
- u_dense_encoding(x: ndarray, nqubits=8, reps=1) → Statevector
Designed feature map circuit for the paper.
- Parameters:
x (numpy.ndarray) – Values of the circuit parameters.
nqubits (int, optional) – Number of qubits for the circuit, by default 8
reps (int, optional) – Repetitions of the data encoding ansatz, by default 1
- Returns:
State vector qiskit object corresponding to the state generated by the circuit.
- Return type:
qiskit.quantum_info.Statevector
- u_dense_encoding_all(x: ndarray, nqubits: int = 8, reps: int = 3) → Statevector
Data encoding circuit with all-to-all entanglement gates.
- Parameters:
x (numpy.ndarray) – Values of the circuit parameters.
nqubits (int, optional) – Number of qubits for the circuit, by default 8
reps (int, optional) – Repetitions of the data encoding circuit, by default 3
- Returns:
State vector qiskit object corresponding to the state generated by the circuit.
- Return type:
qiskit.quantum_info.Statevector