bead.src.utils package

Submodules

bead.src.utils.conversion module

bead.src.utils.conversion.calculate_jet_properties(constituents)[source]

Calculate jet pT, eta, and phi from constituent properties.

Parameters:

constituents (list of dicts) – Each dict contains constituent properties: {‘pt’: …, ‘eta’: …, ‘phi’: …}

Returns:

Jet properties {‘jet_pt’: …, ‘jet_eta’: …, ‘jet_phi’: …}

Return type:

dict
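
A minimal usage sketch (the constituent values are illustrative; the dict keys follow the documented format):

    from bead.src.utils.conversion import calculate_jet_properties

    constituents = [
        {"pt": 45.2, "eta": 0.1, "phi": 1.2},   # toy constituent kinematics
        {"pt": 30.7, "eta": -0.3, "phi": 1.5},
    ]
    jet = calculate_jet_properties(constituents)
    print(jet["jet_pt"], jet["jet_eta"], jet["jet_phi"])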

bead.src.utils.conversion.convert_csv_to_hdf5_npy_parallel(csv_file, output_prefix, out_path, file_type='h5', n_workers=4, verbose: bool = False)[source]

Convert a CSV file to HDF5 and .npy files in parallel, adding event ID (evt_id) and jet-level properties calculated from constituents.

Parameters:
  • csv_file (str) – Path to the input CSV file.

  • output_prefix (str) – Prefix for the output files.

  • file_type (str) – Output file type (‘h5’ or ‘npy’).

  • out_path (str) – Path to save output files.

  • n_workers (int) – Number of parallel workers.

  • verbose (bool) – Print progress if True.
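
A minimal usage sketch (the file and directory names are hypothetical):

    from bead.src.utils.conversion import convert_csv_to_hdf5_npy_parallel

    convert_csv_to_hdf5_npy_parallel(
        csv_file="events.csv",        # hypothetical input file
        output_prefix="run01",
        out_path="./converted",       # hypothetical output directory
        file_type="h5",
        n_workers=4,
        verbose=True,
    )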

bead.src.utils.conversion.process_event(evt_id, row)[source]

Process a single event, calculating jet and constituent data.

Parameters:
  • evt_id (int) – Event ID.

  • row (pandas.Series) – Row from the DataFrame corresponding to the event.

Returns:

(event_data, jets, constituents) for the event.

Return type:

tuple

bead.src.utils.data_processing module

bead.src.utils.data_processing.decode_pid(encoded_tensor_path, pid_map_path, decoded_tensor_path)[source]

Reads the encoded tensor and pid map, decodes the pids back to their original values, and saves the decoded tensor.

bead.src.utils.data_processing.encode_pid_parallel(constits_tensor_path, encoded_tensor_path, pid_map_path)[source]

Parallelized: Reads the constituent-level tensor and encodes the pid column using categorical encoding. Saves the new tensor and the pid map for later retrieval.

bead.src.utils.data_processing.load_data(file_path, file_type='h5', verbose: bool = False)[source]

Load data from either an HDF5 file or .npy files.

bead.src.utils.data_processing.parallel_select_top_jets_and_constituents(jets, constituents, n_jets=3, n_constits=15, n_workers=4, verbose: bool = False)[source]

Parallelized selection of top jets and constituents.

bead.src.utils.data_processing.preproc_inputs(paths, config, keyword, verbose: bool = False)[source]
bead.src.utils.data_processing.process_and_save_tensors(in_path, out_path, output_prefix, config, verbose: bool = False)[source]

Process the input file, parallelize selections, and save the results as PyTorch tensors.

bead.src.utils.data_processing.process_event(evt_id, evt_jets, evt_constits, n_jets=3, n_constits=15, verbose: bool = False)[source]

Process a single event to select top jets and their top constituents.

bead.src.utils.diagnostics module

bead.src.utils.diagnostics.c_profile(func, *args, **kwargs)[source]

Profile the function func with cProfile.

Parameters:

func (callable) – The function to be profiled.

Returns:

The result of the function func execution.

Return type:

result
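
A minimal usage sketch; any callable plus its arguments works:

    from bead.src.utils.diagnostics import c_profile

    # profile a cheap stand-in call; `result` holds the return value of sorted()
    result = c_profile(sorted, range(10000), reverse=True)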

bead.src.utils.diagnostics.dict_to_square_matrix(input_dict: dict) → array[source]

Converts an input dictionary into a square np.array, adding NaNs where the dimension of a dict key is smaller than that of the final square matrix.

Parameters:

input_dict (dict)

Returns:

square_matrix (np.array)

bead.src.utils.diagnostics.get_mean_node_activations(input_dict: dict) → dict[source]
bead.src.utils.diagnostics.nap_diagnose(input_path: str, output_path: str) → None[source]
bead.src.utils.diagnostics.plot(data: array, output_path: str) → None[source]
bead.src.utils.diagnostics.pytorch_profile(f, *args, **kwargs)[source]

Performs PyTorch profiling of the CPU/GPU time and the memory consumed by the execution of the function f.

Parameters:

f (callable) – The function to be profiled.

Returns:

The result of the function f execution.

Return type:

result

bead.src.utils.ggl module

class bead.src.utils.ggl.Config(file_type: str, parallel_workers: int, num_jets: int, num_constits: int, latent_space_size: int, normalizations: str, invert_normalizations: bool, train_size: float, epochs: int, early_stopping: bool, early_stoppin_patience: int, lr_scheduler: bool, lr_scheduler_patience: int, min_delta: int, model_name: str, input_level: str, input_features: str, model_init: str, loss_function: str, reg_param: float, lr: float, batch_size: int, test_size: float, intermittent_model_saving: bool, separate_model_saving: bool, intermittent_saving_patience: int, activation_extraction: bool, deterministic_algorithm: bool)[source]

Bases: object

Defines a configuration dataclass

activation_extraction: bool
batch_size: int
deterministic_algorithm: bool
early_stoppin_patience: int
early_stopping: bool
epochs: int
file_type: str
input_features: str
input_level: str
intermittent_model_saving: bool
intermittent_saving_patience: int
invert_normalizations: bool
latent_space_size: int
loss_function: str
lr: float
lr_scheduler: bool
lr_scheduler_patience: int
min_delta: int
model_init: str
model_name: str
normalizations: str
num_constits: int
num_jets: int
parallel_workers: int
reg_param: float
separate_model_saving: bool
test_size: float
train_size: float
bead.src.utils.ggl.convert_csv(paths, config, verbose: bool = False)[source]

Convert the input ‘.csv’ file into the file_type selected in the config file (‘.h5’ by default).

Separate event-level, jet-level and constituent-level data into separate datasets/files.

Parameters:
  • paths – Dictionary of common paths used in the pipeline

  • config (dataClass) – Base class selecting user inputs

  • verbose (bool) – If True, prints out more information

Outputs:

A ProjectName_OutputPrefix.h5 file which includes:
  • Event-level dataset

  • Jet-level dataset

  • Constituent-level dataset

or

ProjectName_OutputPrefix_{data-level}.npy files which contain the same information as above, split into three separate files.

bead.src.utils.ggl.create_default_config(workspace_name: str, project_name: str) → str[source]

Creates a default config file for a project.

Parameters:
  • workspace_name (str) – Name of the workspace.

  • project_name (str) – Name of the project.

Returns:

Default config file.

Return type:

str

bead.src.utils.ggl.create_new_project(workspace_name: str, project_name: str, verbose: bool = False, base_path: str = 'workspaces') → None[source]

Creates a new project directory, output subdirectories, and config files within a workspace.

Parameters:
  • workspace_name (str) – Creates a workspace (dir) for storing data and projects with this name.

  • project_name (str) – Creates a project (dir) for storing configs and outputs with this name.

  • verbose (bool, optional) – Whether to print out the progress. Defaults to False.

  • base_path (str, optional) – Base directory in which workspaces are created. Defaults to ‘workspaces’.

bead.src.utils.ggl.get_arguments()[source]

Determines command-line arguments specified by the BEAD user. Use --help to see the available options.

Returns: the .py file containing the config options, a string determining which mode to run, and the projects directory where outputs go.

bead.src.utils.ggl.loss_plotter(path_to_loss_data, output_path, config)[source]

Calls plotting.loss_plot()

Parameters:
  • path_to_loss_data (string) – Path to the values for the loss plot

  • output_path (string) – Path to output the data

  • config (dataClass) – Base class selecting user inputs

Returns:

Plot containing the loss curves

Return type:

.pdf file

bead.src.utils.ggl.plotter(output_path, config)[source]

Calls plotting.plot()

Parameters:
  • output_path (string) – Path to the output directory

  • config (dataClass) – Base class selecting user inputs

bead.src.utils.ggl.prepare_inputs(paths, config, verbose: bool = False)[source]

Read the input data and generate torch tensors ready to train on.

Select number of leading jets per event and number of leading constituents per jet to be used for training.

Parameters:
  • paths – Dictionary of common paths used in the pipeline

  • config (dataClass) – Base class selecting user inputs

  • verbose (bool) – If True, prints out more information

Outputs:

Tensor files which include:
  • Event-level dataset – [evt_id, evt_weight, met, met_phi, num_jets]

  • Jet-level dataset – [evt_id, jet_id, num_constituents, jet_btag, jet_pt, jet_eta, jet_phi]

  • Constituent-level dataset – [evt_id, jet_id, constituent_id, jet_btag, constituent_pt, constituent_eta, constituent_phi]

bead.src.utils.ggl.run_diagnostics(project_path, verbose: bool)[source]

Calls diagnostics.diagnose()

Parameters:
  • input_path (str) – path to the np.array containing the activation values

  • output_path (str) – path to store the diagnostics pdf

bead.src.utils.ggl.run_full_chain(workspace_name: str, project_name: str, paths: dict, config: dict, options: str, verbose: bool = False) → None[source]

Execute a sequence of operations based on the provided options string.

Parameters:
  • workspace_name – Name of the workspace for new projects

  • project_name – Name of the project for new projects

  • paths – Dictionary of file paths and directories

  • config – Configuration dictionary for operations

  • options – Underscore-separated string specifying the workflow sequence

  • verbose – Whether to show verbose output

Example:

    run_full_chain("my_workspace", "my_project", paths, config,
                   "newproject_convertcsv_prepareinputs_train_detect", verbose=True)

bead.src.utils.ggl.run_inference(paths, config, verbose: bool = False)[source]

Main function calling the inference functions, run when --mode=detect is selected.

Parameters:
  • paths (dictionary) – Dictionary of common paths used in the pipeline

  • config (dataClass) – Base class selecting user inputs

  • verbose (bool) – If True, prints out more information

bead.src.utils.ggl.run_plots(output_path, config, verbose: bool)[source]

Main function calling the two plotting functions, run when --mode=plot is selected.

The two functions called are ggl.plotter and ggl.loss_plotter.

Parameters:
  • output_path (string) – Selects the base path for determining the output path

  • config (dataClass) – Base class selecting user inputs

  • verbose (bool) – If True, prints out more information

bead.src.utils.ggl.run_training(paths, config, verbose: bool = False)[source]

Main function calling the training functions, run when --mode=train is selected.

The two functions called are data_processing.preproc_inputs and training.train.

Parameters:
  • paths (dictionary) – Dictionary of common paths used in the pipeline

  • config (dataClass) – Base class selecting user inputs

  • verbose (bool) – If True, prints out more information

bead.src.utils.helper module

class bead.src.utils.helper.ChainedScaler(scalers)[source]

Bases: BaseEstimator, TransformerMixin

Chains a list of scaler transformations. The transformation is applied sequentially (in the order provided) and the inverse transformation is applied in reverse order.

fit(X, y=None)[source]
inverse_transform(X)[source]
transform(X)[source]
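
A minimal sketch of chaining two of the scalers defined in this module (the toy data and the particular chain are illustrative):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    from bead.src.utils.helper import ChainedScaler, Log1pScaler

    X = np.abs(np.random.randn(100, 3))         # toy positive-valued features
    scaler = ChainedScaler([Log1pScaler(), StandardScaler()])
    scaler.fit(X)
    X_norm = scaler.transform(X)                # log1p first, then standardize
    X_back = scaler.inverse_transform(X_norm)   # inverses applied in reverse order
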
class bead.src.utils.helper.CustomDataset(data_tensor, label_tensor)[source]

Bases: Dataset

class bead.src.utils.helper.EarlyStopping(patience: int, min_delta: float)[source]

Bases: object

Class to perform early stopping during model training.

patience

The number of epochs to wait before stopping the training process if the validation loss doesn’t improve.

Type:

int

min_delta

The minimum difference between the new loss and the previous best loss for the new loss to be considered an improvement.

Type:

float

counter

Counts the number of times the validation loss hasn’t improved.

Type:

int

best_loss

The best validation loss observed so far.

Type:

float

early_stop

Flag that indicates whether early stopping criteria have been met.

Type:

bool
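
A minimal training-loop sketch. The toy losses are illustrative, and the call convention (invoking the instance with the current validation loss each epoch, mirroring the LRScheduler example below) is an assumption:

    from bead.src.utils.helper import EarlyStopping

    early_stopper = EarlyStopping(patience=3, min_delta=0.01)
    toy_val_losses = [1.0, 0.9, 0.89, 0.889, 0.888, 0.887]   # plateauing losses
    for epoch, val_loss in enumerate(toy_val_losses):
        early_stopper(val_loss)   # assumed call convention; updates counter/best_loss
        if early_stopper.early_stop:
            print(f"early stopping at epoch {epoch}")
            break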

class bead.src.utils.helper.L2Normalizer[source]

Bases: BaseEstimator, TransformerMixin

L2 normalization per feature of data

fit(X, y=None)[source]
inverse_transform(X)[source]
transform(X)[source]
class bead.src.utils.helper.LRScheduler(optimizer, patience, min_lr=1e-06, factor=0.5)[source]

Bases: object

A learning rate scheduler that adjusts the learning rate of an optimizer based on the training loss.

Parameters:
  • optimizer (torch.optim.Optimizer) – The optimizer whose learning rate will be adjusted.

  • patience (int) – The number of epochs with no improvement in training loss after which the learning rate will be reduced.

  • min_lr (float, optional) – The minimum learning rate that can be reached (default: 1e-6).

  • factor (float, optional) – The factor by which the learning rate will be reduced (default: 0.5).

lr_scheduler

The PyTorch learning rate scheduler that actually performs the adjustments.

Type:

torch.optim.lr_scheduler.ReduceLROnPlateau

Example usage:

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    lr_scheduler = LRScheduler(optimizer, patience=3, min_lr=1e-6, factor=0.5)
    for epoch in range(num_epochs):
        train_loss = train(model, train_data_loader)
        lr_scheduler(train_loss)
        # ...

class bead.src.utils.helper.Log1pScaler[source]

Bases: BaseEstimator, TransformerMixin

Log(1+x) transformer for positive-skewed HEP features

fit(X, y=None)[source]
inverse_transform(X)[source]
transform(X)[source]
class bead.src.utils.helper.SinCosTransformer[source]

Bases: BaseEstimator, TransformerMixin

Transforms an angle (in radians) into two features: [sin(angle), cos(angle)]. Inverse transformation uses arctan2.

fit(X, y=None)[source]
inverse_transform(X)[source]
transform(X)[source]
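
A minimal round-trip sketch (the single-column input shape is an assumption based on the description above):

    import numpy as np

    from bead.src.utils.helper import SinCosTransformer

    angles = np.array([[0.5], [-2.0], [3.0]])   # toy angles in radians
    t = SinCosTransformer()
    t.fit(angles)
    sincos = t.transform(angles)                # two features per angle: [sin, cos]
    recovered = t.inverse_transform(sincos)     # arctan2 maps back to (-pi, pi]
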
bead.src.utils.helper.add_sig_bkg_label(tensors: tuple, label: str) → tuple[source]

Adds a new feature to the last dimension of each tensor in the tuple. The new feature is filled with 0 for “bkg” and 1 for “sig”.

Parameters:
  • tensors – A tuple of three tensors (events, jets, constituents).

  • label – A string, either “bkg” or “sig”, to determine the value of the new feature.

Returns:

A tuple of the three tensors with the new feature added to the last dimension.

bead.src.utils.helper.calculate_in_shape(data, config)[source]

Calculates the input shapes for the models based on the data.

Parameters:
  • data (ndarray) – The data you wish to calculate the input shapes for.

  • config (dataClass) – Base class selecting user inputs.

Returns:

A tuple containing the input shapes for the models.

Return type:

tuple

bead.src.utils.helper.call_forward(model, inputs)[source]

Calls the forward method of the given object. If the return value is not a tuple, packs it into a tuple.

Parameters:
  • model – An object that has a forward method.

  • inputs – The input data to pass to the model.

Returns:

A tuple containing the result(s) of the forward method.

bead.src.utils.helper.convert_to_tensor(data)[source]

Converts ndarray to torch.Tensors.

Parameters:

data (ndarray) – The data you wish to convert from ndarray to torch.Tensor.

Returns:

Your data as a tensor

Return type:

torch.Tensor

bead.src.utils.helper.create_datasets(events_train, jets_train, constituents_train, events_val, jets_val, constituents_val, events_train_label, jets_train_label, constituents_train_label, events_val_label, jets_val_label, constituents_val_label)[source]
bead.src.utils.helper.data_label_split(data)[source]

Splits the data into features and labels.

Parameters:

data (ndarray) – The data you wish to split into features and labels.

Returns:

A tuple containing two ndarrays:
  • data: The features of the data.

  • labels: The labels of the data.

Return type:

tuple

bead.src.utils.helper.decoder_saver(model, model_path: str) → None[source]

Saves the Decoder state dictionary as a .pt file to the given path.

Parameters:
  • model (nn.Module) – The PyTorch model to save.

  • model_path (str) – String defining the model’s save path.

Returns:

Saved decoder state dictionary as .pt file.

Return type:

None

bead.src.utils.helper.detach_device(tensor)[source]

Detaches a given tensor and converts it to a ndarray.

Parameters:

tensor (torch.Tensor) – The PyTorch tensor one wants to convert to a ndarray

Returns:

Converted torch.Tensor to ndarray

Return type:

ndarray

bead.src.utils.helper.encoder_saver(model, model_path: str) → None[source]

Saves the Encoder state dictionary as a .pt file to the given path.

Parameters:
  • model (nn.Module) – The PyTorch model to save.

  • model_path (str) – String defining the model’s save path.

Returns:

Saved encoder state dictionary as .pt file.

Return type:

None

bead.src.utils.helper.get_device()[source]

Returns the appropriate processing device. If CUDA is available it returns “cuda:0”, otherwise it returns “cpu”.

Returns:

Device string, either “cpu” or “cuda:0”

Return type:

str

bead.src.utils.helper.get_loss(loss_function: str)[source]

Returns the loss_object based on the string provided.

Parameters:

loss_function (str) – The loss function you wish to use. Options include:

    ‘mse’: Mean Squared Error
    ‘bce’: Binary Cross Entropy
    ‘mae’: Mean Absolute Error
    ‘huber’: Huber Loss
    ‘l1’: L1 Loss
    ‘l2’: L2 Loss
    ‘smoothl1’: Smooth L1 Loss

Returns:

The loss function object

Return type:

class
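
A minimal usage sketch:

    from bead.src.utils.helper import get_loss

    loss_fn = get_loss("mse")   # Mean Squared Error, per the options listed above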

bead.src.utils.helper.get_optimizer(optimizer_name, parameters, lr)[source]

Returns a PyTorch optimizer configured with optimal arguments for training a large VAE.

Parameters:
  • optimizer_name (str) – One of “adam”, “adamw”, “rmsprop”, “sgd”, “radam”, “adagrad”.

  • parameters (iterable) – The parameters (or parameter groups) of your model.

  • lr (float) – The learning rate for the optimizer.

Returns:

An instantiated optimizer with specified hyperparameters.

Return type:

torch.optim.Optimizer

Raises:

ValueError – If an unsupported optimizer name is provided.
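
A minimal sketch with a toy model (the choice of “adamw” and the learning rate are illustrative):

    import torch

    from bead.src.utils.helper import get_optimizer

    model = torch.nn.Linear(8, 2)   # toy model
    optimizer = get_optimizer("adamw", model.parameters(), lr=1e-3)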

bead.src.utils.helper.invert_normalize_data(normalized_data, scaler)[source]

Inverts a chained normalization transformation.

This function accepts normalized data (for example, the output of a VAE’s preprocessed input) and the scaler (or ChainedScaler) that was used to perform the forward transformation. It then returns the original data by calling the scaler’s inverse_transform method.

Parameters:
  • normalized_data (np.ndarray) – The transformed data array.

  • scaler – The scaler object (or a ChainedScaler instance) used for the forward transformation, which must implement an inverse_transform method.

Returns:

The data mapped back to its original scale.

Return type:

np.ndarray

bead.src.utils.helper.load_augment_tensors(folder_path, keyword)[source]

Searches through the specified folder for all ‘.pt’ files whose names contain the specified keyword (e.g., ‘bkg_train’, ‘bkg_test’, or ‘sig_test’). Files are then categorized by whether their filename contains one of the three substrings: ‘jets’, ‘events’, or ‘constituents’.

For ‘bkg_train’, each file must contain one of the generator names: ‘herwig’, ‘pythia’, or ‘sherpa’. For each file, the tensor is loaded and a new feature is appended along the last dimension:
  • 0 for files containing ‘herwig’

  • 1 for files containing ‘pythia’

  • 2 for files containing ‘sherpa’

For ‘bkg_test’ and ‘sig_test’, the appended new feature is filled with -1, since generator info is not available at test time.

Finally, for each category the resulting tensors are concatenated along axis=0.

Parameters:
  • folder_path (str) – The path to the folder to search.

  • keyword (str) – The keyword to filter files (e.g., ‘bkg_train’, ‘bkg_test’, or ‘sig_test’).

Returns:

A tuple of three PyTorch tensors: (jets_tensor, events_tensor, constituents_tensor)

corresponding to the concatenated tensors for each category.

Return type:

tuple

Raises:

ValueError – If any category does not have at least one file for each generator type. The error message is: “required files not found. please run the --mode convert_csv and prepare inputs before retrying”

bead.src.utils.helper.load_model(model_path: str, in_shape, config)[source]

Loads the state dictionary of the trained model into a model variable. This variable is then used for passing data through the encoding and decoding functions.

Parameters:
  • model_path (str) – Path to the saved model.

  • in_shape – Input shape(s) of the model, as produced by calculate_in_shape.

  • config (dataClass) – Base class selecting user inputs.

Returns: nn.Module: A model object with the attributes of the model class, with the selected state dictionary loaded into it.

bead.src.utils.helper.load_tensors(folder_path, keyword='sig_test')[source]

Searches through the specified folder for all ‘.pt’ files containing the given keyword in their names. Categorizes these files based on the presence of ‘jets’, ‘events’, or ‘constituents’ in their filenames, loads them into PyTorch tensors, concatenates them along axis=0, and returns the resulting tensors.

Parameters:
  • folder_path (str) – The path to the folder to search.

  • keyword (str) – The keyword to filter files (‘bkg_train’, ‘bkg_test’, or ‘sig_test’).

Returns:

A tuple containing three PyTorch tensors: (jets_tensor, events_tensor, constituents_tensor).

Return type:

tuple

Raises:

ValueError – If any specific category (‘jets’, ‘events’, ‘constituents’) has no matching files. The error message is: “Required files not found. Please run the --mode convert_csv and prepare inputs before retrying.”

bead.src.utils.helper.model_init(in_shape, config)[source]

Initializes the model’s attributes into a model object.

Parameters:
  • in_shape – Input shape(s) the model is built for, as produced by calculate_in_shape.

  • config (dataClass) – Base class selecting user inputs. The model name (which should correspond to your model class) and the initialization method (Xavier currently supported; default None) are read from here.

Returns:

Object with the models class attributes

Return type:

class

bead.src.utils.helper.normalize_data(data, normalization_type)[source]

Normalizes jet data for VAE-based anomaly detection.

Parameters:
  • data – 2D numpy array (n_jets, n_features)

  • normalization_type – A string indicating the normalization method(s). It can be a single method or a chain of methods separated by ‘+’. Valid options include:

    ‘minmax’ – MinMaxScaler (scales features to [0,1])
    ‘standard’ – StandardScaler (zero mean, unit variance)
    ‘robust’ – RobustScaler (less sensitive to outliers)
    ‘log’ – Log1pScaler (applies log1p transformation)
    ‘l2’ – L2Normalizer (scales each feature by its L2 norm)
    ‘power’ – PowerTransformer (using Yeo-Johnson)
    ‘quantile’ – QuantileTransformer (transforms features to follow a normal or uniform distribution)
    ‘maxabs’ – MaxAbsScaler (scales each feature by its maximum absolute value)
    ‘sincos’ – SinCosTransformer (converts angles to sin/cos features)

    Example: ‘log+standard’ applies a log transformation followed by standard scaling.

Returns:

  • normalized_data – Transformed data array.

  • scaler – Fitted scaler object (or chained scaler) for inverse transformations.
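
A minimal round-trip sketch combining normalize_data with invert_normalize_data (toy data; ‘log+standard’ is one of the documented chains):

    import numpy as np

    from bead.src.utils.helper import invert_normalize_data, normalize_data

    jets = np.abs(np.random.randn(100, 4))                     # toy (n_jets, n_features) array
    normalized, scaler = normalize_data(jets, "log+standard")  # log1p, then standard scaling
    restored = invert_normalize_data(normalized, scaler)       # back to the original scale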

bead.src.utils.helper.numpy_to_tensor(data)[source]

Converts ndarray to torch.Tensors.

Parameters:

data (ndarray) – The data you wish to convert from ndarray to torch.Tensor.

Returns:

Your data as a tensor

Return type:

torch.Tensor

bead.src.utils.helper.save_loss_components(loss_data, component_names, suffix, save_dir='loss_outputs')[source]

This function unpacks loss_data into separate components, converts each into a NumPy array, and saves each array as a .npy file with a filename of the form: <component_name>_<suffix>.npy

Parameters:
  • loss_data – a list of tuples, where each tuple contains loss components

  • component_names – a list of strings naming each component in the tuple

  • suffix – a string keyword to be appended (separated by ‘_’) to each filename

  • save_dir – directory to save .npy files (default “loss_outputs”)

bead.src.utils.helper.save_model(model, model_path: str) → None[source]

Saves the model’s state dictionary as a .pt file to the given path.

Parameters:
  • model (nn.Module) – The PyTorch model to save.

  • model_path (str) – String defining the model’s save path.

Returns:

Saved model state dictionary as .pt file.

Return type:

None

bead.src.utils.helper.select_features(jets_tensor, constituents_tensor, input_features)[source]

Process the jets_tensor and constituents_tensor based on the input_features flag.

Parameters:
  • jets_tensor (torch.Tensor) – Tensor with features [evt_id, jet_id, num_constituents, b_tagged, jet_pt, jet_eta, jet_phi_sin, jet_phi_cos, generator_id]

  • constituents_tensor (torch.Tensor) – Tensor with features [evt_id, jet_id, constit_id, b_tagged, constit_pt, constit_eta, constit_phi_sin, constit_phi_cos, generator_id]

  • input_features (str) – The flag to determine which features to select. Options:

    ‘all’: return tensors as is.
    ‘4momentum’: select [pt, eta, phi_sin, phi_cos, generator_id] for both.
    ‘4momentum_btag’: select [b_tagged, pt, eta, phi_sin, phi_cos, generator_id] for both.
    ‘pj_custom’: select everything except [evt_id, jet_id] for jets and except [evt_id, jet_id, constit_id] for constituents.

Returns:

Processed jets_tensor and constituents_tensor.

Return type:

tuple

bead.src.utils.helper.train_val_split(tensor, train_ratio)[source]

Splits a tensor into training and validation sets based on the specified train_ratio. The split is done by sampling indices randomly, ensuring that the data is shuffled.

Parameters:
  • tensor (torch.Tensor) – The input tensor to be split.

  • train_ratio (float) – Proportion of data to be used for training (e.g., 0.8 for 80% training data).

Returns:

A tuple containing two tensors:
  • train_tensor: Tensor containing the training data.

  • val_tensor: Tensor containing the validation data.

Return type:

tuple

Raises:

ValueError – If train_ratio is not between 0 and 1.
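
A minimal usage sketch:

    import torch

    from bead.src.utils.helper import train_val_split

    data = torch.randn(1000, 7)   # toy tensor
    train_tensor, val_tensor = train_val_split(data, train_ratio=0.8)   # shuffled 800/200 split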

bead.src.utils.loss module

class bead.src.utils.loss.BaseLoss(config)[source]

Bases: object

Base class for all loss functions. Each subclass must implement the calculate() method.

calculate(*args, **kwargs)[source]
class bead.src.utils.loss.BinaryCrossEntropyLoss(config)[source]

Bases: BaseLoss

Binary Cross Entropy Loss for binary classification tasks.

Config parameters:
  • use_logits: Boolean indicating if the predictions are raw logits (default: True).

  • reduction: Reduction method for the loss (‘mean’, ‘sum’, etc., default: ‘mean’).

Note: Not supported for full_chain mode yet

calculate(predictions, targets, mu, logvar, parameters, log_det_jacobian=0)[source]

Calculate the binary cross entropy loss.

Parameters:
  • predictions (Tensor) – Predicted outputs (logits or probabilities).

  • targets (Tensor) – Ground truth binary labels.

Returns:

The computed binary cross entropy loss.

Return type:

Tensor

class bead.src.utils.loss.ContrastiveLoss(config)[source]

Bases: BaseLoss

Contrastive loss to cluster latent vectors by event generator.

Config parameters:
  • margin: minimum distance desired between dissimilar pairs (default: 1.0)

calculate(latent, generator_flags)[source]
class bead.src.utils.loss.KLDivergenceLoss(config)[source]

Bases: BaseLoss

KL Divergence loss for VAE latent space regularization.

Uses the formula:

KL = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]
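
The formula above translates directly into PyTorch; a toy numerical check (shapes are illustrative):

    import torch

    mu = torch.randn(16, 8)       # toy latent means
    logvar = torch.randn(16, 8)   # toy latent log-variances
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
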
class bead.src.utils.loss.L1Regularization(config)[source]

Bases: BaseLoss

Computes L1 regularization over model parameters.

Config parameters:
  • weight: scaling factor for the L1 regularization (default: 1e-4)

calculate(parameters)[source]
class bead.src.utils.loss.L2Regularization(config)[source]

Bases: BaseLoss

Computes L2 regularization over model parameters.

Config parameters:
  • weight: scaling factor for the L2 regularization (default: 1e-4)

calculate(parameters)[source]
class bead.src.utils.loss.ReconstructionLoss(config)[source]

Bases: BaseLoss

Reconstruction loss for AE/VAE models. Supports both MSE and L1 losses based on configuration.

Config parameters:
  • loss_type: ‘mse’ (default) or ‘l1’

  • reduction: reduction method (default ‘mean’ or ‘sum’)

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]
class bead.src.utils.loss.VAEFlowLoss(config)[source]

Bases: BaseLoss

Loss for VAE models augmented with a normalizing flow. Includes the log_det_jacobian term from the flow transformation.

Config parameters:
  • reconstruction: dict for ReconstructionLoss config.

  • kl: dict for KLDivergenceLoss config.

  • kl_weight: weight for the KL divergence term.

  • flow_weight: weight for the log_det_jacobian term.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]
class bead.src.utils.loss.VAEFlowLossEMD(config)[source]

Bases: VAEFlowLoss

VAE loss augmented with an Earth Mover’s Distance (EMD) term.

Config parameters:
  • emd_weight: weight for the EMD term.

  • emd: dict for WassersteinLoss config.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]

In addition to the standard VAE inputs, this loss requires:
  • emd_p: first distribution tensor (e.g. a predicted histogram)

  • emd_q: second distribution tensor (e.g. a target histogram)

class bead.src.utils.loss.VAEFlowLossL1(config)[source]

Bases: VAEFlowLoss

VAE loss augmented with an L1 regularization term.

Config parameters:
  • l1_weight: weight for the L1 regularization term.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]

‘parameters’ should be a list of model parameters to regularize.

class bead.src.utils.loss.VAEFlowLossL2(config)[source]

Bases: VAEFlowLoss

VAE loss augmented with an L2 regularization term.

Config parameters:
  • l2_weight: weight for the L2 regularization term.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]

‘parameters’ should be a list of model parameters to regularize.

class bead.src.utils.loss.VAELoss(config)[source]

Bases: BaseLoss

Total loss for VAE training. Combines reconstruction loss and KL divergence loss.

Config parameters:
  • reconstruction: dict for ReconstructionLoss config.

  • kl: dict for KLDivergenceLoss config.

  • kl_weight: scaling factor for KL loss (default: 1.0)

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]
class bead.src.utils.loss.VAELossEMD(config)[source]

Bases: VAELoss

VAE loss augmented with an Earth Mover’s Distance (EMD) term.

Config parameters:
  • emd_weight: weight for the EMD term.

  • emd: dict for WassersteinLoss config.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]

In addition to the standard VAE inputs, this loss requires:
  • emd_p: first distribution tensor (e.g. a predicted histogram)

  • emd_q: second distribution tensor (e.g. a target histogram)

class bead.src.utils.loss.VAELossL1(config)[source]

Bases: VAELoss

VAE loss augmented with an L1 regularization term.

Config parameters:
  • l1_weight: weight for the L1 regularization term.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]

‘parameters’ should be a list of model parameters to regularize.

class bead.src.utils.loss.VAELossL2(config)[source]

Bases: VAELoss

VAE loss augmented with an L2 regularization term.

Config parameters:
  • l2_weight: weight for the L2 regularization term.

calculate(recon, target, mu, logvar, parameters, log_det_jacobian=0)[source]

‘parameters’ should be a list of model parameters to regularize.

class bead.src.utils.loss.WassersteinLoss(config)[source]

Bases: BaseLoss

Computes an approximation of the Earth Mover’s Distance (Wasserstein Loss) between two 1D probability distributions.

Assumes inputs are tensors of shape (batch_size, n) representing histograms or distributions.

Config parameters:
  • dim: dimension along which to compute the cumulative sum (default: 1)

calculate(p, q)[source]
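
A sketch of the cumulative-sum approximation the description refers to (toy histograms; not necessarily the class’s exact reduction):

    import torch

    p = torch.tensor([[0.2, 0.3, 0.5]])   # toy predicted histogram
    q = torch.tensor([[0.3, 0.3, 0.4]])   # toy target histogram
    emd = torch.mean(torch.abs(torch.cumsum(p, dim=1) - torch.cumsum(q, dim=1)))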

bead.src.utils.normalization module

bead.src.utils.normalization.invert_normalize_constit_pj_custom(normalized_data, scalers)[source]

Inverts the normalization applied by normalize_constit_pj_custom.

The input normalized_data is assumed to be a NumPy array of shape (N, 8) with columns:

0: event_id (unchanged)
1: jet_id (unchanged)
2: constit_id (unchanged)
3: b_tagged (unchanged)
4: constit_pt_norm (normalized via “log+standard”)
5: constit_eta_norm (normalized via “standard”)
6-7: constit_phi_sin, constit_phi_cos (normalized via “sin_cos”)

Returns:

NumPy array of shape (N, 7) with columns:

[event_id, jet_id, constit_id, b_tagged, constit_pt, constit_eta, constit_phi]

Return type:

original_data

Note

  • The scaler for constit_pt (chain “log+standard”) is expected to invert first the StandardScaler then the Log1pScaler, so that the original constit_pt is recovered.

  • The scaler for constit_phi (chain “sin_cos”) converts the 2 columns back to the original angle using arctan2.

bead.src.utils.normalization.invert_normalize_jet_pj_custom(normalized_data, scalers)[source]

Inverts the normalization applied by normalize_jet_pj_custom.

The input normalized_data is assumed to be a NumPy array of shape (N, 8) with columns:

0: event_id (unchanged)
1: jet_id (unchanged)
2: num_constituents_norm (normalized via “robust”)
3: b_tagged (unchanged)
4: jet_pt_norm (normalized via “log+standard”)
5: jet_eta_norm (normalized via “standard”)
6-7: jet_phi_sin, jet_phi_cos (normalized via “sin_cos”)

Returns:

NumPy array of shape (N, 7) with columns:

[event_id, jet_id, num_constituents, b_tagged, jet_pt, jet_eta, jet_phi]

Return type:

original_data

Note

  • The scaler for jet_pt (chain “log+standard”) is expected to invert first the StandardScaler then the Log1pScaler, so that the original jet_pt is recovered.

  • The scaler for jet_phi (chain “sin_cos”) converts the 2 columns back to the original angle using arctan2.

bead.src.utils.normalization.normalize_constit_pj_custom(data)[source]

Normalizes constituent data for HEP analysis using a chained normalization approach.

Input data is expected as a NumPy array of shape (N, 7) with columns in the order:

0: event_id (unchanged)
1: jet_id (unchanged)
2: constit_id (unchanged)
3: b_tagged (unchanged)
4: constit_pt (to be normalized via “log+standard”)
5: constit_eta (to be normalized via “standard”)
6: constit_phi (to be normalized via “sin_cos” transformation)

The output array will have 8 columns:

[event_id, jet_id, constit_id, b_tagged, constit_pt_norm, constit_eta_norm, constit_phi_sin, constit_phi_cos]

Parameters:

data (np.ndarray) – Input array of shape (N, 7).

Returns:

  • normalized_data (np.ndarray) – Output array of shape (N, 8).

  • scalers (dict) – Dictionary containing the fitted scalers for each feature.

bead.src.utils.normalization.normalize_jet_pj_custom(data)[source]

Normalizes jet data for HEP analysis using a chained normalization approach.

Input data is expected as a NumPy array of shape (N, 7) with columns in the order:

0: event_id (unchanged)
1: jet_id (unchanged)
2: num_constituents (to be normalized via “robust”)
3: b_tagged (already integer; left unchanged)
4: jet_pt (to be normalized via “log+standard”)
5: jet_eta (to be normalized via “standard”)
6: jet_phi (to be normalized via “sin_cos” transformation)

The output array will have 8 columns:

[event_id, jet_id, num_constituents_norm, b_tagged, jet_pt_norm, jet_eta_norm, jet_phi_sin, jet_phi_cos]

Parameters:

data (np.ndarray) – Input array of shape (N, 7).

Returns:

  • normalized_data (np.ndarray) – Output array of shape (N, 8).

  • scalers (dict) – Dictionary containing the fitted scalers for each feature.

bead.src.utils.plotting module

bead.src.utils.plotting.get_index_to_cut(column_index, cut, array)[source]

Given an array column index and a threshold, this function returns the indices of the entries not passing the threshold.

Parameters:
  • column_index (int) – The index for the column where cuts should be applied

  • cut (float) – Threshold for which values below will have the whole entry removed

  • array (np.array) – The full array to be edited

Returns:

The indices of the rows to be removed

Return type:

np.array

bead.src.utils.plotting.loss_plot(path_to_loss_data, output_path, config)[source]

Plots the loss from the training and saves it.

Parameters:
  • path_to_loss_data (string) – Path to file containing loss plot data generated during training

  • output_path (path) – Directory path to which the loss plot is saved

  • config (dataclass) – The config class containing attributes set in the config file

bead.src.utils.plotting.plot(output_path, config)[source]

Runs the appropriate plotting function based on the data dimension (1D or 2D).

Parameters:
  • output_path (path) – The path to the project directory

  • config (dataclass) – The config class containing attributes set in the config file

bead.src.utils.plotting.plot_1D(output_path: str, config)[source]

General plotting for 1D data, for example data from a ‘.csv’ file. This function generates a pdf document where each page contains the before/after performance of each column of the 1D data.

Parameters:
  • output_path (path) – The path to the project directory

  • config (dataclass) – The config class containing attributes set in the config file

bead.src.utils.plotting.plot_2D(project_path, config)[source]
bead.src.utils.plotting.plot_2D_old(project_path, config)[source]

General plotting for 2D data, for example 2D arrays from computational fluid dynamics or other image-like data. This function generates a pdf document where each page contains the before/after performance of the data.

Parameters:
  • project_path (string) – The path to the project directory

  • config (dataclass) – The config class containing attributes set in the config file

bead.src.utils.plotting.plot_box_and_whisker(names, residual, pdf)[source]

Plots Box and Whisker plots of 1D data

Parameters:
  • names – Labels for the plotted quantities.

  • residual – Residual data to plot.

  • pdf – The pdf document to which the plots are saved.

Module contents