bead.src.utils package
Submodules
bead.src.utils.conversion module
- bead.src.utils.conversion.calculate_jet_properties(constituents)[source]
Calculate jet pT, eta, and phi from constituent properties.
- Parameters:
constituents (list of dicts) – Each dict contains constituent properties: {‘pt’: …, ‘eta’: …, ‘phi’: …}
- Returns:
Jet properties {‘jet_pt’: …, ‘jet_eta’: …, ‘jet_phi’: …}
- Return type:
dict
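- Example usage (a minimal sketch; the constituent values below are hypothetical and only the documented dict keys are assumed):
from bead.src.utils.conversion import calculate_jet_properties

# Two illustrative constituents with the documented keys
constituents = [
    {"pt": 120.0, "eta": 0.4, "phi": 1.2},
    {"pt": 80.0, "eta": 0.6, "phi": 1.1},
]
jet = calculate_jet_properties(constituents)
print(jet["jet_pt"], jet["jet_eta"], jet["jet_phi"])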
- bead.src.utils.conversion.convert_csv_to_hdf5_npy_parallel(csv_file, output_prefix, out_path, file_type='h5', n_workers=4, verbose: bool = False)[source]
Convert a CSV file to HDF5 or .npy files in parallel, adding event ID (evt_id) and jet-level properties calculated from constituents.
- Parameters:
csv_file (str) – Path to the input CSV file.
output_prefix (str) – Prefix for the output files.
out_path (str) – Path to save output files.
file_type (str) – Output file type (‘h5’ or ‘npy’).
n_workers (int) – Number of parallel workers.
verbose (bool) – Print progress if True.
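- Example usage (a minimal sketch; the paths are placeholders and only the documented signature is assumed):
from bead.src.utils.conversion import convert_csv_to_hdf5_npy_parallel

# Convert a CSV to HDF5 with 4 workers; pass file_type="npy" for .npy output
convert_csv_to_hdf5_npy_parallel(
    csv_file="data/events.csv",   # placeholder input path
    output_prefix="run1",         # placeholder output prefix
    out_path="output/",           # placeholder output directory
    file_type="h5",
    n_workers=4,
    verbose=True,
)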
- bead.src.utils.conversion.process_event(evt_id, row)[source]
Process a single event, calculating jet and constituent data.
- Parameters:
evt_id (int) – Event ID.
row (pandas.Series) – Row from the DataFrame corresponding to the event.
- Returns:
(event_data, jets, constituents) for the event.
- Return type:
tuple
bead.src.utils.data_processing module
- bead.src.utils.data_processing.decode_pid(encoded_tensor_path, pid_map_path, decoded_tensor_path)[source]
Reads the encoded tensor and pid map, decodes the pids back to their original values, and saves the decoded tensor.
- bead.src.utils.data_processing.encode_pid_parallel(constits_tensor_path, encoded_tensor_path, pid_map_path)[source]
Parallelized: Reads the constituent-level tensor and encodes the pid column using categorical encoding. Saves the new tensor and the pid map for later retrieval.
- bead.src.utils.data_processing.load_data(file_path, file_type='h5', verbose: bool = False)[source]
Load data from either an HDF5 file or .npy files.
- bead.src.utils.data_processing.parallel_select_top_jets_and_constituents(jets, constituents, n_jets=3, n_constits=15, n_workers=4, verbose: bool = False)[source]
Parallelized selection of top jets and constituents.
- bead.src.utils.data_processing.preproc_inputs(paths, config, keyword, verbose: bool = False)[source]
bead.src.utils.diagnostics module
- bead.src.utils.diagnostics.c_profile(func, *args, **kwargs)[source]
Profile the function func with cProfile.
- Parameters:
func (callable) – The function to be profiled.
- Returns:
The result of the function func execution.
- Return type:
result
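- Example usage (a minimal sketch; slow_sum is a hypothetical function to profile):
from bead.src.utils.diagnostics import c_profile

def slow_sum(n):
    return sum(i * i for i in range(n))

# Runs slow_sum under cProfile and returns its result
result = c_profile(slow_sum, 1_000_000)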
- bead.src.utils.diagnostics.dict_to_square_matrix(input_dict: dict) → array[source]
Changes an input dictionary into a square np.array, adding NaNs when the dimension of a dict key is less than that of the final square matrix.
- Parameters:
input_dict (dict)
- Returns:
square_matrix (np.array)
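- Example usage (a minimal sketch; the dictionary is hypothetical, and the exact padded shape depends on the implementation):
import numpy as np
from bead.src.utils.diagnostics import dict_to_square_matrix

# Keys with unequal lengths; shorter entries are padded with NaN
activations = {"layer1": [0.1, 0.2, 0.3], "layer2": [0.4]}
square_matrix = dict_to_square_matrix(activations)
print(square_matrix)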
- bead.src.utils.diagnostics.pytorch_profile(f, *args, **kwargs)[source]
Performs PyTorch profiling of the CPU/GPU time and memory consumed by the execution of the function f.
- Parameters:
f (callable) – The function to be profiled.
- Returns:
The result of the function f execution.
- Return type:
result
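- Example usage (a minimal sketch; forward_pass is a hypothetical workload):
import torch
from bead.src.utils.diagnostics import pytorch_profile

def forward_pass(x):
    return torch.relu(x @ x.T).sum()

x = torch.randn(256, 64)
# Profiles CPU/GPU time and memory of the call and returns its result
result = pytorch_profile(forward_pass, x)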
bead.src.utils.ggl module
- class bead.src.utils.ggl.Config(file_type: str, parallel_workers: int, num_jets: int, num_constits: int, latent_space_size: int, normalizations: str, invert_normalizations: bool, train_size: float, epochs: int, early_stopping: bool, early_stoppin_patience: int, lr_scheduler: bool, lr_scheduler_patience: int, min_delta: int, model_name: str, input_level: str, input_features: str, model_init: str, loss_function: str, reg_param: float, lr: float, batch_size: int, test_size: float, intermittent_model_saving: bool, separate_model_saving: bool, intermittent_saving_patience: int, activation_extraction: bool, deterministic_algorithm: bool)[source]
Bases:
object
Defines a configuration dataclass.
- activation_extraction: bool
- batch_size: int
- deterministic_algorithm: bool
- early_stoppin_patience: int
- early_stopping: bool
- epochs: int
- file_type: str
- input_features: str
- input_level: str
- intermittent_model_saving: bool
- intermittent_saving_patience: int
- invert_normalizations: bool
- latent_space_size: int
- loss_function: str
- lr: float
- lr_scheduler: bool
- lr_scheduler_patience: int
- min_delta: int
- model_init: str
- model_name: str
- normalizations: str
- num_constits: int
- num_jets: int
- parallel_workers: int
- reg_param: float
- separate_model_saving: bool
- test_size: float
- train_size: float
- bead.src.utils.ggl.convert_csv(paths, config, verbose: bool = False)[source]
Convert the input ‘.csv’ into the file_type selected in the config file (‘.h5’ by default).
Separate event-level, jet-level and constituent-level data into separate datasets/files.
- Parameters:
data_path (path) – Path to the input csv files
output_path (path) – Selects base path for determining output path
config (dataClass) – Base class selecting user inputs
verbose (bool) – If True, prints out more information
- Outputs:
A ProjectName_OutputPrefix.h5 file which includes:
- Event-level dataset
- Jet-level dataset
- Constituent-level dataset
or
ProjectName_OutputPrefix_{data-level}.npy files which contain the same information as above, split into 3 separate files.
- bead.src.utils.ggl.create_default_config(workspace_name: str, project_name: str) → str[source]
Creates a default config file for a project.
- Parameters:
workspace_name (str) – Name of the workspace.
project_name (str) – Name of the project.
- Returns:
Default config file.
- Return type:
str
- bead.src.utils.ggl.create_new_project(workspace_name: str, project_name: str, verbose: bool = False, base_path: str = 'workspaces') → None[source]
Creates a new project directory, output subdirectories, and config files within a workspace.
- Parameters:
workspace_name (str) – Creates a workspace (dir) for storing data and projects with this name.
project_name (str) – Creates a project (dir) for storing configs and outputs with this name.
verbose (bool, optional) – Whether to print out the progress. Defaults to False.
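- Example usage (a minimal sketch following the documented defaults):
from bead.src.utils.ggl import create_new_project

# Creates workspaces/my_workspace/my_project with its config and output subdirectories
create_new_project("my_workspace", "my_project", verbose=True)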
- bead.src.utils.ggl.get_arguments()[source]
Determines command-line arguments specified by the BEAD user. Use --help to see what options are available.
- Returns:
The .py file containing the config options, a string determining what mode to run, and the projects directory where outputs go.
- bead.src.utils.ggl.loss_plotter(path_to_loss_data, output_path, config)[source]
Calls plotting.loss_plot()
- Parameters:
path_to_loss_data (string) – Path to the values for the loss plot
output_path (string) – Path to output the data
config (dataClass) – Base class selecting user inputs
- Returns:
Plot containing the loss curves
- Return type:
.pdf file
- bead.src.utils.ggl.plotter(output_path, config)[source]
Calls plotting.plot()
- Parameters:
output_path (string) – Path to the output directory
config (dataClass) – Base class selecting user inputs
- bead.src.utils.ggl.prepare_inputs(paths, config, verbose: bool = False)[source]
Read the input data and generate torch tensors ready to train on.
Select number of leading jets per event and number of leading constituents per jet to be used for training.
- Parameters:
paths – Dictionary of common paths used in the pipeline
config (dataClass) – Base class selecting user inputs
verbose (bool) – If True, prints out more information
- Outputs:
Tensor files which include:
- Event-level dataset – [evt_id, evt_weight, met, met_phi, num_jets]
- Jet-level dataset – [evt_id, jet_id, num_constituents, jet_btag, jet_pt, jet_eta, jet_phi]
- Constituent-level dataset – [evt_id, jet_id, constituent_id, jet_btag, constituent_pt, constituent_eta, constituent_phi]
- bead.src.utils.ggl.run_diagnostics(project_path, verbose: bool)[source]
Calls diagnostics.diagnose()
- Parameters:
input_path (str) – path to the np.array containing the activation values
output_path (str) – path to store the diagnostics pdf
- bead.src.utils.ggl.run_full_chain(workspace_name: str, project_name: str, paths: dict, config: dict, options: str, verbose: bool = False) None[source]
Execute a sequence of operations based on the provided options string.
- Parameters:
workspace_name – Name of the workspace for new projects
project_name – Name of the project for new projects
paths – Dictionary of file paths and directories
config – Configuration dictionary for operations
options – Underscore-separated string specifying the workflow sequence
verbose – Whether to show verbose output
Example
run_full_chain(“my_workspace”, “my_project”, paths, config, “newproject_convertcsv_prepareinputs_train_detect”, verbose=True)
- bead.src.utils.ggl.run_inference(paths, config, verbose: bool = False)[source]
Main function calling the inference functions, run when --mode=detect is selected. The three functions called are: process, ggl.mode_init and training.train.
- Parameters:
paths (dictionary) – Dictionary of common paths used in the pipeline
config (dataClass) – Base class selecting user inputs
verbose (bool) – If True, prints out more information
- bead.src.utils.ggl.run_plots(output_path, config, verbose: bool)[source]
Main function calling the two plotting functions, run when --mode=plot is selected. The two functions called are: ggl.plotter and ggl.loss_plotter.
- Parameters:
output_path (string) – Selects base path for determining output path
config (dataClass) – Base class selecting user inputs
verbose (bool) – If True, prints out more information
- bead.src.utils.ggl.run_training(paths, config, verbose: bool = False)[source]
Main function calling the training functions, run when --mode=train is selected. The two functions called are: data_processing.preproc_inputs and training.train.
- Parameters:
paths (dictionary) – Dictionary of common paths used in the pipeline
config (dataClass) – Base class selecting user inputs
verbose (bool) – If True, prints out more information
bead.src.utils.helper module
- class bead.src.utils.helper.ChainedScaler(scalers)[source]
Bases:
BaseEstimator, TransformerMixin
Chains a list of scaler transformations. The transformation is applied sequentially (in the order provided) and the inverse transformation is applied in reverse order.
- class bead.src.utils.helper.EarlyStopping(patience: int, min_delta: float)[source]
Bases:
object
Class to perform early stopping during model training.
- patience
The number of epochs to wait before stopping the training process if the validation loss doesn’t improve.
- Type:
int
- min_delta
The minimum difference between the new loss and the previous best loss for the new loss to be considered an improvement.
- Type:
float
- counter
Counts the number of times the validation loss hasn’t improved.
- Type:
int
- best_loss
The best validation loss observed so far.
- Type:
float
- early_stop
Flag that indicates whether early stopping criteria have been met.
- Type:
bool
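- Example usage (a minimal sketch; the stopper is assumed to be callable with the latest validation loss, so check the source for the exact update API):
from bead.src.utils.helper import EarlyStopping

early_stopper = EarlyStopping(patience=5, min_delta=1e-4)
for epoch in range(100):
    val_loss = 1.0 / (epoch + 1)   # stand-in for a real validation loss
    early_stopper(val_loss)        # assumed update call
    if early_stopper.early_stop:   # documented flag
        break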
- class bead.src.utils.helper.L2Normalizer[source]
Bases:
BaseEstimator, TransformerMixin
L2 normalization per feature of data.
- class bead.src.utils.helper.LRScheduler(optimizer, patience, min_lr=1e-06, factor=0.5)[source]
Bases:
object
A learning rate scheduler that adjusts the learning rate of an optimizer based on the training loss.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer whose learning rate will be adjusted.
patience (int) – The number of epochs with no improvement in training loss after which the learning rate will be reduced.
min_lr (float, optional) – The minimum learning rate that can be reached (default: 1e-6).
factor (float, optional) – The factor by which the learning rate will be reduced (default: 0.5).
- lr_scheduler
The PyTorch learning rate scheduler that actually performs the adjustments.
- Type:
torch.optim.lr_scheduler.ReduceLROnPlateau
- Example usage:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
lr_scheduler = LRScheduler(optimizer, patience=3, min_lr=1e-6, factor=0.5)
for epoch in range(num_epochs):
    train_loss = train(model, train_data_loader)
    lr_scheduler(train_loss)
    # …
- class bead.src.utils.helper.Log1pScaler[source]
Bases:
BaseEstimator, TransformerMixin
Log(1+x) transformer for positive-skewed HEP features.
- class bead.src.utils.helper.SinCosTransformer[source]
Bases:
BaseEstimator, TransformerMixin
Transforms an angle (in radians) into two features: [sin(angle), cos(angle)]. Inverse transformation uses arctan2.
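- Example round trip (a minimal sketch, assuming the standard scikit-learn fit_transform/inverse_transform interface):
import numpy as np
from bead.src.utils.helper import SinCosTransformer

phi = np.array([[0.3], [2.9], [-1.7]])               # angles in radians
transformer = SinCosTransformer()
features = transformer.fit_transform(phi)             # -> [sin(angle), cos(angle)] columns
recovered = transformer.inverse_transform(features)   # arctan2(sin, cos)
print(np.allclose(recovered, phi))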
- bead.src.utils.helper.add_sig_bkg_label(tensors: tuple, label: str) → tuple[source]
Adds a new feature to the last dimension of each tensor in the tuple. The new feature is filled with 0 for “bkg” and 1 for “sig”.
- Parameters:
tensors – A tuple of three tensors (events, jets, constituents).
label – A string, either “bkg” or “sig”, to determine the value of the new feature.
- Returns:
A tuple of the three tensors with the new feature added to the last dimension.
- bead.src.utils.helper.calculate_in_shape(data, config)[source]
Calculates the input shapes for the models based on the data.
- Parameters:
data (ndarray) – The data you wish to calculate the input shapes for.
config (dataClass) – Base class selecting user inputs.
- Returns:
A tuple containing the input shapes for the models.
- Return type:
tuple
- bead.src.utils.helper.call_forward(model, inputs)[source]
Calls the forward method of the given object. If the return value is not a tuple, packs it into a tuple.
- Parameters:
model – An object that has a forward method.
inputs – The input data to pass to the model.
- Returns:
A tuple containing the result(s) of the forward method.
- bead.src.utils.helper.convert_to_tensor(data)[source]
Converts ndarray to torch.Tensors.
- Parameters:
data (ndarray) – The data you wish to convert from ndarray to torch.Tensor.
- Returns:
Your data as a tensor
- Return type:
torch.Tensor
- bead.src.utils.helper.create_datasets(events_train, jets_train, constituents_train, events_val, jets_val, constituents_val, events_train_label, jets_train_label, constituents_train_label, events_val_label, jets_val_label, constituents_val_label)[source]
- bead.src.utils.helper.data_label_split(data)[source]
Splits the data into features and labels.
- Parameters:
data (ndarray) – The data you wish to split into features and labels.
- Returns:
- A tuple containing two ndarrays:
data: The features of the data.
labels: The labels of the data.
- Return type:
tuple
- bead.src.utils.helper.decoder_saver(model, model_path: str) → None[source]
Saves the Decoder state dictionary as a .pt file to the given path
- Parameters:
model (nn.Module) – The PyTorch model to save.
model_path (str) – String defining the models save path.
- Returns:
Saved decoder state dictionary as .pt file.
- Return type:
None
- bead.src.utils.helper.detach_device(tensor)[source]
Detaches a given tensor and converts it to an ndarray.
- Parameters:
tensor (torch.Tensor) – The PyTorch tensor one wants to convert to a ndarray
- Returns:
Converted torch.Tensor to ndarray
- Return type:
ndarray
- bead.src.utils.helper.encoder_saver(model, model_path: str) → None[source]
Saves the Encoder state dictionary as a .pt file to the given path
- Parameters:
model (nn.Module) – The PyTorch model to save.
model_path (str) – String defining the models save path.
- Returns:
Saved encoder state dictionary as .pt file.
- Return type:
None
- bead.src.utils.helper.get_device()[source]
Returns the appropriate processing device. If CUDA is available it returns “cuda:0”; otherwise it returns “cpu”.
- Returns:
Device string, either “cpu” or “cuda:0”
- Return type:
str
- bead.src.utils.helper.get_loss(loss_function: str)[source]
Returns the loss_object based on the string provided.
- Parameters:
loss_function (str) – The loss function you wish to use. Options include:
- ‘mse’: Mean Squared Error
- ‘bce’: Binary Cross Entropy
- ‘mae’: Mean Absolute Error
- ‘huber’: Huber Loss
- ‘l1’: L1 Loss
- ‘l2’: L2 Loss
- ‘smoothl1’: Smooth L1 Loss
- Returns:
The loss function object
- Return type:
class
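- Example usage (a minimal sketch; whether get_loss returns a class or an instance is not specified above, so the sketch handles both):
import torch
from bead.src.utils.helper import get_loss

loss = get_loss("mse")
criterion = loss() if isinstance(loss, type) else loss  # instantiate if a class was returned
print(criterion(torch.randn(4, 3), torch.randn(4, 3)))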
- bead.src.utils.helper.get_optimizer(optimizer_name, parameters, lr)[source]
Returns a PyTorch optimizer configured with optimal arguments for training a large VAE.
- Parameters:
optimizer_name (str) – One of “adam”, “adamw”, “rmsprop”, “sgd”, “radam”, “adagrad”.
parameters (iterable) – The parameters (or parameter groups) of your model.
lr (float) – The learning rate for the optimizer.
- Returns:
An instantiated optimizer with specified hyperparameters.
- Return type:
torch.optim.Optimizer
- Raises:
ValueError – If an unsupported optimizer name is provided.
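- Example usage (a minimal sketch with a toy model):
import torch.nn as nn
from bead.src.utils.helper import get_optimizer

model = nn.Linear(16, 4)
# Unsupported optimizer names raise ValueError
optimizer = get_optimizer("adamw", model.parameters(), lr=1e-3)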
- bead.src.utils.helper.invert_normalize_data(normalized_data, scaler)[source]
Inverts a chained normalization transformation.
This function accepts normalized data (for example, the output of a VAE’s preprocessed input) and the scaler (or ChainedScaler) that was used to perform the forward transformation. It then returns the original data by calling the scaler’s inverse_transform method.
- Parameters:
normalized_data (np.ndarray) – The transformed data array.
scaler – The scaler object (or a ChainedScaler instance) used for the forward transformation, which must implement an inverse_transform method.
- Returns:
The data mapped back to its original scale.
- Return type:
np.ndarray
- bead.src.utils.helper.load_augment_tensors(folder_path, keyword)[source]
Searches through the specified folder for all ‘.pt’ files whose names contain the specified keyword (e.g., ‘bkg_train’, ‘bkg_test’, or ‘sig_test’). Files are then categorized by whether their filename contains one of the three substrings: ‘jets’, ‘events’, or ‘constituents’.
For ‘bkg_train’, each file must contain one of the generator names: ‘herwig’, ‘pythia’, or ‘sherpa’. For each file, the tensor is loaded and a new feature is appended along the last dimension:
- 0 for files containing ‘herwig’
- 1 for files containing ‘pythia’
- 2 for files containing ‘sherpa’
For ‘bkg_test’ and ‘sig_test’, the appended new feature is filled with -1, since generator info is not available at test time.
Finally, for each category the resulting tensors are concatenated along axis=0.
- Parameters:
folder_path (str) – The path to the folder to search.
keyword (str) – The keyword to filter files (e.g., ‘bkg_train’, ‘bkg_test’, or ‘sig_test’).
- Returns:
- A tuple of three PyTorch tensors: (jets_tensor, events_tensor, constituents_tensor)
corresponding to the concatenated tensors for each category.
- Return type:
tuple
- Raises:
ValueError – If any category does not have at least one file for each generator type. The error message is: “required files not found. please run the --mode convert_csv and prepare inputs before retrying”
- bead.src.utils.helper.load_model(model_path: str, in_shape, config)[source]
Loads the state dictionary of the trained model into a model variable. This variable is then used for passing data through the encoding and decoding functions.
- Parameters:
model_path (str) – Path to the saved model.
in_shape – Input dimension size.
config (dataClass) – Base class selecting user inputs.
- Returns:
A model object with the attributes of the model class, with the selected state dictionary loaded into it.
- Return type:
nn.Module
- bead.src.utils.helper.load_tensors(folder_path, keyword='sig_test')[source]
Searches through the specified folder for all ‘.pt’ files containing the given keyword in their names. Categorizes these files based on the presence of ‘jets’, ‘events’, or ‘constituents’ in their filenames, loads them into PyTorch tensors, concatenates them along axis=0, and returns the resulting tensors.
- Parameters:
folder_path (str) – The path to the folder to search.
keyword (str) – The keyword to filter files (‘bkg_train’, ‘bkg_test’, or ‘sig_test’).
- Returns:
A tuple containing three PyTorch tensors: (jets_tensor, events_tensor, constituents_tensor).
- Return type:
tuple
- Raises:
ValueError – If any specific category (‘jets’, ‘events’, ‘constituents’) has no matching files. The error message is: “Required files not found. Please run the --mode convert_csv and prepare inputs before retrying.”
- bead.src.utils.helper.model_init(in_shape, config)[source]
Initializes the model’s attributes into a model_object variable.
- Parameters:
model_name (str) – The name of the model you wish to initialize. This should correspond to your model’s name.
init (str) – The initialization method you wish to use (Xavier supported currently). Default is None.
config (dataClass) – Base class selecting user inputs.
- Returns:
Object with the models class attributes
- Return type:
class
- bead.src.utils.helper.normalize_data(data, normalization_type)[source]
Normalizes jet data for VAE-based anomaly detection.
- Parameters:
data – 2D numpy array (n_jets, n_features)
normalization_type – A string indicating the normalization method(s). It can be a single method or a chain of methods separated by ‘+’. Valid options include:
- ‘minmax’: MinMaxScaler (scales features to [0,1])
- ‘standard’: StandardScaler (zero mean, unit variance)
- ‘robust’: RobustScaler (less sensitive to outliers)
- ‘log’: Log1pScaler (applies log1p transformation)
- ‘l2’: L2Normalizer (scales each feature by its L2 norm)
- ‘power’: PowerTransformer (using Yeo-Johnson)
- ‘quantile’: QuantileTransformer (transforms features to follow a normal or uniform distribution)
- ‘maxabs’: MaxAbsScaler (scales each feature by its maximum absolute value)
- ‘sincos’: SinCosTransformer (converts angles to sin/cos features)
Example: ‘log+standard’ applies a log transformation followed by standard scaling.
- Returns:
normalized_data: Transformed data array.
scaler: Fitted scaler object (or chained scaler) for inverse transformations.
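- Example round trip with invert_normalize_data (a minimal sketch; positive data is used so the ‘log’ step is well defined):
import numpy as np
from bead.src.utils.helper import normalize_data, invert_normalize_data

data = np.abs(np.random.randn(100, 4))   # (n_jets, n_features)
# Chain log1p with standard scaling, then invert the whole chain
normalized, scaler = normalize_data(data, "log+standard")
recovered = invert_normalize_data(normalized, scaler)
print(np.allclose(recovered, data))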
- bead.src.utils.helper.numpy_to_tensor(data)[source]
Converts ndarray to torch.Tensors.
- Parameters:
data (ndarray) – The data you wish to convert from ndarray to torch.Tensor.
- Returns:
Your data as a tensor
- Return type:
torch.Tensor
- bead.src.utils.helper.save_loss_components(loss_data, component_names, suffix, save_dir='loss_outputs')[source]
This function unpacks loss_data into separate components, converts each into a NumPy array, and saves each array as a .npy file with a filename of the form: <component_name>_<suffix>.npy
- Parameters:
loss_data – a list of tuples, where each tuple contains loss components
component_names – a list of strings naming each component in the tuple
suffix – a string keyword to be appended (separated by ‘_’) to each filename
save_dir – directory to save .npy files (default “loss_outputs”)
- bead.src.utils.helper.save_model(model, model_path: str) → None[source]
Saves the models state dictionary as a .pt file to the given path.
- Parameters:
model (nn.Module) – The PyTorch model to save.
model_path (str) – String defining the models save path.
- Returns:
Saved model state dictionary as .pt file.
- Return type:
None
- bead.src.utils.helper.select_features(jets_tensor, constituents_tensor, input_features)[source]
Process the jets_tensor and constituents_tensor based on the input_features flag.
- Parameters:
jets_tensor (torch.Tensor) – Tensor with features [evt_id, jet_id, num_constituents, b_tagged, jet_pt, jet_eta, jet_phi_sin, jet_phi_cos, generator_id]
constituents_tensor (torch.Tensor) – Tensor with features [evt_id, jet_id, constit_id, b_tagged, constit_pt, constit_eta, constit_phi_sin, constit_phi_cos, generator_id]
input_features (str) – The flag to determine which features to select. Options:
- ‘all’: return tensors as is.
- ‘4momentum’: select [pt, eta, phi_sin, phi_cos, generator_id] for both.
- ‘4momentum_btag’: select [b_tagged, pt, eta, phi_sin, phi_cos, generator_id] for both.
- ‘pj_custom’: select everything except [evt_id, jet_id] for jets and except [evt_id, jet_id, constit_id] for constituents.
- Returns:
Processed jets_tensor and constituents_tensor.
- Return type:
tuple
- bead.src.utils.helper.train_val_split(tensor, train_ratio)[source]
Splits a tensor into training and validation sets based on the specified train_ratio. The split is done by sampling indices randomly, ensuring that the data is shuffled.
- Parameters:
tensor (torch.Tensor) – The input tensor to be split.
train_ratio (float) – Proportion of data to be used for training (e.g., 0.8 for 80% training data).
- Returns:
- A tuple containing two tensors:
train_tensor: Tensor containing the training data.
val_tensor: Tensor containing the validation data.
- Return type:
tuple
- Raises:
ValueError – If train_ratio is not between 0 and 1.
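- Example usage (a minimal sketch):
import torch
from bead.src.utils.helper import train_val_split

tensor = torch.randn(1000, 8)
train_tensor, val_tensor = train_val_split(tensor, train_ratio=0.8)
print(train_tensor.shape, val_tensor.shape)   # roughly an 800/200 split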
bead.src.utils.loss module
- class bead.src.utils.loss.BaseLoss(config)[source]
Bases:
object
Base class for all loss functions. Each subclass must implement the calculate() method.
- class bead.src.utils.loss.BinaryCrossEntropyLoss(config)[source]
Bases:
BaseLoss
Binary Cross Entropy Loss for binary classification tasks.
- Config parameters:
use_logits: Boolean indicating if the predictions are raw logits (default: True).
reduction: Reduction method for the loss (‘mean’, ‘sum’, etc., default: ‘mean’).
Note: Not supported for full_chain mode yet
- calculate(predictions, targets, mu, logvar, parameters, log_det_jacobian=0)[source]
Calculate the binary cross entropy loss.
- Parameters:
predictions (Tensor) – Predicted outputs (logits or probabilities).
targets (Tensor) – Ground truth binary labels.
- Returns:
The computed binary cross entropy loss.
- Return type:
Tensor
- class bead.src.utils.loss.ContrastiveLoss(config)[source]
Bases:
BaseLoss
Contrastive loss to cluster latent vectors by event generator.
- Config parameters:
margin: minimum distance desired between dissimilar pairs (default: 1.0)
- class bead.src.utils.loss.KLDivergenceLoss(config)[source]
Bases:
BaseLoss
KL Divergence loss for VAE latent space regularization.
- Uses the formula:
KL = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))
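The formula translates directly into PyTorch; a minimal standalone sketch (not the class itself):
import torch

def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # KL = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

mu, logvar = torch.zeros(4, 8), torch.zeros(4, 8)
print(kl_divergence(mu, logvar))   # 0 for a standard-normal posterior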
- class bead.src.utils.loss.L1Regularization(config)[source]
Bases:
BaseLoss
Computes L1 regularization over model parameters.
- Config parameters:
weight: scaling factor for the L1 regularization (default: 1e-4)
- class bead.src.utils.loss.L2Regularization(config)[source]
Bases:
BaseLoss
Computes L2 regularization over model parameters.
- Config parameters:
weight: scaling factor for the L2 regularization (default: 1e-4)
- class bead.src.utils.loss.ReconstructionLoss(config)[source]
Bases:
BaseLoss
Reconstruction loss for AE/VAE models. Supports both MSE and L1 losses based on configuration.
- Config parameters:
loss_type: ‘mse’ (default) or ‘l1’
reduction: reduction method, ‘mean’ (default) or ‘sum’
- class bead.src.utils.loss.VAEFlowLoss(config)[source]
Bases:
BaseLoss
Loss for VAE models augmented with a normalizing flow. Includes the log_det_jacobian term from the flow transformation.
- Config parameters:
reconstruction: dict for ReconstructionLoss config.
kl: dict for KLDivergenceLoss config.
kl_weight: weight for the KL divergence term.
flow_weight: weight for the log_det_jacobian term.
- class bead.src.utils.loss.VAEFlowLossEMD(config)[source]
Bases:
VAEFlowLoss
VAE loss augmented with an Earth Mover’s Distance (EMD) term.
- Config parameters:
emd_weight: weight for the EMD term.
emd: dict for WassersteinLoss config.
- class bead.src.utils.loss.VAEFlowLossL1(config)[source]
Bases:
VAEFlowLoss
VAE loss augmented with an L1 regularization term.
- Config parameters:
l1_weight: weight for the L1 regularization term.
- class bead.src.utils.loss.VAEFlowLossL2(config)[source]
Bases:
VAEFlowLoss
VAE loss augmented with an L2 regularization term.
- Config parameters:
l2_weight: weight for the L2 regularization term.
- class bead.src.utils.loss.VAELoss(config)[source]
Bases:
BaseLoss
Total loss for VAE training. Combines reconstruction loss and KL divergence loss.
- Config parameters:
reconstruction: dict for ReconstructionLoss config.
kl: dict for KLDivergenceLoss config.
kl_weight: scaling factor for KL loss (default: 1.0)
- class bead.src.utils.loss.VAELossEMD(config)[source]
Bases:
VAELoss
VAE loss augmented with an Earth Mover’s Distance (EMD) term.
- Config parameters:
emd_weight: weight for the EMD term.
emd: dict for WassersteinLoss config.
- class bead.src.utils.loss.VAELossL1(config)[source]
Bases:
VAELoss
VAE loss augmented with an L1 regularization term.
- Config parameters:
l1_weight: weight for the L1 regularization term.
- class bead.src.utils.loss.VAELossL2(config)[source]
Bases:
VAELoss
VAE loss augmented with an L2 regularization term.
- Config parameters:
l2_weight: weight for the L2 regularization term.
- class bead.src.utils.loss.WassersteinLoss(config)[source]
Bases:
BaseLoss
Computes an approximation of the Earth Mover’s Distance (Wasserstein Loss) between two 1D probability distributions.
Assumes inputs are tensors of shape (batch_size, n) representing histograms or distributions.
- Config parameters:
dim: dimension along which to compute the cumulative sum (default: 1)
bead.src.utils.normalization module
- bead.src.utils.normalization.invert_normalize_constit_pj_custom(normalized_data, scalers)[source]
Inverts the normalization applied by normalize_constit_pj_custom.
- The input normalized_data is assumed to be a NumPy array of shape (N, 8) with columns:
0: event_id (unchanged)
1: jet_id (unchanged)
2: constit_id (unchanged)
3: b_tagged (unchanged)
4: constit_pt_norm (normalized via “log+standard”)
5: constit_eta_norm (normalized via “standard”)
6-7: constit_phi_sin, constit_phi_cos (normalized via “sin_cos”)
- Returns:
- NumPy array of shape (N, 7) with columns:
[event_id, jet_id, constit_id, b_tagged, constit_pt, constit_eta, constit_phi]
- Return type:
original_data
Note
The scaler for constit_pt (chain “log+standard”) is expected to invert first the StandardScaler then the Log1pScaler, so that the original constit_pt is recovered.
The scaler for constit_phi (chain “sin_cos”) converts the 2 columns back to the original angle using arctan2.
- bead.src.utils.normalization.invert_normalize_jet_pj_custom(normalized_data, scalers)[source]
Inverts the normalization applied by normalize_jet_pj_custom.
- The input normalized_data is assumed to be a NumPy array of shape (N, 8) with columns:
0: event_id (unchanged)
1: jet_id (unchanged)
2: num_constituents_norm (normalized via “robust”)
3: b_tagged (unchanged)
4: jet_pt_norm (normalized via “log+standard”)
5: jet_eta_norm (normalized via “standard”)
6-7: jet_phi_sin, jet_phi_cos (normalized via “sin_cos”)
- Returns:
- NumPy array of shape (N, 7) with columns:
[event_id, jet_id, num_constituents, b_tagged, jet_pt, jet_eta, jet_phi]
- Return type:
original_data
Note
The scaler for jet_pt (chain “log+standard”) is expected to invert first the StandardScaler then the Log1pScaler, so that the original jet_pt is recovered.
The scaler for jet_phi (chain “sin_cos”) converts the 2 columns back to the original angle using arctan2.
- bead.src.utils.normalization.normalize_constit_pj_custom(data)[source]
Normalizes constituent-level data for HEP analysis using a chained normalization approach.
- Input data is expected as a NumPy array of shape (N, 7) with columns in the order:
0: event_id (unchanged)
1: jet_id (unchanged)
2: constit_id (unchanged)
3: b_tagged (unchanged)
4: constit_pt (to be normalized via “log+standard”)
5: constit_eta (to be normalized via “standard”)
6: constit_phi (to be normalized via “sin_cos” transformation)
- The output array will have 8 columns:
[event_id, jet_id, constit_id, b_tagged, constit_pt_norm, constit_eta_norm, constit_phi_sin, constit_phi_cos]
- Parameters:
data (np.ndarray) – Input array of shape (N, 7).
- Returns:
normalized_data (np.ndarray): Output array of shape (N, 8).
scalers (dict): Dictionary containing the fitted scalers for each feature.
- bead.src.utils.normalization.normalize_jet_pj_custom(data)[source]
Normalizes jet data for HEP analysis using a chained normalization approach.
Input data is expected as a NumPy array of shape (N, 7) with columns in the order:
0: event_id (unchanged)
1: jet_id (unchanged)
2: num_constituents (to be normalized via “robust”)
3: b_tagged (already integer; left unchanged)
4: jet_pt (to be normalized via “log+standard”)
5: jet_eta (to be normalized via “standard”)
6: jet_phi (to be normalized via “sin_cos” transformation)
The output array will have 8 columns: [event_id, jet_id, num_constituents_norm, b_tagged, jet_pt_norm, jet_eta_norm, jet_phi_sin, jet_phi_cos]
- Parameters:
data (np.ndarray) – Input array of shape (N, 7).
- Returns:
normalized_data (np.ndarray): Output array of shape (N, 8).
scalers (dict): Dictionary containing the fitted scalers for each feature.
bead.src.utils.plotting module
- bead.src.utils.plotting.get_index_to_cut(column_index, cut, array)[source]
Given an array column index and a threshold, this function returns the index of the entries not passing the threshold.
- Parameters:
column_index (int) – The index for the column where cuts should be applied
cut (float) – Threshold for which values below will have the whole entry removed
array (np.array) – The full array to be edited
- Returns:
returns the index of the rows to be removed
- Return type:
np.array
- bead.src.utils.plotting.loss_plot(path_to_loss_data, output_path, config)[source]
Plots the loss from training and saves it.
- Parameters:
path_to_loss_data (string) – Path to file containing loss plot data generated during training
output_path (path) – Directory path to which the loss plot is saved
config (dataclass) – The config class containing attributes set in the config file
- bead.src.utils.plotting.plot(output_path, config)[source]
Runs the appropriate plotting function based on the data dimension (1D or 2D).
- Parameters:
output_path (path) – The path to the project directory
config (dataclass) – The config class containing attributes set in the config file
- bead.src.utils.plotting.plot_1D(output_path: str, config)[source]
General plotting for 1D data, for example data from a ‘.csv’ file. This function generates a pdf document where each page contains the before/after performance of each column of the 1D data.
- Parameters:
output_path (path) – The path to the project directory
config (dataclass) – The config class containing attributes set in the config file
- bead.src.utils.plotting.plot_2D_old(project_path, config)[source]
General plotting for 2D data, for example 2D arrays from computational fluid dynamics or other image-like data. This function generates a pdf document where each page contains the before/after performance of each column of the data.
- Parameters:
project_path (string) – The path to the project directory
config (dataclass) – The config class containing attributes set in the config file