# TUD Anomaly Detection Model (ONNX)

## Model Info
This repository contains a trained Autoencoder-based anomaly detection model developed in the context of the MLSysOps project (Machine Learning for Autonomic System Operation in the Heterogeneous Edge-Cloud Continuum), funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092912.

The model is exported in ONNX format for efficient inference on edge or cloud devices.

## Purpose
This model performs **unsupervised anomaly detection** on node/VM telemetry metrics by learning to reconstruct normal observations.

- **Input:** A feature vector of telemetry metrics (float values), normalized with Min-Max scaling.
- **Output:** The reconstructed feature vector.
- **Anomaly score:** RMSE between input and reconstruction.
- **Decision rule:** anomaly if `RMSE > threshold` (threshold stored in `model_config.json`).

## Repository Structure
The repository provides the trained model and its configuration for easy deployment.

    .
    ├── demo.py                  # Inference script (ONNXRuntime)
    ├── model/
    │   ├── autoencoder.onnx     # ONNX model
    │   └── model_config.json    # Model configuration (features, normalization, threshold)
    ├── requirements.txt         # Python dependencies
    └── README.md                # Documentation

## Training Data
The model was trained on telemetry data representing normal system behavior. The training dataset is part of this Zenodo record only if it appears among the uploaded files.

**Important:** The inference input must use the same feature ordering as the training data.

## Features Used (Feature Order)
The expected feature order (last dimension of the input tensor) is:

1. cpu_0_idle
2. cpu_0_iowait
3. cpu_0_irq
4. cpu_0_nice
5. cpu_0_softirq
6. cpu_0_steal
7. cpu_0_system
8. cpu_0_user
9. cpu_1_idle
10. cpu_1_iowait
11. cpu_1_irq
12. cpu_1_nice
13. cpu_1_softirq
14. cpu_1_steal
15. cpu_1_system
16. cpu_1_user
17. cpu_2_idle
18. cpu_2_iowait
19. cpu_2_irq
20. cpu_2_nice
21. cpu_2_softirq
22. cpu_2_steal
23. cpu_2_system
24. cpu_2_user
25. cpu_3_idle
26. cpu_3_iowait
27. cpu_3_irq
28. cpu_3_nice
29. cpu_3_softirq
30. cpu_3_steal
31. cpu_3_system
32. cpu_3_user
33. memory_used_bytes
34. node_memory_Buffers_bytes
35. node_memory_Cached_bytes
36. node_memory_MemAvailable_bytes
37. node_memory_MemFree_bytes
38. node_memory_MemTotal_bytes

(These names must match `model/model_config.json`.)
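Because the list follows a regular pattern (eight per-CPU counters for CPUs 0 to 3, followed by six memory metrics), it can be regenerated in code and compared against the names in the configuration file. The following is a minimal sketch; the helper name `expected_feature_names` and the constants are illustrative and are not taken from `demo.py`.

```python
# Rebuild the 38 feature names in the order listed in this README.
# Names of the constants and the helper are hypothetical (illustration only).
CPU_FIELDS = ["idle", "iowait", "irq", "nice", "softirq", "steal", "system", "user"]
MEMORY_FEATURES = [
    "memory_used_bytes",
    "node_memory_Buffers_bytes",
    "node_memory_Cached_bytes",
    "node_memory_MemAvailable_bytes",
    "node_memory_MemFree_bytes",
    "node_memory_MemTotal_bytes",
]


def expected_feature_names(num_cpus=4):
    """Return per-CPU counters (CPU-major order) followed by the memory metrics."""
    cpu_names = [
        f"cpu_{cpu}_{field}" for cpu in range(num_cpus) for field in CPU_FIELDS
    ]
    return cpu_names + MEMORY_FEATURES


FEATURES = expected_feature_names()
print(len(FEATURES), FEATURES[0], FEATURES[-1])
```

Comparing `FEATURES` with the feature list loaded from the configuration file is a cheap way to catch ordering mistakes before running inference.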

## Model Architecture
This model is a fully-connected Autoencoder with ReLU activations:

- Encoder dims: `feature_size -> int(0.75*feature_size) -> int(0.5*feature_size) -> int(0.25*feature_size) -> int(0.1*feature_size)`
- Decoder dims: symmetric back to `feature_size`
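For the 38-feature input used here, the layer widths can be worked out directly from the formula above. The sketch below only evaluates that formula; the function name `encoder_dims` is illustrative, and the decoder is assumed to mirror the encoder exactly, which is one reading of "symmetric".

```python
def encoder_dims(feature_size):
    """Layer widths from the input down to the bottleneck, using int() truncation."""
    ratios = (0.75, 0.5, 0.25, 0.1)
    return [feature_size] + [int(ratio * feature_size) for ratio in ratios]


encoder = encoder_dims(38)
decoder = encoder[::-1]  # assumption: the decoder mirrors the encoder widths

print(encoder)  # [38, 28, 19, 9, 3]
print(decoder)  # [3, 9, 19, 28, 38]
```

Note that `int()` truncates toward zero, so `0.1 * 38 = 3.8` yields a bottleneck of 3 units rather than 4.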

## Model Specification

### Inputs
The model accepts a single tensor representing the telemetry feature vector.

| Input name | Shape              | Type     | Description |
|-----------|--------------------|----------|-------------|
| `x`       | `[batch_size, 38]`  | float32  | Min-Max normalized feature vector |

#### Preprocessing
Min-Max normalization is applied using per-feature `min` and `max` values stored in `model/model_config.json`:

- `x_norm = (x - min) / (max - min)`
- If a feature has `max == min` (constant feature in training), normalization must avoid division by zero.
  Recommended behavior (used in the provided demo script): set that normalized feature to `0.0`.

Optionally, clamp `x_norm` to `[0, 1]` (configurable via `model/model_config.json`).
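The preprocessing rules above can be sketched with NumPy as follows. This is an illustration, not the code from `demo.py`: the function name `min_max_normalize` and its `clip` parameter are hypothetical, and the per-feature `feat_min` and `feat_max` arrays are assumed to have already been read from the configuration file.

```python
import numpy as np


def min_max_normalize(x, feat_min, feat_max, clip=False):
    """Apply per-feature Min-Max scaling; constant features map to 0.0."""
    x = np.asarray(x, dtype=np.float64)
    feat_min = np.asarray(feat_min, dtype=np.float64)
    feat_max = np.asarray(feat_max, dtype=np.float64)

    span = feat_max - feat_min
    constant = span == 0                       # features with max == min
    safe_span = np.where(constant, 1.0, span)  # avoid division by zero

    x_norm = (x - feat_min) / safe_span
    x_norm = np.where(constant, 0.0, x_norm)   # constant features become 0.0
    if clip:
        x_norm = np.clip(x_norm, 0.0, 1.0)     # optional clamp to [0, 1]
    return x_norm.astype(np.float32)           # the model expects float32


# Toy example with 3 features; the second one is constant in "training".
sample = [[5.0, 7.0, 30.0]]
print(min_max_normalize(sample, [0.0, 7.0, 10.0], [10.0, 7.0, 20.0]))
print(min_max_normalize(sample, [0.0, 7.0, 10.0], [10.0, 7.0, 20.0], clip=True))
```

In the toy example the third value lies above the training maximum, so it normalizes to `2.0` without clamping and to `1.0` with clamping.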

### Outputs
The ONNX graph outputs the reconstructed input vector.

| Output name       | Shape              | Type     | Description |
|------------------|--------------------|----------|-------------|
| `reconstruction` | `[batch_size, 38]`  | float32  | Reconstructed feature vector |
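A minimal sketch of running the graph with ONNX Runtime, using the tensor names `x` and `reconstruction` from the tables above. The helper names `load_session` and `reconstruct` are hypothetical, and the CPU execution provider is an assumption; `demo.py` may be organized differently.

```python
import numpy as np

NUM_FEATURES = 38  # fixed input width of the model


def load_session(model_path="model/autoencoder.onnx"):
    """Create an ONNX Runtime session (CPU provider assumed)."""
    import onnxruntime as ort  # imported here so reconstruct() works with any session-like object

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def reconstruct(session, x_norm):
    """Run the autoencoder on normalized input of shape [batch_size, 38]."""
    batch = np.atleast_2d(np.asarray(x_norm, dtype=np.float32))
    if batch.shape[1] != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {batch.shape[1]}")
    # Request the single output by name; run() returns a list of arrays.
    (reconstruction,) = session.run(["reconstruction"], {"x": batch})
    return reconstruction
```

Typical use would be `reconstruct(load_session(), x_norm)`, where `x_norm` has already been Min-Max normalized.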

### Post-processing (Anomaly Detection)
The anomaly score is computed outside the ONNX graph:

- `rmse = sqrt(mean((x_norm - reconstruction)^2))`, computed per sample with the mean taken over the 38 features
- `anomaly = 1 if rmse > threshold else 0`
- `threshold` is stored in `model/model_config.json`
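The scoring rule above can be sketched with NumPy as shown below. The function name `anomaly_scores` is illustrative, and `threshold` is assumed to have been read from the configuration file beforehand.

```python
import numpy as np


def anomaly_scores(x_norm, reconstruction, threshold):
    """Return per-sample RMSE and 0/1 anomaly labels."""
    x_norm = np.atleast_2d(np.asarray(x_norm, dtype=np.float64))
    reconstruction = np.atleast_2d(np.asarray(reconstruction, dtype=np.float64))
    # Mean over the feature axis gives one RMSE value per sample.
    rmse = np.sqrt(np.mean((x_norm - reconstruction) ** 2, axis=1))
    labels = (rmse > threshold).astype(np.int64)  # strict inequality, as in the rule
    return rmse, labels


# Toy example with 4 features: the second sample differs in one feature.
x = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
rec = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
rmse, labels = anomaly_scores(x, rec, threshold=0.25)
print(rmse)    # [0.  0.5]
print(labels)  # [0 1]
```

Because the comparison is strict, a sample whose RMSE equals the threshold exactly is labeled normal.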

## Limitations
- **Feature order & dimension are fixed:** Inputs must have exactly 38 features in the specified order.
- **Normalization is training-dependent:** Min/Max parameters are derived from the training data distribution; out-of-distribution inputs may yield unreliable anomaly scores.
- **Constant features:** Features with `max == min` require special handling during normalization (avoid division by zero).
- **ONNX output is reconstruction only:** The anomaly score/label is computed in the inference script.

## Usage Demo

### 1. Setup Environment
Create a virtual environment and install dependencies:

    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

Example `requirements.txt`:

    numpy
    onnxruntime

### 2. Run Inference Script
The demo script loads the ONNX model, applies preprocessing, runs inference, and reports the RMSE and the anomaly label.

Run with a CSV input:

    python demo.py --model model/autoencoder.onnx --config model/model_config.json --csv telemetry.csv --row 0

#### CSV Format Requirements
- CSV must include a header row.
- The CSV should contain only the 38 numeric feature columns; if other columns are present, ensure the numeric columns match the 38 features exactly.
- Column order must match the feature list in this README and `model_config.json`.

If `--csv` is not provided, the script may run on a random normalized sample (sanity check only).
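The CSV requirements can be checked before inference with a small standard-library helper. This is a sketch under assumptions: the function name `read_csv_row` is hypothetical, the row index is treated as a zero-based index into the data rows (excluding the header), and `demo.py` may parse its input differently.

```python
import csv
import io


def read_csv_row(csv_file, feature_names, row=0):
    """Validate the header against the expected order and return one row as floats."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header != list(feature_names):
        raise ValueError("CSV header does not match the expected feature order")
    for index, values in enumerate(reader):
        if index == row:
            return [float(value) for value in values]
    raise IndexError(f"CSV has no data row {row}")


# Toy example with two made-up columns instead of the real 38 features.
toy_csv = io.StringIO("a,b\n1,2\n3.5,4\n")
print(read_csv_row(toy_csv, ["a", "b"], row=1))  # [3.5, 4.0]
```

With a real file, the helper would be called as `read_csv_row(open("telemetry.csv", newline=""), feature_names)`, where `feature_names` is the 38-entry list from the configuration.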

## Citation

If you wish to cite **this model**, please use the citation generated by Zenodo (located in the right sidebar of this record).

## Acknowledgement & Funding

This work is part of the **MLSysOps** project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092912.

More information about the project is available at https://mlsysops.eu/
