Large Language Models for Inter-Model Consistency Assessment
Description
The study investigates whether Large Language Models (LLMs) can support inter-model consistency assessment across heterogeneous modeling paradigms. Specifically, the evaluation focuses on consistency reasoning between AUTOSAR and ROS2 modeling frameworks within cyber-physical systems (CPS). The replication package provides all artifacts necessary to reproduce the experimental results, including dataset construction, prompt templates, baseline implementations, evaluation scripts, aggregated results, and figure generation utilities.
Reproducibility
All experiments were conducted under controlled conditions:
- Fixed LLM model versions
- Standardized prompt templates
- Consistent inference parameters
- Structured JSON output enforcement
- Identical evaluation pipeline across models
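The structured JSON output enforcement mentioned above can be sketched as a small validation step. This is an illustrative example only: the key names (`consistent`, `dimension`, `rationale`) are hypothetical and may differ from the schema used in the replication package.

```python
import json

# Hypothetical required keys for a consistency verdict; the actual
# schema used in the replication package may differ.
REQUIRED_KEYS = {"consistent", "dimension", "rationale"}

def parse_structured_output(raw: str) -> dict:
    """Parse and validate an LLM response as structured JSON.

    Raises ValueError if the response is not valid JSON or lacks a
    required key, so malformed outputs can be logged and retried.
    """
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# A well-formed response passes validation:
reply = '{"consistent": true, "dimension": "semantic", "rationale": "Signal types match."}'
print(parse_structured_output(reply)["consistent"])  # → True
```

Rejecting (rather than silently repairing) malformed responses keeps the evaluation pipeline identical across models, since every model is held to the same output contract.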
Vendor-level aggregation is computed as the arithmetic mean across all configurations, since each configuration is evaluated on the same dataset (50 instances).
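The aggregation rule above amounts to an unweighted mean per vendor, which is valid because every configuration is scored on the same 50-instance dataset. A minimal sketch, with made-up scores (the vendor names and values are placeholders, not results from the study):

```python
from statistics import mean

# Hypothetical per-configuration accuracy scores; each configuration is
# evaluated on the same 50-instance dataset, so an unweighted arithmetic
# mean is a fair vendor-level aggregate.
scores_by_vendor = {
    "vendor_a": [0.75, 0.875, 0.625],  # one score per configuration
    "vendor_b": [0.5, 0.75],
}

vendor_means = {v: mean(s) for v, s in scores_by_vendor.items()}
print(vendor_means["vendor_a"])  # → 0.75
```

If configurations were evaluated on datasets of different sizes, a weighted mean would be required instead; the equal-size design sidesteps that.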
Detailed step-by-step reproduction instructions are provided in the included README file.
Research Context
This dataset enables controlled evaluation of LLM-based reasoning for multi-dimensional inter-model consistency in heterogeneous CPS modeling environments. The work contributes empirical evidence regarding:
- The comparative performance of LLMs versus traditional heuristic baselines
- The effect of prompting strategies on architectural reasoning
- Vendor-level differences in semantic and behavioral consistency assessment
- Limitations of LLMs in safety-critical modeling contexts
To our knowledge, this represents one of the first systematic empirical evaluations of LLM-based reasoning for inter-model architectural consistency in heterogeneous CPS frameworks.
Files
| Name | Size | MD5 |
|---|---|---|
| Replication Package.zip | 1.6 MB | 2c64aaf4824b176024e79da48527b516 |
Additional details
Dates
- Submitted: 2026