# **CVE Metadata Elements, Exploitability, & CWE Analytical Framework**
> 
> ### `Reference Links to see`: ***CVE sources , User-Story Resources , Official Links***
> > 1. [**`NVD` ( *NIST* )** - *nvd.nist.gov*](https://nvd.nist.gov/) ,
> > 2. [**`RedHat`** - *docs.redhat.com/en/documentation/red_hat_security_data_api/1.0*](https://docs.redhat.com/en/documentation/red_hat_security_data_api/1.0/html/red_hat_security_data_api/index) ,
> > 3. [**`Debian`** - *security-tracker.debian.org*](https://security-tracker.debian.org/tracker/#:~:text=Search%20for%20package%20or%20bug%20name),
> > 4. [**`Suse`** - *suse.com/support/security*](https://www.suse.com/support/security/#:~:text=Security%20updates%20by,in%20CSAF%20format) ,
> > 5. [**`Amazon Linux`** - *explore.alas.aws.amazon.com*](https://explore.alas.aws.amazon.com/) ,
> > 6. [**`OSV.dev`** - *github.com/google/osv.dev*](https://github.com/google/osv.dev) ,
> > 7. [**`CISA.gov`** - *cisa.gov/known-exploited-vulnerabilities-catalog*](https://www.cisa.gov/known-exploited-vulnerabilities-catalog) ,
> > 8. [**`Fedora` (** *Script to pull all package names & meta-data* **)** : *pagure.io/fedora-packages-static*](https://pagure.io/fedora-packages-static) .
> >
> ---
> >
> > 1. [**My (** *Keerthana's* **) `CVE user Story`** - *github.com/keerthanap8898/CveToad/tree/main/CVE-Consumer_User-Story.md*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-Consumer_User-Story.md) ,
> > 2. [**CVE `User-story_Description`** - *github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md) ,
> > 3. [**`CVE Meta-data`** *img* - *github.com/keerthanap8898/CveToad/blob/main/Resources/Images/CVE_Meta-data_Framework_Table.jpg*](https://github.com/keerthanap8898/CveToad/blob/main/Resources/Images/CVE_Meta-data_Framework_Table.jpg) . 
> >
> ---
> >
> > 1. [**CVE.org (** *Github* **)** - *github.com/CVEProject*](https://github.com/CVEProject) ,
> > 2. [**CVE Schema** - *github.com/CVEProject/cve-schema*](https://github.com/CVEProject/cve-schema) ,
> > 3. [**Official CVE list** - *github.com/CVEProject/cvelistV5*](https://github.com/CVEProject/cvelistV5) .
> >
> ---
> >
> #### **CVE / CWE Working Group**s **(** *WG* **) & SIG**s -
> >
> > 1. [**`MITRE`** : ***`CWE`***- ***WG**s & **SIG**s* ( *Special Interest Groups* )](https://cwe.mitre.org/community/working_groups.html) ,
> > 2. [**`CVE.org`** : *all **`Working Groups`***](https://www.cve.org/programorganization/workinggroups) :
> >    - **⒜**. [*`groups.io`*] ***`Main Page`*** : [*cve-cwe-programs.**groups.io***](https://cve-cwe-programs.groups.io/g/main) ,
> >    - **⒝**. [*`groups.io`*] ***`Sub-groups`*** : [*cve-cwe-programs.groups.io/g/main/**subgroups***](https://cve-cwe-programs.groups.io/g/main/subgroups) ,
> >    - **⒞**. [*`groups.io`*] ***`Consumer-WG`*** : [*cve-cwe-programs.groups.io/g/**ConsumerWG***](https://cve-cwe-programs.groups.io/g/ConsumerWG) ,
> >    - **⒟**. [*`cve.org`*] ***`Automation-WG`*** :
>       - **➀**. [*github.com/CVEProject/**automation-working-group***](https://github.com/CVEProject/automation-working-group) ,
> >       - **➁**. [*cve-cwe-programs.**groups.io/g/AWG***](https://cve-cwe-programs.groups.io/g/AWG) .
> >
---

>
> ## **`Table of Contents`**
> 
> - #### [1. **`Field Map` (** *`.json` tree* **)**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#1-field-map-jsonpath--json-pointer)  
> - #### [2. **`Extraction & Normalization`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#2-extraction--normalization-recipe)  
> - #### [3. **`Exploitability Metrics`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#3-exploitability-metrics) 
> - #### [4. **`CWE Normalization` (** *Common Weakness Enumeration* **)**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#4-cwe-common-weakness-enumeration-normalization) *- see* [***`CWE.mitre.org`***](https://cwe.mitre.org) *for more context.*
> - #### [5. **`Correlating CWE & CVSS`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#5-integration-of-cwe-and-exploitability-in-analytical-pipelines)  
> - #### [6. **`Statistical Outliers`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#6-statistical-and-outlier-detection-framework)  
> - #### [7. **`Upstream Data Quality`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#7-upstream-data-quality-recommendations)  
> - #### [8. **`Validation`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#8-extraction-and-validation-specification)
> - #### [9. **`License`**](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#9-License)
>
> ---
>
> ### **CVE Meta-data Framework Table**:
> #### - [**`See schema`** - *`github.com/CVEProject/cve-schema/blob/main/schema/CVE_Record_Format.json`*](https://github.com/CVEProject/cve-schema/blob/main/schema/CVE_Record_Format.json)
>
>  > ---
>  > ![CVE_Meta-data_Framework_Table](https://github.com/keerthanap8898/CveToad/blob/main/Resources/Images/CVE_Meta-data_Framework_Table.jpg)
>  > ---
>
> [*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)
>
> ---


## **1. Field Map (JSONPath / JSON Pointer)**


The CVE JSON 5.x schema defines structured containers for each vulnerability record. The following paths identify the core metadata elements that are most relevant for cross-source normalization & analytical consistency.

### **Aim**
To document & standardize the key fields within CVE JSON 5.x necessary for multi-source ingestion, transformation, & downstream analytical modeling.


| Element | JSONPath / JSON Pointer | Description |
|----------|------------------------|--------------|
| **Description(s)** | `$.containers.cna.descriptions[*].value` (+ `.lang`) \| `$.containers.adp[*].descriptions[*]` | Textual narrative explaining the vulnerability; includes language codes (BCP-47). |
| **Affected Software & Platforms** | `$.containers.cna.affected[*]` | Lists impacted products, vendors, versions, modules, & platforms. |
| **Source / CNA / Credits** | `$.containers.cna.providerMetadata`, `$.containers.cna.credits[*]` | Metadata describing the CVE-issuing CNA & contributors. |
| **Severity & Metrics** | `$.containers.cna.metrics[*]` & `$.containers.adp[*].metrics[*]` | CVSS objects (v2.0/v3.0/v3.1/v4.0) defining severity, vector strings, & scores. |
| **CWE / Problem Type** | `$.containers.cna.problemTypes[*].descriptions[*].cweId` | CWE identifiers & classifications of vulnerability type. |
| **References** | `$.containers.cna.references[*]` | Typed URLs referencing advisories, patches, or analyses. |
| **Languages** | `$.containers.cna.descriptions[*].lang` | Declared language codes for multilingual description support. |


This schema alignment ensures that all analytical & preprocessing routines have deterministic access to the same structural anchors, enabling automated validation, enrichment, & normalization pipelines for CVE analytics.
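
As a minimal sketch of how the paths above can be read, the following walks a record with plain dictionary access. The sample record, the `CVE-2099-0001` ID, & the function name `extract_core_fields` are hypothetical & exist only for illustration; real cvelistV5 records carry many more fields.

```python
from typing import Any

# Hypothetical, heavily trimmed record; not taken from any real CVE entry.
SAMPLE_RECORD: dict[str, Any] = {
    "cveMetadata": {"cveId": "CVE-2099-0001", "state": "PUBLISHED"},
    "containers": {
        "cna": {
            "providerMetadata": {"shortName": "example-cna"},
            "descriptions": [{"lang": "en", "value": "Example flaw in Example Product."}],
            "affected": [{"vendor": "Example", "product": "Example Product"}],
            "problemTypes": [
                {"descriptions": [{"lang": "en", "description": "CWE-79", "cweId": "CWE-79"}]}
            ],
            "references": [{"url": "https://example.com/advisory", "tags": ["vendor-advisory"]}],
        },
        "adp": [],
    },
}


def extract_core_fields(record: dict[str, Any]) -> dict[str, Any]:
    """Read the field-map paths, returning empty lists for absent optional fields."""
    cna = record.get("containers", {}).get("cna", {})
    return {
        # $.containers.cna.providerMetadata
        "cna_name": cna.get("providerMetadata", {}).get("shortName"),
        # $.containers.cna.descriptions[*].value (+ .lang)
        "descriptions": [(d.get("lang"), d.get("value")) for d in cna.get("descriptions", [])],
        # $.containers.cna.affected[*]
        "affected": [(a.get("vendor"), a.get("product")) for a in cna.get("affected", [])],
        # $.containers.cna.problemTypes[*].descriptions[*].cweId
        "cwe_ids": [
            d["cweId"]
            for pt in cna.get("problemTypes", [])
            for d in pt.get("descriptions", [])
            if "cweId" in d
        ],
        # $.containers.cna.references[*]
        "references": [r.get("url") for r in cna.get("references", [])],
    }


fields = extract_core_fields(SAMPLE_RECORD)
```

Missing containers yield empty lists rather than exceptions, which keeps the extractor usable on sparse records.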

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **2. Extraction & Normalization Recipe**


A normalization pipeline converts heterogeneous CVE data from multiple sources (e.g., CNA-published records, NVD, Red Hat, Debian) into structured relational or graph representations.

### **Aim**
To define canonical data tables supporting longitudinal analysis & machine reasoning over CVE datasets.


- **Primary Table**
  - `cve_core(id, state, published, lastModified, cna_name)`
- **Satellite Tables**
  - `description(id, lang, value, char_len, token_len, html_tag_count)`
  - `affected(id, vendor, product, versions[], platforms[], ecosystem?, purl?)`
  - `metric(id, source_container, system, version, base_score, base_severity, vector)`  
    *(optionally include `temporal_score`, `temporal_severity`, `environmental_score`, `environmental_severity` if present)*
  - `problemtype(id, cwe_id, text)`
  - `ref(id, type, url, normalized_id)`


This architecture provides a reproducible foundation for analytic modeling, traceability, & metadata-level comparison across CNAs, while remaining backward-compatible with NVD’s simplified feeds.
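
One way to populate the `description` satellite table is sketched below. The `DescriptionRow` class & `description_rows` helper are hypothetical names, & whitespace splitting is an assumed stand-in for whatever tokenizer a real pipeline uses to fill `token_len`.

```python
import re
from dataclasses import dataclass

HTML_TAG = re.compile(r"<[^>]+>")


@dataclass(frozen=True)
class DescriptionRow:
    """One row of description(id, lang, value, char_len, token_len, html_tag_count)."""
    id: str
    lang: str
    value: str
    char_len: int
    token_len: int
    html_tag_count: int


def description_rows(cve_id: str, descriptions: list[dict]) -> list[DescriptionRow]:
    """Flatten a record's descriptions array into satellite-table rows."""
    rows = []
    for d in descriptions:
        value = d.get("value", "")
        rows.append(
            DescriptionRow(
                id=cve_id,
                lang=d.get("lang", ""),
                value=value,
                char_len=len(value),
                token_len=len(value.split()),  # assumption: whitespace tokens
                html_tag_count=len(HTML_TAG.findall(value)),
            )
        )
    return rows
```

The derived columns are computed once at ingestion so later outlier checks can query them without re-parsing the text.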

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **3. Exploitability Metrics**


Exploitability metrics in CVE JSON 5.x quantify how easily a vulnerability can be exploited & the potential impact if successfully leveraged. These are encapsulated within the `metrics` arrays provided by CNAs & ADPs.

### **Aim**
To normalize exploitability indicators across multiple CVSS versions (v2, v3, v3.1, v4) & ensure internal consistency of vectors, scores, & enumerations for comparative analytics.



#### **Field Map**
| Field | JSONPath / JSON Pointer | Description |
|--------|-------------------------|--------------|
| **CVSS System** | `$.containers.cna.metrics[*].cvssV2_0 \| cvssV3_0 \| cvssV3_1 \| cvssV4_0` | Identifies the CVSS version of a metric object by its schema key. |
| **Base Score** | `$.containers.cna.metrics[*].cvssV3_1.baseScore` *(analogous under `cvssV2_0`, `cvssV3_0`, `cvssV4_0`)* | Core severity score (0.0–10.0). |
| **Base Severity** | `$.containers.cna.metrics[*].cvssV3_1.baseSeverity` | Severity label derived from the base score. |
| **Vector String** | `$.containers.cna.metrics[*].cvssV3_1.vectorString` | Encoded representation of attack/impact metrics (AV, AC, PR, UI, S, C, I, A for v3.x). |
| **Temporal Scores (optional)** | `$.containers.cna.metrics[*].cvssV3_1.temporalScore`, `...temporalSeverity` | Temporal adjustments driven by E, RL, & RC; validate those values against the allowed enums. Published v3.x vectors usually carry base metrics only. |
| **Environmental Scores (optional)** | `$.containers.cna.metrics[*].cvssV3_1.environmentalScore`, `...environmentalSeverity` | Context-specific adjustments to base score based on environment requirements & modified metrics. |

#### **Validation Rules**
1. **Cross-Version Integrity:** Ensure the metric object name matches the vector’s prefix (`CVSS:3.0/` for v3.0, `CVSS:3.1/` for v3.1, `CVSS:4.0/` for v4.0; v2 vectors carry no prefix).  
2. **Vector Sanity:** Validate AV/AC/PR/UI/S/C/I/A (v3.x) or the corresponding v2/v4 metric enums; reject invalid tokens or ordering.  
3. **Temporal/Environmental Consistency:** If temporal/environmental scores are present, verify allowed enumerations (E, RL, RC; & environmental modifiers) & ensure scores are consistent with CVSS rules.  
4. **Cross-Source Deviation:** Δ ≥ 1.5 between two sources’ **baseScore** for the same CVE → flag for review (possible provenance or interpretation difference).
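
Rules 1, 2, & 4 can be sketched for v3.x base vectors as follows. The function names are hypothetical, the metric keys `cvssV3_0` / `cvssV3_1` follow the CVE Record Format schema, & the check covers base metrics only; temporal, environmental, v2, & v4 validation would need a fuller implementation.

```python
# Allowed CVSS v3.x base-metric values, keyed in the canonical vector order.
V3_BASE = {
    "AV": {"N", "A", "L", "P"},
    "AC": {"L", "H"},
    "PR": {"N", "L", "H"},
    "UI": {"N", "R"},
    "S": {"U", "C"},
    "C": {"N", "L", "H"},
    "I": {"N", "L", "H"},
    "A": {"N", "L", "H"},
}
# Vector prefix expected for each v3.x metric key in a CVE JSON 5.x record.
PREFIX_BY_KEY = {"cvssV3_0": "CVSS:3.0", "cvssV3_1": "CVSS:3.1"}


def validate_v3_vector(metric_key: str, vector: str) -> list[str]:
    """Return a list of problems; an empty list means the base vector looks sane."""
    problems: list[str] = []
    prefix, _, body = vector.partition("/")
    if PREFIX_BY_KEY.get(metric_key) != prefix:
        problems.append(f"prefix {prefix!r} does not match metric object {metric_key!r}")

    seen = []
    for part in body.split("/"):
        name, sep, value = part.partition(":")
        if not sep:
            problems.append(f"malformed token {part!r}")
        elif name in V3_BASE:
            seen.append(name)
            if value not in V3_BASE[name]:
                problems.append(f"invalid value {value!r} for {name}")
        # Names outside the base set are not checked in this sketch.

    if seen != list(V3_BASE):
        problems.append("base metrics missing, duplicated, or out of canonical order")
    return problems


def needs_review(score_a: float, score_b: float, threshold: float = 1.5) -> bool:
    """Rule 4: flag a CVE when two sources' base scores differ by the threshold or more."""
    return abs(score_a - score_b) >= threshold
```

Returning a list of findings rather than a boolean lets a pipeline record why a vector was rejected.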

#### **Derived Attributes**
- **Exploitability Index (EI):** Derived from parsed base vector components (e.g., mapping AV/AC/PR/UI to their standard numeric factors rather than relying on any non-standard `exploitabilityScore` field).  
- **Confidence Weight:** Based on presence/validity of temporal/environmental fields & metric provenance.  
- **Cross-Vector Drift:** Difference between base score & temporal/environmental scores when provided.
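
A minimal sketch of the Exploitability Index, assuming it is taken to be the CVSS v3.1 exploitability sub-score `8.22 × AV × AC × PR × UI` computed from the parsed vector. The weights are those of the CVSS v3.1 specification; the function names are hypothetical, & the input is assumed to be a vector that has already passed validation.

```python
# CVSS v3.1 numeric weights for the exploitability metrics.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
PR_SCOPE_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_SCOPE_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weighs more when Scope is Changed


def exploitability_index(vector: str) -> float:
    """Compute 8.22 * AV * AC * PR * UI from a validated CVSS v3.x base vector."""
    # Skip the "CVSS:3.x" prefix, then split each "NAME:VALUE" token.
    metrics = dict(part.split(":", 1) for part in vector.split("/")[1:])
    pr = PR_SCOPE_CHANGED if metrics["S"] == "C" else PR_SCOPE_UNCHANGED
    return 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr[metrics["PR"]] * UI[metrics["UI"]]


def cross_vector_drift(base_score, other_score):
    """Base score minus temporal/environmental score, or None when the latter is absent."""
    return None if other_score is None else round(base_score - other_score, 1)


ei = exploitability_index("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
print(round(ei, 1))  # 3.9
```

Deriving the index from the vector keeps it comparable across sources that omit or disagree on a pre-computed sub-score.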


Exploitability normalization ensures consistent scoring semantics across heterogeneous sources & enables accurate prioritization & cross-vendor risk alignment.

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **4. CWE (Common Weakness Enumeration) Normalization**


The CWE field captures the underlying software weakness category that led to the vulnerability. It exists under `problemTypes` & its subfields within the CNA container.

### **Aim**
To standardize & validate CWE representations for interoperability across CVE sources, ensuring taxonomical accuracy & supporting upstream remediation analytics.



#### **Field Map**
| Field | JSONPath / JSON Pointer | Description |
|--------|------------------------|--------------|
| **CWE ID** | `$.containers.cna.problemTypes[*].descriptions[*].cweId` | CWE identifier (e.g., `CWE-79` for XSS). |
| **Description** | `$.containers.cna.problemTypes[*].descriptions[*].description` | Human-readable summary of the weakness. |
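| **Source** | `$.containers.cna.providerMetadata.shortName` | Origin CNA; the 5.x schema defines no `source` key on problem-type entries, so provenance comes from the container's provider metadata. |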
| **Type Enumeration** | `$.containers.cna.problemTypes[*].descriptions[*].type` | Distinguishes “CWE”, “CAPEC”, or custom taxonomies. |

#### **Normalization Rules**
1. **Schema Compliance:** Valid CWE IDs match `^CWE-\d+$`; placeholders such as `"NVD-CWE-noinfo"` or `"NVD-CWE-Other"` should be allowed only within a defined grace period, after which they must be refined to a specific CWE.  
2. **Description Validation:** Compare `description` text to the official CWE entry (Levenshtein or cosine similarity ≥ 0.85) to catch obvious mismatches.  
3. **Multiplicity Handling:** Retain all CWE entries; designate one as **primary** by highest specificity (deepest node in the CWE hierarchy).  
4. **Taxonomy Crosswalk:** Map each CWE to its **Category**, **View** (Research/Development), & **Abstraction** level (Pillar/Class/Base/Variant, with Compound entries tracked separately).  
5. **Refinement Tracking:** Monitor unresolved “Other/NoInfo” counts to drive upstream requests for improved labeling.
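
Rules 1 & 2 can be sketched as below. This is illustrative only: `classify_cwe` & `description_matches` are hypothetical names, the placeholder set is an assumption, & the similarity check uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for Levenshtein or cosine similarity, so the 0.85 threshold would need recalibrating for another measure.

```python
import re
from difflib import SequenceMatcher

CWE_ID = re.compile(r"^CWE-\d+$")
# Assumed placeholder labels; extend to match the sources being ingested.
PLACEHOLDERS = {"NVD-CWE-noinfo", "NVD-CWE-Other", "CWE-Other"}


def classify_cwe(cwe_id: str) -> str:
    """Label a CWE field as 'valid', 'placeholder', or 'invalid'."""
    if cwe_id in PLACEHOLDERS:
        return "placeholder"
    return "valid" if CWE_ID.match(cwe_id) else "invalid"


def description_matches(text: str, official_name: str, threshold: float = 0.85) -> bool:
    """Compare a record's description text with the official CWE name, ignoring case."""
    ratio = SequenceMatcher(None, text.lower(), official_name.lower()).ratio()
    return ratio >= threshold


official = "Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')"
print(classify_cwe("CWE-79"), description_matches(official.upper(), official))
```

Counting the `placeholder` results over time gives the unresolved total that rule 5 asks to monitor.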

#### **Derived Attributes**
- **CWE Depth:** Hierarchical distance from top-level CWE categories.  
- **Specificity Index:** `CWE Depth / maximum depth in the chosen view`, scaled to 0–1 (higher implies more precise classification, matching the deepest-node rule for the primary CWE).  
- **CWE Completeness Score:** Weighted combination of specificity & presence/quality of description text.
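
CWE Depth & the deepest-node rule for choosing a primary CWE can be sketched with a parent map. The `PARENT` table is a hand-written excerpt assumed to mirror the Research Concepts view & should be replaced by data loaded from the official CWE catalogue; the sketch also assumes a single parent per node, which the real hierarchy does not guarantee.

```python
# Assumed excerpt of ChildOf links; verify against the official CWE data before use.
PARENT = {
    "CWE-79": "CWE-74",
    "CWE-74": "CWE-707",
    "CWE-707": None,  # top of this excerpt
}


def cwe_depth(cwe_id: str, parent: dict = PARENT) -> int:
    """Count ChildOf hops from cwe_id up to a node with no recorded parent."""
    depth = 0
    visited = set()
    while parent.get(cwe_id) is not None:
        if cwe_id in visited:
            raise ValueError(f"cycle detected at {cwe_id}")
        visited.add(cwe_id)
        cwe_id = parent[cwe_id]
        depth += 1
    return depth  # unknown IDs fall through with depth 0


def primary_cwe(cwe_ids: list[str]) -> str:
    """Pick the deepest (most specific) CWE; ties resolve to the first listed."""
    return max(cwe_ids, key=cwe_depth)


print(cwe_depth("CWE-79"), primary_cwe(["CWE-707", "CWE-79", "CWE-74"]))  # 2 CWE-79
```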


CWE normalization enables automated weakness aggregation, supports risk clustering, & informs upstream data-quality improvement metrics.

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **5. Integration of CWE & Exploitability in Analytical Pipelines**


The CWE classification & exploitability metrics jointly determine a vulnerability’s risk posture, providing context for prioritization & mitigation.

### **Aim**
To integrate semantic weakness data (CWE) with exploitability scoring (CVSS) to generate holistic, data-driven vulnerability risk profiles.


1. **CWE–CVSS Fusion Model:** Combine CWE categories (e.g., Input Validation, Access Control) with exploitability vectors (AV, PR, UI) to compute contextual multipliers.  
2. **Risk Aggregation:** Cluster CVEs by CWE family & compute average exploitability indices/ratios; treat large inter-source score variance as a quality signal.  
3. **Predictive Inference:** Train models using historical CWE–CVSS pairs to impute likely scores where metrics are incomplete.  
4. **Quality Dashboards:** Surface CWEs with persistent metric inconsistencies for upstream feedback.
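
The risk-aggregation step (item 2) might look like the sketch below. It covers only grouping & averaging; the fusion multipliers of item 1 are not defined here, so they are left out. The row layout, the `CVE-A`-style IDs, & the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (cve_id, cwe_family, source, base_score, exploitability_index)
ROWS = [
    ("CVE-A", "Injection", "cna", 9.8, 3.9),
    ("CVE-A", "Injection", "nvd", 7.5, 3.9),
    ("CVE-B", "Injection", "cna", 6.1, 2.8),
    ("CVE-C", "Access Control", "cna", 8.8, 2.8),
]


def aggregate_by_family(rows):
    """Per CWE family: mean EI over distinct CVEs & the widest inter-source score gap."""
    family_of, ei_of = {}, {}
    scores = defaultdict(list)
    for cve_id, family, _source, base_score, ei in rows:
        family_of[cve_id] = family
        ei_of[cve_id] = ei  # one EI per CVE; the last source seen wins
        scores[cve_id].append(base_score)

    members = defaultdict(list)
    for cve_id, family in family_of.items():
        members[family].append(cve_id)

    return {
        family: {
            "cve_count": len(cves),
            "mean_ei": mean(ei_of[c] for c in cves),
            "max_source_spread": max(max(scores[c]) - min(scores[c]) for c in cves),
        }
        for family, cves in members.items()
    }


summary = aggregate_by_family(ROWS)
```

A family whose `max_source_spread` stays high across releases is a candidate for the quality dashboards of item 4.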


CWE–Exploitability fusion provides a standardized basis for risk analytics, automated triage, & cross-vendor scoring harmonization.

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **6. Statistical & Outlier Detection Framework**


Outlier detection flags abnormal values in the descriptive, linguistic, semantic, & scoring data of CVE records so that inconsistent records can be reviewed.

### **Aim**
To establish reproducible methods for identifying low-quality, incomplete, or anomalous entries.


- Compute percentile thresholds (`P50`, `P75`, `P90`) annually over the working set (e.g., `/cves/2025/**`).  
- Flag anomalies using the IQR upper fence (`≥ Q3 + 1.5 × IQR`).  
- **Text anomalies:** excessive description length, high non-ASCII ratio, or HTML tags in descriptions.  
- **Scoring anomalies:** impossible/invalid CVSS tokens, cross-source baseScore deviations (Δ ≥ 1.5), or missing CWE mappings.  
- **Identity anomalies:** package-name substring traps (require ecosystem-aware purl matching & whole-token equality).
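
The percentile & IQR checks above can be sketched with the standard library alone. The sample lengths are hypothetical, the helper names are illustrative, & the `inclusive` quantile method is an assumed choice; other interpolation methods give slightly different cut points.

```python
from statistics import quantiles


def percentile_thresholds(values):
    """Return the P50, P75, & P90 cut points of a numeric sample."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points, P1..P99
    return {"P50": cuts[49], "P75": cuts[74], "P90": cuts[89]}


def iqr_outliers(values):
    """Return values at or above the upper fence Q3 + 1.5 * IQR."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [v for v in values if v >= fence]


def non_ascii_ratio(text):
    """Share of characters outside the ASCII range; 0.0 for empty text."""
    return sum(ord(ch) > 127 for ch in text) / len(text) if text else 0.0


# Hypothetical description lengths (characters) for one year's working set.
LENGTHS = [100, 120, 130, 140, 150, 160, 170, 180, 2000]
print(percentile_thresholds(LENGTHS))  # {'P50': 150.0, 'P75': 170.0, 'P90': 544.0}
print(iqr_outliers(LENGTHS))           # [2000]
```

Here Q1 = 130 & Q3 = 170, so the fence sits at 230 & only the 2000-character description is flagged.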


Combining text, CWE, & exploitability analytics enables unified anomaly scoring & supports upstream feedback loops to CNAs & MITRE.

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **7. Upstream Data Quality Recommendations**


This section gives guidelines for escalating systematic inconsistencies to CVE.org, NVD, or MITRE for structural correction.

### **Aim**
To reduce redundant downstream fixes by improving schema enforcement & CNA validation tooling.


1. Integrate HTML & diff-content filters in submission validators for descriptions.  
2. Enforce explicit metric provenance (e.g., `source/scorer/date`) where not already captured by the submitting CNA/ADP.  
3. Introduce an SLA for resolving `"NVD-CWE-Other"` / `"NVD-CWE-noinfo"` to specific CWEs where feasible.  
4. Add **purl** & semantic version fields alongside CPEs to improve product identity fidelity.  
5. Encourage cross-CNA calibration with shared CVSS validation fixtures & regression tests.


Centralized remediation ensures schema coherence & consistency across the CVE data ecosystem, minimizing redundant consumer-side parsing logic.

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## **8. Extraction & Validation Specification**


This section gives formal extraction selectors & validation regex definitions for ingesting & verifying CVE JSON records.

### **Aim**
To support deterministic, reproducible ingestion & validation pipelines for large-scale CVE datasets.


**Selectors**
- Descriptions → (`$.containers.cna.descriptions[*]`) → `(lang, value)`  
- Affected → (`$.containers.cna.affected[*]`) → `(vendor, product, versions[*], platforms[*], packageName?)`  
- Metrics → (`$.containers.cna.metrics[*]`, `$.containers.adp[*].metrics[*]`) → `(system, version, vectorString, baseScore, baseSeverity, temporalScore?, temporalSeverity?, environmentalScore?, environmentalSeverity?)`  
- CWEs → (`$.containers.cna.problemTypes[*].descriptions[*]`) → `(cweId, description)`  
- References → (`$.containers.cna.references[*]`) → `(url, name, tags[*])`
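
The Metrics selector spans both the CNA & ADP containers, which a generator can flatten into rows tagged with their source. This is a sketch under assumptions: `iter_metric_rows` is a hypothetical name, the metric keys are those of the CVE Record Format schema, & only a subset of the optional score fields is carried through.

```python
from typing import Any, Iterator

# CVSS metric keys in a CVE JSON 5.x metrics entry, mapped to their version.
CVSS_KEYS = {"cvssV2_0": "2.0", "cvssV3_0": "3.0", "cvssV3_1": "3.1", "cvssV4_0": "4.0"}


def iter_metric_rows(record: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one row per CVSS object found under the cna & adp[*] containers."""
    containers = record.get("containers", {})
    sources = [("cna", containers.get("cna", {}))]
    sources += [(f"adp[{i}]", adp) for i, adp in enumerate(containers.get("adp", []))]
    for source, container in sources:
        for metric in container.get("metrics", []):
            for key, version in CVSS_KEYS.items():
                cvss = metric.get(key)
                if cvss is None:
                    continue
                yield {
                    "source_container": source,
                    "system": "CVSS",
                    "version": version,
                    "vectorString": cvss.get("vectorString"),
                    "baseScore": cvss.get("baseScore"),
                    "baseSeverity": cvss.get("baseSeverity"),  # may be absent
                    "temporalScore": cvss.get("temporalScore"),
                    "environmentalScore": cvss.get("environmentalScore"),
                }
```

Keeping `source_container` on every row is what later allows base scores from different providers to be compared for the same CVE.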

**Validation Regex**
- HTML tags → `<[^>]+>`  
- Git diff markers → `(?m)^(?:diff --git|index [0-9a-f]{7,}|@@[^@]+@@)`  
- CVSS v3.1 vector (coarse) → `^CVSS:3\.1/(?:[A-Z]{1,3}:[A-Z])(?:/[A-Z]{1,3}:[A-Z])+$` *(follow with full CVSS validation)*
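
The three patterns above can be compiled & applied as in this sketch; `lint_description` is a hypothetical helper name, & the regexes are copied unchanged from the list.

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")
GIT_DIFF = re.compile(r"(?m)^(?:diff --git|index [0-9a-f]{7,}|@@[^@]+@@)")
CVSS31_COARSE = re.compile(r"^CVSS:3\.1/(?:[A-Z]{1,3}:[A-Z])(?:/[A-Z]{1,3}:[A-Z])+$")


def lint_description(text: str) -> list[str]:
    """Return the names of the content checks that a description trips."""
    findings = []
    if HTML_TAG.search(text):
        findings.append("html_tag")
    if GIT_DIFF.search(text):
        findings.append("git_diff")
    return findings


print(lint_description("See <a href='x'>fix</a>"))                   # ['html_tag']
print(lint_description("patch:\ndiff --git a/f b/f\n@@ -1 +1 @@"))  # ['git_diff']
print(bool(CVSS31_COARSE.match("CVSS:3.1/ZZ:Q/AB:C")))               # True
```

The last line shows why the vector pattern is only a coarse gate: it accepts well-shaped but meaningless tokens, so full CVSS validation still has to follow. The HTML pattern likewise matches prose such as `a < b and c > d`, so its hits are best treated as candidates for review.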

**Package Disambiguation**
- Tokenize by `[-_.]`; lowercase; drop non-semantic suffixes (`devel`, `libs`, `common`, `doc`).  
- Restrict comparisons within identical ecosystems (`purl.type`).  
- When ecosystem is ambiguous, require whole-token equality & vendor/product alignment; reject substring matches.
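
A minimal sketch of these disambiguation rules follows. The dictionary keys `name`, `purl_type`, & `vendor` are a hypothetical input shape, & product alignment is reduced to the token comparison itself.

```python
import re

NON_SEMANTIC_SUFFIXES = {"devel", "libs", "common", "doc"}


def name_tokens(name: str) -> tuple:
    """Lowercase, split on [-_.], & drop trailing non-semantic suffixes."""
    tokens = [t for t in re.split(r"[-_.]", name.lower()) if t]
    while len(tokens) > 1 and tokens[-1] in NON_SEMANTIC_SUFFIXES:
        tokens.pop()
    return tuple(tokens)


def same_package(a: dict, b: dict) -> bool:
    """Whole-token equality, compared only within one ecosystem or one vendor."""
    type_a, type_b = a.get("purl_type"), b.get("purl_type")
    if type_a and type_b:
        if type_a != type_b:
            return False  # never compare across ecosystems
    else:
        # Ecosystem ambiguous: fall back to requiring a known, matching vendor.
        vendor_a = (a.get("vendor") or "").lower()
        vendor_b = (b.get("vendor") or "").lower()
        if not vendor_a or vendor_a != vendor_b:
            return False
    return name_tokens(a["name"]) == name_tokens(b["name"])


print(same_package({"name": "openssl", "purl_type": "rpm"},
                   {"name": "openssl-libs", "purl_type": "rpm"}))  # True
print(same_package({"name": "xml", "purl_type": "rpm"},
                   {"name": "libxml2", "purl_type": "rpm"}))       # False
```

Comparing token tuples rather than raw strings is what rejects the `xml` / `libxml2` substring trap.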


This technical specification establishes standardized, automated extraction & validation for CVEProject cvelistV5 & related datasets, supporting scalable ingestion, quality scoring, & cross-source interoperability.

[*`back to index`*](https://github.com/keerthanap8898/CveToad/blob/main/CVE-user-story_Description.md#table-of-contents)

---

## 9. `License`
>     CopyrightⒸ 2025  Keerthana Purushotham <keep.consult@proton.me>.
>     Licensed under the GNU AGPL v3. See LICENSE for details.
>   [*see license*](https://github.com/keerthanap8898/CveToad/blob/main/LICENSE)
