Published December 16, 2025
| Version v1
CREXWED-distilled: CREXdata Weather Emergency Detector (Distilled)
Description
Model description
This model is the distilled version of CREXWED.
Intended Usage
This model is intended to be used for text classification in English, Spanish, Catalan, and German.
How to use
```python
from transformers import pipeline

# model_path: path to the downloaded model files
event_predictor = pipeline("text-classification", model=model_path, batch_size=512)
tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}

tweet_text_en = "It is raining heavy, the water in my apartment is up to my knees. Send help!!"
tweet_text_de = "Es regnet in Strömen, das Wasser in meiner Wohnung steht mir bis zu den Knien. Schickt Hilfe!"
tweet_text_es = "Está lloviendo muchísimo, hay agua en casa y me llega hasta los tobillos. Necesitamos ayuda!"
tweet_text_ca = "Està plovent moltíssim, tinc aigua a casa que m'arriba fins els turmells. Necessitem ajuda!"

output = event_predictor(tweet_text_en, **tokenizer_kwargs)[0]
print(output)
print(f'Predicted class: {output["label"]}')
print(f'Prediction Score: {output["score"]}')
```
Limitations and bias
No measures have been taken to estimate the bias and toxicity embedded in the model.
Since the data used to fine-tune this model comes from social media, it will contain biases, hate speech, and toxic content. We have not applied any steps to reduce their impact. twitter-xlm-roberta-base, the base model from which this model was fine-tuned, may also contain bias and toxicity.
Training procedure
This model was trained using the same parameters and data as CREXWED.
The distillation follows the process used for DistilBERT. We halved the number of hidden layers in a new twitter-xlm-roberta-base model (from 12 to 6), then initialized the student model by copying the even-numbered layers from the fine-tuned CREXWED model (the teacher). Finally, we fine-tuned the student model using both the ground-truth labels from the training dataset and the teacher model’s output probabilities.
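The training objective described above can be sketched as follows. This is an illustrative NumPy implementation of a DistilBERT-style loss (cross-entropy on hard labels plus temperature-scaled KL divergence against the teacher), not the exact code used here; the `alpha` and `T` values are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled, numerically stable softmax
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # Hard-label cross-entropy against the ground-truth labels
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    # Soft-label KL divergence against the teacher's probabilities,
    # scaled by T**2 as in DistilBERT
    p_student = softmax(student_logits, T)
    p_teacher = softmax(teacher_logits, T)
    kd = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean() * T**2
    return alpha * ce + (1 - alpha) * kd

# Student initialization: keep every second encoder layer of the 12-layer teacher
teacher_layer_ids = list(range(12))
student_layer_ids = teacher_layer_ids[::2]  # [0, 2, 4, 6, 8, 10]
```

When the student's logits match the teacher's, the KL term vanishes and only the cross-entropy term remains.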
Evaluation results
This model was evaluated using the same test data as CREXWED.
| Model | de | ca | es | en |
|---|---|---|---|---|
| CREXWED | 0.838 | 0.704 | 0.705 | 0.799 |
| CREXWED-distilled | 0.850 | 0.704 | 0.692 | 0.794 |
Inference time per sample, lower is better:
| Model | Inference time (ms) |
|---|---|
| CREXWED | 4.122 |
| CREXWED-distilled | 2.641 |
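Per-sample latency figures like these can be reproduced with a simple wall-clock loop. This is a generic sketch, not the benchmark actually used: `predict` stands for any callable, e.g. the `event_predictor` pipeline from the usage example, and the warm-up count is an assumption.

```python
import time

def mean_inference_time_ms(predict, samples, warmup=3):
    # Warm-up calls exclude one-time costs (model loading, caching) from the measurement
    for sample in samples[:warmup]:
        predict(sample)
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    elapsed = time.perf_counter() - start
    # Average wall-clock time per sample, in milliseconds
    return 1000.0 * elapsed / len(samples)
```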
Additional information
Authors
- Language Technologies Unit, Barcelona Supercomputing Center.
Contact
For further information, send an email to either <langtech@bsc.es> or <crexdata@bsc.es>.
License
This work is distributed under the Apache License, Version 2.0.
Terms of Use
Since part of the data used to train this model was generated using Google's Gemma 3 model, usage of this model should also follow the Gemma Terms of Use and Prohibited Use Policy.
Funding
This work has been developed under the EU-funded CREXDATA Project (Grant Agreement No. 101092749).
Disclaimer
The model published in this repository is intended for a generalist purpose and is made available to third parties under the Apache License, Version 2.0.
Please keep in mind that the model may have bias and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using this model (or a system based on it), or become users of the model itself, they should note that it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.
In no event shall the owners and creators of the model be liable for any results arising from the use made by third parties.
Files
| Name | Size | MD5 |
|---|---|---|
| CREXWED-distilled-release-crexdata.zip | 876.5 MB | fbc389ccef992411fe46c5ed2dddcf22 |