Published December 16, 2025 | Version v1 | Model | Open
CREXWED: CREXdata Weather Emergency Detector
Description
Model description
CREXWED (CREXdata Weather Emergency Detector) is a weather-emergency text classification model, fine-tuned primarily on Twitter data, that identifies social media posts about a wildfire or flood incident that contain actionable information to aid rescue efforts. The model assigns one of three labels: 'fire', 'flood', or 'none'.
Intended Usage
This model is intended for text classification in English, Spanish, Catalan, and German.
How to use
    from transformers import pipeline

    # Placeholder: set this to the local directory containing the unzipped model files.
    model_path = "path/to/model"

    event_predictor = pipeline("text-classification", model=model_path, batch_size=512)
    tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}

    # The same flood report in each of the four supported languages.
    tweet_text_en = "It is raining heavy, the water in my apartment is up to my knees. Send help!!"
    tweet_text_de = "Es regnet in Strömen, das Wasser in meiner Wohnung steht mir bis zu den Knien. Schickt Hilfe!"
    tweet_text_es = "Está lloviendo muchísimo, hay agua en casa y me llega hasta los tobillos. Necesitamos ayuda!"
    tweet_text_ca = "Està plovent moltíssim, tinc aigua a casa que m'arriba fins els turmells. Necessitem ajuda!"

    # A single string returns a one-element list; take its first entry.
    output = event_predictor(tweet_text_en, **tokenizer_kwargs)[0]
    print(output)
    print(f'Predicted class: {output["label"]}')
    print(f'Prediction Score: {output["score"]}')
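Each pipeline prediction is a dict with `label` and `score` keys, so downstream triage can be a simple filter. The sketch below is one possible way to use the output, not part of the released model: the helper name `select_emergencies` and the 0.8 threshold are hypothetical, and the predictions in the example are made-up values that only mimic the output shape.

```python
from typing import Any, Dict, List, Tuple


def select_emergencies(
    texts: List[str],
    predictions: List[Dict[str, Any]],
    min_score: float = 0.8,  # hypothetical threshold; tune it on your own data
) -> List[Tuple[str, str, float]]:
    """Keep texts labelled 'fire' or 'flood' whose score is at least min_score."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    selected = []
    for text, pred in zip(texts, predictions):
        label, score = str(pred["label"]), float(pred["score"])
        if label in {"fire", "flood"} and score >= min_score:
            selected.append((text, label, score))
    return selected


# Made-up predictions in the shape the pipeline returns for a list of texts.
# With the real model: predictions = event_predictor(texts, **tokenizer_kwargs)
example_texts = ["Water is up to my knees, send help!", "Nice sunny day today."]
example_predictions = [
    {"label": "flood", "score": 0.97},
    {"label": "none", "score": 0.99},
]
print(select_emergencies(example_texts, example_predictions))
```

A fixed threshold trades recall for precision; for rescue use cases it may be preferable to lower it and review borderline posts manually.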
Limitations and bias
No measures have been taken to estimate the bias and toxicity embedded in the model.
Since the data used to fine-tune this model comes from social media, it will contain biases, hate speech, and toxic content. We have not taken any steps to reduce their impact. The base model from which this model was fine-tuned, twitter-xlm-roberta-base, may also contain bias and toxicity.
Training
Training data
The model was trained on a mix of real and synthetic tweets. The real tweets were collected from Twitter and synthetically annotated using an LLM; the dataset can be found here. The synthetic tweets were generated using Google’s Gemma 3 27B and MistralAI’s Mistral Small 24B; the dataset can be found here.
Training procedure
The training data described in the previous section was used to perform a full-parameter fine-tuning of the twitter-xlm-roberta-base model.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
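To make the `linear` scheduler entry concrete, the following sketch computes the learning rate at a given optimizer step. It uses the learning rate, batch size, and epoch count listed above; the training-set size of 32,000 examples, the absence of warmup, and single-device training without gradient accumulation are assumptions for illustration, since the card does not state them.

```python
import math

LEARNING_RATE = 5e-05  # from the hyperparameter list
TRAIN_BATCH_SIZE = 16  # from the hyperparameter list
NUM_EPOCHS = 1         # from the hyperparameter list


def total_steps(num_train_examples: int) -> int:
    """Optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_train_examples / TRAIN_BATCH_SIZE) * NUM_EPOCHS


def linear_lr(step: int, num_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, num_steps - step)
    return LEARNING_RATE * remaining / max(1, num_steps - warmup_steps)


steps = total_steps(32_000)  # hypothetical training-set size
for s in (0, steps // 2, steps):
    print(s, linear_lr(s, steps))
```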
Evaluation
Evaluation data
The model was evaluated using the test dataset from here.
Data statistics:
- fire = fire-related
- flood = flood-related
- none = no disaster label
| Language | fire | flood | none |
|---|---|---|---|
| de | 222 | 304 | 7416 |
| ca | 520 | 340 | 10611 |
| es | 592 | 239 | 5988 |
| en | 230 | 942 | 7318 |
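The counts above show a strong class imbalance: the 'none' class makes up roughly 86% to 93% of each language's test set. A short sketch that derives those shares from the table (the names `TEST_COUNTS` and `class_shares` are illustrative):

```python
# Test-set counts copied from the table above.
TEST_COUNTS = {
    "de": {"fire": 222, "flood": 304, "none": 7416},
    "ca": {"fire": 520, "flood": 340, "none": 10611},
    "es": {"fire": 592, "flood": 239, "none": 5988},
    "en": {"fire": 230, "flood": 942, "none": 7318},
}


def class_shares(counts: dict) -> dict:
    """Fraction of examples per label for one language."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


for lang, counts in TEST_COUNTS.items():
    shares = {label: round(s, 3) for label, s in class_shares(counts).items()}
    print(lang, sum(counts.values()), shares)
```

With this much imbalance, plain accuracy would be dominated by the 'none' class, which is one reason F1 is a more informative metric here.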
Evaluation results
| Language | F1 |
|---|---|
| de | 0.838 |
| ca | 0.704 |
| es | 0.705 |
| en | 0.799 |
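The card does not state which F1 averaging produced the scores above. For readers who want to evaluate the model on their own labelled data, here is a minimal sketch of per-class and macro-averaged F1 over the three labels; macro averaging is an assumption made for illustration, and the toy labels are made up.

```python
from typing import Dict, Sequence

LABELS = ("fire", "flood", "none")


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in LABELS:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example with made-up labels.
y_true = ["fire", "flood", "none", "none", "flood", "none"]
y_pred = ["fire", "none", "none", "none", "flood", "fire"]
print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```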
Additional information
Authors
- Language Technologies Unit, Barcelona Supercomputing Center.
Contact
For further information, send an email to either <langtech@bsc.es> or <crexdata@bsc.es>.
License
This work is distributed under the Apache License, Version 2.0.
Terms of Use
Since part of the data used to train this model was generated using Google's Gemma 3 model, its usage should follow the Gemma Terms of Use and Prohibited Use Policy.
Funding
This work has been developed under the EU-funded CREXDATA Project (Grant Agreement No. 101092749).
Disclaimer
The model published in this repository is intended for general-purpose use and is made available to third parties under an Apache v2.0 License.
Please keep in mind that the model may have bias and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using this model (or a system based on it), or become users of the model themselves, they should note that it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.
In no event shall the owners and creators of the model be liable for any results arising from the use made by third parties.
Files
| Name | Size |
|---|---|
| CREXWED-release-crexdata.zip (md5:04014c5882c1fb9994da1f70a2ee1aae) | 1.0 GB |
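Because the archive is about 1 GB, it can be worth checking the download against the published MD5 checksum before unzipping. A minimal sketch using only the standard library; the helper names and the local file path are assumptions, while the expected digest is the one listed above.

```python
import hashlib

EXPECTED_MD5 = "04014c5882c1fb9994da1f70a2ee1aae"  # from the file listing


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so the whole archive is never held in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage (the path is hypothetical; point it at your downloaded archive):
# print(verify_download("CREXWED-release-crexdata.zip"))
```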