Published December 16, 2025 | Version v1
Model Open

CREXWED: CREXdata Weather Emergency Detector

  • 1. Barcelona Supercomputing Center

Description

Model description

 

CREXWED (CREXdata Weather Emergency Detector) is a weather emergency text classification model, fine-tuned primarily on Twitter data, that identifies social media posts which describe a wildfire or flood incident and contain actionable information to aid rescue efforts. The model assigns one of three labels: 'fire', 'flood', or 'none'.

 

Intended Usage

 

This model is intended for text classification in English, Spanish, Catalan, and German.

 

How to use

 

from transformers import pipeline
 
# Placeholder: point this at the directory holding the extracted model files.
model_path = "path/to/CREXWED"
 
# Build the classifier; tokenizer options are passed along at call time.
event_predictor = pipeline("text-classification", model=model_path, batch_size=512)
tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
 
# The same flood report in each supported language.
tweet_text_en = "It is raining heavy, the water in my apartment is up to my knees. Send help!!"
tweet_text_de = "Es regnet in Strömen, das Wasser in meiner Wohnung steht mir bis zu den Knien. Schickt Hilfe!"
tweet_text_es = "Está lloviendo muchísimo, hay agua en casa y me llega hasta los tobillos. Necesitamos ayuda!"
tweet_text_ca = "Està plovent moltíssim, tinc aigua a casa que m'arriba fins els turmells. Necessitem ajuda!"
 
# The pipeline returns a list with one result per input; take the first.
output = event_predictor(tweet_text_en, **tokenizer_kwargs)[0]
 
print(output)
print(f'Predicted class: {output["label"]}')
print(f'Prediction Score: {output["score"]}')
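
The pipeline returns one dictionary per input, holding a label and a score. As a minimal sketch of what downstream filtering could look like, the hypothetical helper below keeps only posts labelled 'fire' or 'flood' above a confidence threshold. The helper name select_emergencies, the 0.5 threshold, and the sample predictions are illustrative assumptions and are not part of the released model; the sketch also assumes the pipeline emits the label strings exactly as listed in the model description.

```python
from typing import Dict, List, Tuple

# Labels documented in the model description that indicate an emergency.
EMERGENCY_LABELS = {"fire", "flood"}


def select_emergencies(
    texts: List[str],
    predictions: List[Dict[str, object]],
    threshold: float = 0.5,  # assumed cut-off; tune it on your own data
) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for posts predicted as fire or flood."""
    selected = []
    for text, pred in zip(texts, predictions):
        label, score = str(pred["label"]), float(pred["score"])
        if label in EMERGENCY_LABELS and score >= threshold:
            selected.append((text, label, score))
    return selected


# Made-up predictions in the shape the pipeline returns, for illustration only.
sample_texts = ["Water is up to my knees, send help!", "Nice sunny day at the beach."]
sample_predictions = [{"label": "flood", "score": 0.97}, {"label": "none", "score": 0.99}]
print(select_emergencies(sample_texts, sample_predictions))
```

In practice the predictions list would come from calling event_predictor on a list of texts rather than from hand-written values.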
 

 

Limitations and bias

 

No measures have been taken to estimate the bias and toxicity embedded in the model.

 

Since the data used to fine-tune this model comes from social media, it will contain biases, hate speech, and toxic content. We have not applied any steps to reduce their impact. The base model from which this model was fine-tuned, twitter-xlm-roberta-base, may also contain bias and toxicity.

 

Training

 

Training data

 

The model was trained on a mix of real and synthetic tweets. The real tweets were collected from Twitter and annotated using an LLM; the dataset can be found here. The synthetic tweets were generated using Google’s Gemma 3 27B and MistralAI’s Mistral Small 24B; the dataset can be found here.



Training procedure

 

The training data described in the previous section was used to perform a full-parameter fine-tuning of the twitter-xlm-roberta-base model.

 

Training hyperparameters

 

The following hyperparameters were used during training:

 

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
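
To make the scheduler setting concrete, the sketch below restates the listed hyperparameters as a dictionary and computes the learning rate under linear decay. Several things here are assumptions: the keys follow Hugging Face TrainingArguments naming, the batch sizes are treated as per-device values, no warmup steps are used, and total_steps is a hypothetical number, since the real value depends on the size of the training set.

```python
# Hyperparameters listed above, keyed by Hugging Face TrainingArguments names.
# Treating the batch sizes as per-device values is an assumption.
hparams = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1.0,
}


def linear_lr(step: int, total_steps: int, base_lr: float = hparams["learning_rate"]) -> float:
    """Learning rate under linear decay to zero, assuming no warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


total_steps = 1000  # hypothetical; depends on the training set size
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps))
```

With a single epoch, the rate starts at 5e-05 on the first step and reaches zero on the last one, so every example is seen exactly once at a progressively smaller step size.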



Evaluation

 

Evaluation data

 

The model was evaluated using the test dataset from here.

 

Data statistics:
 
- fire = fire-related  
- flood = flood-related  
- none = no disaster label
 
Language  fire  flood   none
de         222    304   7416
ca         520    340  10611
es         592    239   5988
en         230    942   7318
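
The counts above are heavily skewed towards the 'none' label. The short sketch below, using only the numbers from the table, computes the per-language totals and class shares; the variable and function names are illustrative. By this arithmetic, 'none' accounts for roughly 86% to 93% of the examples in every language.

```python
# Counts copied from the data statistics table: (fire, flood, none) per language.
counts = {
    "de": (222, 304, 7416),
    "ca": (520, 340, 10611),
    "es": (592, 239, 5988),
    "en": (230, 942, 7318),
}


def class_shares(row):
    """Return the row total and each class count as a fraction of that total."""
    total = sum(row)
    return total, tuple(n / total for n in row)


for lang, row in counts.items():
    total, (fire, flood, none) = class_shares(row)
    print(f"{lang}: total={total} fire={fire:.1%} flood={flood:.1%} none={none:.1%}")
```

An imbalance of this size is worth keeping in mind when reading the scores, because a classifier can look accurate overall while still missing many of the rare 'fire' and 'flood' posts.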

 

Evaluation results
Language  F1
de        0.838
ca        0.704
es        0.705
en        0.799
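
For a quick comparison across languages, the reported scores can be summarised as below. The unweighted mean is a derived convenience figure rather than a result reported by the authors, and the card does not state which F1 averaging (macro, micro, or weighted) produced the per-language values, so the summary should be read with that caveat.

```python
from statistics import mean

# Per-language F1 scores copied from the evaluation results table.
f1_by_language = {"de": 0.838, "ca": 0.704, "es": 0.705, "en": 0.799}

# Unweighted mean across the four languages (derived, not a reported result).
unweighted_mean_f1 = mean(f1_by_language.values())
best = max(f1_by_language, key=f1_by_language.get)
worst = min(f1_by_language, key=f1_by_language.get)

print(f"unweighted mean F1: {unweighted_mean_f1:.4f}")
print(f"highest: {best} ({f1_by_language[best]})")
print(f"lowest: {worst} ({f1_by_language[worst]})")
```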
 

 

Additional information

 

Authors

 

- Language Technologies Unit, Barcelona Supercomputing Center.

 

Contact

 

For further information, send an email to either <langtech@bsc.es> or <crexdata@bsc.es>.

 

License

 

This work is distributed under the Apache License, Version 2.0.

 

Terms of Use

 

Since part of the data used to train this model was generated using Google's Gemma 3 model, use of this model should follow the Gemma Terms of Use and Prohibited Use Policy.

 

Funding

 

This work has been developed under the EU-funded CREXDATA Project (Grant Agreement No. 101092749).

 

Citation



Disclaimer

 

The model published in this repository is intended for general-purpose use and is made available to third parties under an Apache v2.0 License.

 

Please keep in mind that the model may have bias and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using this model (or a system based on it), or become users of the model themselves, it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.

 

In no event shall the owners and creators of the model be liable for any results arising from the use made by third parties.

Files

CREXWED-release-crexdata.zip

Files (1.0 GB)

md5:04014c5882c1fb9994da1f70a2ee1aae

Additional details

Funding

European Commission
CREXDATA - Critical Action Planning over Extreme-Scale Data 101092749