Published June 14, 2022 | Version 2.3.0
Software
Open
huggingface/datasets: 2.3.0
Authors/Creators
- Quentin Lhoest¹
- Albert Villanova del Moral¹
- Patrick von Platen¹
- Thomas Wolf¹
- Mario Šaško¹
- Yacine Jernite¹
- Abhishek Thakur¹
- Lewis Tunstall¹
- Suraj Patil¹
- Mariama Drame¹
- Julien Chaumond¹
- Julien Plu¹
- Joe Davison¹
- Simon Brandeis¹
- Victor Sanh¹
- Teven Le Scao¹
- Kevin Canwen Xu¹
- Nicolas Patry¹
- Steven Liu¹
- Angelina McMillan-Major¹
- Philipp Schmid¹
- Sylvain Gugger¹
- Nathan Raw¹
- Sylvain Lesage¹
- Anton Lozhkov¹
- Matthew Carrigan¹
- Théo Matussière¹
- Leandro von Werra¹
- Lysandre Debut¹
- Stas Bekman¹
- Clément Delangue¹

¹ Hugging Face
Description
Datasets Changes
- New: ImageNet-Sketch by @nateraw in https://github.com/huggingface/datasets/pull/4301
- New: Biwi Kinect Head Pose by @dnaveenr in https://github.com/huggingface/datasets/pull/3903
- New: enwik8 by @HallerPatrick in https://github.com/huggingface/datasets/pull/4321
- New: LCCC dataset by @silverriver in https://github.com/huggingface/datasets/pull/4416
- New: TruthfulQA by @jon-tow in https://github.com/huggingface/datasets/pull/4159
- New: BIG-bench by @andersjohanandreassen in https://github.com/huggingface/datasets/pull/4125
- New: QuickDraw by @mariosasko in https://github.com/huggingface/datasets/pull/3592
- New: SST-2 by @albertvillanova in https://github.com/huggingface/datasets/pull/4473
- Update: imagenet-1k - remove manual download by @mariosasko in https://github.com/huggingface/datasets/pull/4299
  - ImageNet can now be loaded in Python with load_dataset without requiring a manual download!
  - It also supports streaming mode with load_dataset("imagenet-1k", streaming=True).
- Update: spider - Remove Google Drive URL by @albertvillanova in https://github.com/huggingface/datasets/pull/4410
- Update: blended_skill_talk - add missing columns to by @mariosasko in https://github.com/huggingface/datasets/pull/4437
- Update: multi-news - Use newer version with fixes by @JohnGiorgi in https://github.com/huggingface/datasets/pull/4451
- Update: fever - update data URLs by @albertvillanova in https://github.com/huggingface/datasets/pull/4455
- Update: udhr - Add and fix language tags by @albertvillanova in https://github.com/huggingface/datasets/pull/4459
- Update: udhr - update metadata by @leondz in https://github.com/huggingface/datasets/pull/4362
- Update: wider_face - Replace data URLs once hosted on the Hub by @albertvillanova in https://github.com/huggingface/datasets/pull/4469
- Update: PASS - update dataset version by @mariosasko in https://github.com/huggingface/datasets/pull/4488
- Fix: GEM - fix bug in wiki_auto_asset_turk config by @albertvillanova in https://github.com/huggingface/datasets/pull/4389
- Fix: GEM - fix URL for totto config by @albertvillanova in https://github.com/huggingface/datasets/pull/4396
- Fix: timit_asr - fix DuplicatedKeysError by @albertvillanova in https://github.com/huggingface/datasets/pull/4424
- Fix: timit_asr - Make extensions case-insensitive by @albertvillanova in https://github.com/huggingface/datasets/pull/4425
- Fix: timit_asr - Fix directory names for LDC data by @albertvillanova in https://github.com/huggingface/datasets/pull/4436
- Fix: iwslt2017 by @lhoestq in https://github.com/huggingface/datasets/pull/4481
- to_tf_dataset rewrite by @Rocketknight1 in https://github.com/huggingface/datasets/pull/4170
  - see more in the documentation
- Support DataLoader with num_workers > 0 in streaming mode by @lhoestq in https://github.com/huggingface/datasets/pull/4375
  - see more in the documentation
- Added stratify option to train_test_split by @nandwalritik in https://github.com/huggingface/datasets/pull/4322
- Re-add support for Apache Beam functionality by @albertvillanova in https://github.com/huggingface/datasets/pull/4328
- Resume push_to_hub: skip identical files in push_to_hub instead of overwriting by @mariosasko in https://github.com/huggingface/datasets/pull/4402
- Support nested/complex feature types as features in packaged loaders by @mariosasko in https://github.com/huggingface/datasets/pull/4364
- Minor fixes/improvements in scene_parse_150 card by @mariosasko in https://github.com/huggingface/datasets/pull/4447
- Tidy up license metadata for google_wellformed_query, newspop, sick by @leondz in https://github.com/huggingface/datasets/pull/4378
- Fix example in opus_ubuntu, Add license info by @leondz in https://github.com/huggingface/datasets/pull/4360
- Update README.md of fquad by @lhoestq in https://github.com/huggingface/datasets/pull/4450
- Add API code examples for loading methods by @stevhliu in https://github.com/huggingface/datasets/pull/4300
- Add API code examples for remaining main classes by @stevhliu in https://github.com/huggingface/datasets/pull/4292
- Generalize tutorials for audio and vision by @stevhliu in https://github.com/huggingface/datasets/pull/4468
- Update CI deprecated legacy image by @albertvillanova in https://github.com/huggingface/datasets/pull/4393
- remove int documentation from logging docs by @lvwerra in https://github.com/huggingface/datasets/pull/4392
- Fix docstring in DatasetDict::shuffle by @felixdivo in https://github.com/huggingface/datasets/pull/4344
- Fix Version equality by @albertvillanova in https://github.com/huggingface/datasets/pull/4359
- Set builder name from module instead of class by @albertvillanova in https://github.com/huggingface/datasets/pull/4388
- Test dill by @albertvillanova in https://github.com/huggingface/datasets/pull/4385
- Refactor download by @albertvillanova in https://github.com/huggingface/datasets/pull/4384
- Fix dependency on dill version by @albertvillanova in https://github.com/huggingface/datasets/pull/4397
- Support remote cache_dir by @albertvillanova in https://github.com/huggingface/datasets/pull/4347
- Update imagenet gate by @lhoestq in https://github.com/huggingface/datasets/pull/4408
- Fix dataset builder default version by @albertvillanova in https://github.com/huggingface/datasets/pull/4356
- Uncomment logging deactivation for ArrowBasedBuilder by @thomasw21 in https://github.com/huggingface/datasets/pull/4403
- Rename DatasetBuilder config_name by @albertvillanova in https://github.com/huggingface/datasets/pull/4414
- Fix metadata validation by @albertvillanova in https://github.com/huggingface/datasets/pull/4390
- Add HF.co for PRs/Issues for specific datasets by @lhoestq in https://github.com/huggingface/datasets/pull/4427
- Fix type hint and documentation for new_fingerprint by @fxmarty in https://github.com/huggingface/datasets/pull/4326
- Skip hidden files/directories in data files resolution and iter_files by @mariosasko in https://github.com/huggingface/datasets/pull/4412
- Fix docstring of inspect_dataset by @albertvillanova in https://github.com/huggingface/datasets/pull/4438
- Fix builder docstring by @albertvillanova in https://github.com/huggingface/datasets/pull/4432
- Fix kwargs in docstrings by @albertvillanova in https://github.com/huggingface/datasets/pull/4444
- Fix missing args in docstring of load_dataset_builder by @albertvillanova in https://github.com/huggingface/datasets/pull/4445
- Add missing kwargs to docstrings by @albertvillanova in https://github.com/huggingface/datasets/pull/4446
- Add extractor for bzip2-compressed files by @asivokon in https://github.com/huggingface/datasets/pull/4421
- Fix dummy dataset generation script for handling nested types of _URLs by @silverriver in https://github.com/huggingface/datasets/pull/4434
- Update dataset_infos.json with new split info in Dataset.push_to_hub to avoid verification error by @mariosasko in https://github.com/huggingface/datasets/pull/4415
- Update builder docstring for deprecated/added arguments by @albertvillanova in https://github.com/huggingface/datasets/pull/4429
- Extend support for streaming datasets that use xml.dom.minidom.parse by @albertvillanova in https://github.com/huggingface/datasets/pull/4464
- Fix script fetching and local path handling in inspect_dataset and inspect_metric by @mariosasko in https://github.com/huggingface/datasets/pull/4433
- Fix bigbench config names by @lhoestq in https://github.com/huggingface/datasets/pull/4465
- Fix 401 error for unauthenticated requests to non-existing repos by @lhoestq in https://github.com/huggingface/datasets/pull/4472
- Reorder returned validation/test splits in script template by @albertvillanova in https://github.com/huggingface/datasets/pull/4470
- Better ImportError message when a dataset script dependency is missing by @lhoestq in https://github.com/huggingface/datasets/pull/4484
- Fix cast to null by @lhoestq in https://github.com/huggingface/datasets/pull/4485
- [Docs] How to use with PyTorch page by @lhoestq in https://github.com/huggingface/datasets/pull/4474
- Optimize contiguous shard and select by @lhoestq in https://github.com/huggingface/datasets/pull/4466
- First draft of the docs for TF + Datasets by @Rocketknight1 in https://github.com/huggingface/datasets/pull/4457
- Update _format_columns in remove_columns by @alvarobartt in https://github.com/huggingface/datasets/pull/4411
- Fix wrong map parameter name in cache docs by @h4iku in https://github.com/huggingface/datasets/pull/4293
- Pin the revision in imagenet download links by @lhoestq in https://github.com/huggingface/datasets/pull/4492
- Refactor column mappings for question answering datasets by @lewtun in https://github.com/huggingface/datasets/pull/4391
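Streaming with several DataLoader workers (num_workers > 0) only works if each worker reads a disjoint part of the data. A common way to get that, and roughly the idea behind this change as we understand it, is to split the dataset's shards (files) among the workers. The sketch below assumes a simple round-robin assignment; in a real PyTorch IterableDataset the worker id and worker count would come from torch.utils.data.get_worker_info(). The helpers shards_for_worker and iter_worker are hypothetical, not the library's API.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shards_for_worker(shards: List[str], worker_id: int, num_workers: int) -> List[str]:
    """Round-robin assignment: worker i gets shards i, i + num_workers, i + 2 * num_workers, ..."""
    return shards[worker_id::num_workers]


def iter_worker(
    shards: List[str],
    worker_id: int,
    num_workers: int,
    read_shard: Callable[[str], Iterable[T]],
) -> Iterator[T]:
    # Each worker opens only its own shards, so no example is yielded by two workers.
    for shard in shards_for_worker(shards, worker_id, num_workers):
        yield from read_shard(shard)


def fake_read(shard: str) -> List[tuple]:
    # Hypothetical reader: pretends every shard holds two examples.
    return [(shard, j) for j in range(2)]


shards = [f"data-{i:05d}.jsonl" for i in range(5)]
all_examples = [ex for w in range(2) for ex in iter_worker(shards, w, 2, fake_read)]
```

With fewer shards than workers, some workers receive no shards at all, so in this scheme the number of shards bounds how much parallelism is useful.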
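Stratified splitting keeps each class's share of the data roughly the same in the train and test splits. In the library, the new option is the stratify_by_column argument of Dataset.train_test_split, which expects a ClassLabel column. The stdlib sketch below only illustrates the idea on a plain list of labels; stratified_split is a hypothetical helper and not how the library implements it.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Tuple


def stratified_split(labels: List[Hashable], test_size: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), taking about test_size of every label for the test split."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # shuffle within the class only
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["pos"] * 8 + ["neg"] * 2
train_idx, test_idx = stratified_split(labels, test_size=0.5)
```

With 8 "pos" and 2 "neg" labels and test_size=0.5, the test split gets 4 "pos" and 1 "neg", preserving the 80/20 ratio; an unstratified random split could easily leave the rare class out entirely.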
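Skipping identical files is what makes an interrupted push_to_hub resumable: files that already exist remotely with the same content do not need to be sent again. The sketch compares content hashes; the use of SHA-256, the in-memory dicts standing in for local shards and remote metadata, and the helper names are all assumptions for illustration, not how the library or the Hub actually checks files.

```python
import hashlib
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def files_to_upload(local_files: Dict[str, bytes], remote_hashes: Dict[str, str]) -> List[str]:
    """Return names of local files that are missing remotely or whose content differs."""
    return [
        name
        for name, data in local_files.items()
        if remote_hashes.get(name) != sha256_of(data)
    ]


local = {"data/train-00000.parquet": b"shard0", "data/train-00001.parquet": b"shard1"}
# Pretend shard 0 was already uploaded by an earlier, interrupted run.
remote = {"data/train-00000.parquet": sha256_of(b"shard0")}
pending = files_to_upload(local, remote)
```

Only data/train-00001.parquet is left to upload; re-running after a failure therefore resumes instead of overwriting everything.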
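Skipping hidden files during data files resolution keeps entries such as .DS_Store or files under a .cache/ directory out of a dataset. The sketch below treats a path as hidden when any of its components starts with "."; that rule and the helper names are assumptions for illustration, not the library's exact filter.

```python
from pathlib import PurePosixPath
from typing import List


def is_hidden(path: str) -> bool:
    """True if the file or any parent directory in the path starts with '.'."""
    # PurePosixPath drops "." components, so "./data/x.csv" is not flagged.
    return any(part.startswith(".") for part in PurePosixPath(path).parts)


def visible_files(paths: List[str]) -> List[str]:
    return [p for p in paths if not is_hidden(p)]


paths = ["data/train.csv", "data/.DS_Store", ".cache/data/test.csv", "data/.hidden/valid.csv"]
kept = visible_files(paths)
```

The sketch assumes paths relative to the dataset root; a path containing ".." would also be flagged and would need separate handling.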
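An extractor recognises a compressed file, typically by its leading magic bytes, and decompresses it before the data is read. The sketch shows that idea for bzip2 with Python's standard bz2 module. Working on in-memory bytes rather than files, and the function names, are simplifications of ours rather than the library's extractor API.

```python
import bz2

BZIP2_MAGIC = b"BZh"  # bzip2 streams start with "BZh" followed by a block-size digit


def is_bzip2(data: bytes) -> bool:
    return data[:3] == BZIP2_MAGIC


def maybe_extract(data: bytes) -> bytes:
    """Decompress bzip2 data; return any other data unchanged."""
    if is_bzip2(data):
        return bz2.decompress(data)
    return data


compressed = bz2.compress(b"hello datasets")
restored = maybe_extract(compressed)
```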
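A clearer ImportError for a dataset script names every missing dependency at once and says how to install it. The sketch checks availability with importlib.util.find_spec. The message wording, the helper name check_dependencies, and the assumption that the pip package name equals the import name are ours, not the library's.

```python
import importlib.util
from typing import List


def check_dependencies(module_names: List[str]) -> None:
    """Raise a single ImportError listing every missing top-level module, with a pip hint."""
    missing = [name for name in module_names if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            "This dataset script needs the following missing dependencies: "
            + ", ".join(missing)
            + ". Install them with: pip install "
            + " ".join(missing)
        )


check_dependencies(["json", "csv"])  # both are in the standard library, so nothing is raised
```

Import names and pip package names do not always match (for example, an import name can differ from the distribution name), so a real implementation would need a mapping for such cases.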
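Selecting a contiguous block of rows, for example with Dataset.shard(..., contiguous=True) or select on consecutive indices, can be served by a single slice instead of a per-row indices mapping. In the library the rows live in Arrow tables, where slicing avoids copying data; the sketch below only illustrates the fast-path check on a Python list, and as_contiguous_range and select are hypothetical helpers.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def as_contiguous_range(indices: Sequence[int]) -> Optional[range]:
    """Return a range if indices are consecutive increasing integers, else None."""
    if not indices:
        return None
    start = indices[0]
    for offset, idx in enumerate(indices):
        if idx != start + offset:
            return None
    return range(start, start + len(indices))


def select(rows: List[T], indices: Sequence[int]) -> List[T]:
    contiguous = as_contiguous_range(indices)
    if contiguous is not None:
        # Fast path: one slice, no index mapping needed.
        return rows[contiguous.start:contiguous.stop]
    # General path: gather rows one by one through the indices.
    return [rows[i] for i in indices]


rows = list("abcdefgh")
```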
New Contributors
- @leondz made their first contribution in https://github.com/huggingface/datasets/pull/4378
- @felixdivo made their first contribution in https://github.com/huggingface/datasets/pull/4344
- @nandwalritik made their first contribution in https://github.com/huggingface/datasets/pull/4322
- @fxmarty made their first contribution in https://github.com/huggingface/datasets/pull/4326
- @HallerPatrick made their first contribution in https://github.com/huggingface/datasets/pull/4321
- @silverriver made their first contribution in https://github.com/huggingface/datasets/pull/4416
- @asivokon made their first contribution in https://github.com/huggingface/datasets/pull/4421
- @andersjohanandreassen made their first contribution in https://github.com/huggingface/datasets/pull/4125
Full Changelog: https://github.com/huggingface/datasets/compare/2.2.2...2.3.0
Files
| Name | Size | Checksum |
|---|---|---|
| huggingface/datasets-2.3.0.zip | 54.7 MB | md5:75be3d65dcf595672129ced7ab45e339 |
Additional details
Related works
- Is supplement to
- https://github.com/huggingface/datasets/tree/2.3.0 (URL)