pola-rs/polars: Python Polars 1.0.0-alpha.1

Ritchie Vink; Stijn de Gooijer; Alexander Beedie; Marco Edward Gorelli; Weijie Guo; J van Zundert; Gert Hulselmans; Orson Peters; Cory Grinstead; Marshall; nameexhaustion; chielP; Gijs Burghoorn; Matteo Santamaria; Itamar Turner-Trauring; Daniël Heres; Josh Magarick; ibENPC; Moritz Wilksch; Jorge Leitao; Karl Genockey; Mick van Gelderen; Petros Barbagiannis; Jonas Haag; Ion Koutsouris; Oliver Borchert; Marc van Heerden; Liam Brannigan; Joshua Peek

doi:10.5281/zenodo.11573041

Published June 11, 2024 | Version py-1.0.0-alpha.1

Software Open

1. @pola-rs
2. Quansight
3. @alibaba
4. @aertslab
5. Polars
6. University of California, Berkeley
7. @coralogix
8. @QuantCo
9. Munin Data ApS
10. forml.eu
11. ASML
12. @Quantco

💥 Breaking changes

Consistently convert to given time zone in Series constructor (#16828)
Update reshape to return Array types instead of List types (#16825)
Default to raising for oob on all get/gather operations (#16841)
Native selector XOR set operation, guarantee consistent selector column-order (#16833)
Set infer_schema_length as keyword-only for str.json_decode (#16835)
Only accept a single column in set_sorted (#16800)
Update group-by iteration to always return tuple keys (#16793)
Default to coalesce=False in left outer join (#16769)
More removal of deprecated functionality (#16779)
Removal of read_database_uri passthrough from read_database (#16783)
Remove pyxlsb engine from read_database (#16784)
Enforce deprecation of keyword arguments as positional (#16755)
Remove deprecated parameters in Series.cut/qcut (#16741)
Expedited removal of certain deprecated functionality (#16754)
Remove deprecated functionality from rolling methods (#16750)
Update date_range to no longer produce datetime ranges (#16734)
Remove deprecated top_k parameters (#16599)
Update some error types to more appropriate variants (#15030)
More scheduled removal of deprecated functionality (#16724)
Scheduled removal of deprecated functionality (#16715)
Enforce deprecation of offset arg in truncate and round (#16655)
Change default of offset in group_by_dynamic from "negative every" to "zero" (#16658)
Constrain access to globals from df.sql in favour of top-level pl.sql (#16598)
Read 2D numpy arrays as Array[dt, shape] instead of List[dt] (#16710)
Activate decimal by default (#16709)
Do not propagate nulls in clip bounds (#14413)
Change .str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" (#13597)
Remove redundant column name when pivoting by multiple values (#16439)
Preserve nulls in ewm_mean, ewm_std, and ewm_var (#15503)
Restrict casting for temporal data types (#14142)
Support decimals by default when converting from Arrow (#15324)
Remove serde functionality from pl.read_json and DataFrame.write_json (#16550)
Update function signature of nth to allow positional input of indices, remove columns parameter (#16510)
Rename struct fields of rle output to len/value and update data type of len field (#15249)
Remove default class variable values on DataTypes (#16524)
Add check_names parameter to Series.equals and default to False (#16610)
Dedicated SQLInterface and SQLSyntax errors (#16635)

⚠️ Deprecations

Deprecate LazyFrame.with_context (#16860)
Rename parameter descending to reverse in top_k methods (#16817)
Rename str.concat to str.join (#16790)
Deprecate arctan2d (#16786)

🚀 Performance improvements

Optimize string/binary sort (#16871)
Use split_at in split (#16865)
Use split_at instead of double slice in chunk splits (#16856)
Don't rechunk in align_ if arrays are aligned (#16850)
Don't create small chunks in parallel collect (#16845)
Add dedicated no-null branch in arg_sort (#16808)
Speed up dt.offset_by 2x for constant durations (#16728)
Toggle coalesce if non-coalesced key isn't projected (#16677)
Make dt.truncate 1.5x faster when every is just a single duration (and not an expression) (#16666)
Always prune unused columns in semi/anti join (#16665)

✨ Enhancements

Consistently convert to given time zone in Series constructor (#16828)
Improve read_csv SQL table reading function defaults (better handle dates) (#16866)
Support SQL VALUES clause and inline renaming of columns in CTE & derived table definitions (#16851)
Support Python Enum values in lit (#16858)
Convert to given time zone in .str.to_datetime when values are offset-aware (#16742)
Update reshape to return Array types instead of List types (#16825)
Default to raising for oob on all get/gather operations (#16841)
Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
Native selector XOR set operation, guarantee consistent selector column-order (#16833)
Extend recognised EXTRACT and DATE_PART SQL part abbreviations (#16767)
Improve error message when raising integers to negative integers, improve docs (#16827)
Return datetime for mean/median of Date column (#16795)
Only accept a single column in set_sorted (#16800)
Expose overflowing cast (#16805)
Update group-by iteration to always return tuple keys (#16793)
Support array arithmetic for equally sized shapes (#16791)
Default to coalesce=False in left outer join (#16769)
More removal of deprecated functionality (#16779)
Removal of read_database_uri passthrough from read_database (#16783)
Remove pyxlsb engine from read_database (#16784)
Add check_order parameter to assert_series_equal (#16778)
Enforce deprecation of keyword arguments as positional (#16755)
Support cloud storage in scan_csv (#16674)
Streamline SQL INTERVAL handling and improve related error messages, update sqlparser-rs lib (#16744)
Support use of ordinal values in SQL ORDER BY clause (#16745)
Support executing polars SQL against pandas and pyarrow objects (#16746)
Remove deprecated parameters in Series.cut/qcut (#16741)
Expedited removal of certain deprecated functionality (#16754)
Remove deprecated functionality from rolling methods (#16750)
Update date_range to no longer produce datetime ranges (#16734)
Mark min_periods as keyword-only for rolling methods (#16738)
Remove deprecated top_k parameters (#16599)
Support order-by in window functions (#16743)
Add SQL support for NULLS FIRST/LAST ordering (#16711)
Update some error types to more appropriate variants (#15030)
Initial SQL support for INTERVAL strings (#16732)
More scheduled removal of deprecated functionality (#16724)
Scheduled removal of deprecated functionality (#16715)
Enforce deprecation of offset arg in truncate and round (#16655)
Change default of offset in group_by_dynamic from "negative every" to "zero" (#16658)
Constrain access to globals from df.sql in favour of top-level pl.sql (#16598)
Read 2D numpy arrays as Array[dt, shape] instead of List[dt] (#16710)
Activate decimal by default (#16709)
Do not propagate nulls in clip bounds (#14413)
Change .str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" (#13597)
Remove redundant column name when pivoting by multiple values (#16439)
Preserve nulls in ewm_mean, ewm_std, and ewm_var (#15503)
Restrict casting for temporal data types (#14142)
Add many more auto-inferable datetime formats for str.to_datetime (#16634)
Support decimals by default when converting from Arrow (#15324)
Remove serde functionality from pl.read_json and DataFrame.write_json (#16550)
Update function signature of nth to allow positional input of indices, remove columns parameter (#16510)
Rename struct fields of rle output to len/value and update data type of len field (#15249)
Remove default class variable values on DataTypes (#16524)
Add check_names parameter to Series.equals and default to False (#16610)
Dedicated SQLInterface and SQLSyntax errors (#16635)
Add DIV function support to the SQL interface (#16678)
Support non-coalescing streaming left join (#16672)
Allow wildcard and exclude before struct expansions (#16671)

🐞 Bug fixes

Fix should_rechunk check (#16852)
Ensure read_excel and read_ods return identical frames across all engines when given empty spreadsheet tables (#16802)
Consistent behaviour when "infer_schema_length=0" for read_excel (#16840)
Standardised additional SQL interface errors (#16829)
Ensure that split ChunkedArray also flattens chunks (#16837)
Reduce needless panics in comparisons (#16831)
Reset if next caller clones inner series (#16812)
Raise on non-positive json schema inference (#16770)
Rewrite implementation of top_k/bottom_k and fix a variety of bugs (#16804)
Fix comparison of UInt64 with zero (#16799)
Fix incorrect parquet statistics written for UInt64 values > Int64::MAX (#16766)
Fix boolean distinct (#16765)
DATE_PART SQL syntax/parsing, improve some error messages (#16761)
Include pl. qualifier for inner dtypes in to_init_repr (#16235)
Column selection wasn't applied when reading CSV with no rows (#16739)
Panic on empty df / null List(Categorical) (#16730)
Only flush if operator can flush in streaming outer join (#16723)
Raise unsupported cat array (#16717)
Assert SQLInterfaceError is raised (#16713)
Restrict casting for temporal data types (#14142)
Handle nested categoricals in assert_series_equal when categorical_as_str=True (#16700)
Improve read_database check for SQLAlchemy async Session objects (#16680)
Reduce scope of multi-threaded numpy conversion (#16686)
Full null on dyn int (#16679)
Fix filter shape on empty null (#16670)

📖 Documentation

Update version switcher for 1.0.0 prereleases (#16847)
Update link from Python API reference to user guide (#16849)
Update docstring/test/etc usage of select and with_columns to idiomatic form (#16801)
Update versioning docs for 1.0.0 (#16757)
Add docstring example for DataFrame.limit (#16753)
Fix incorrect stated value of include_nulls in DataFrame.update docstring (#16701)
Update deprecation docs in the user guide (#14315)
Add example for index count in DataFrame.rolling (#16600)
Improve docstring of Expr/Series.map_elements (#16079)
Add missing polars.sql docs entry and small docstring update (#16656)

🛠️ Other improvements

Remove inner Arc from FileCacheEntry (#16870)
Do not update stable API reference on prerelease (#16846)
Update links to API references (#16843)
Prepare update of API reference URLs (#16816)
Rename allow_overflow to wrap_numerical (#16807)
Set infer_schema_length as keyword-only for str.json_decode (#16835)
Don't enter streaming engine for groupby -> agg mean/median … (#16810)
Improve safety of amortized_iter (#16820)
Remove needless inner type clone (#16718)
Fix incorrect debug assertion in ChunkedArray::from_chunks_and_dtype (#16697)
Update version resolver for 1.0.0 release (#16705)
Avoid AWS pinning to outdated crc32c version (#16681)

Thank you to all our contributors for making this release possible! @JulianCologne, @KDruzhkin, @MarcoGorelli, @Object905, @alexander-beedie, @bertiewooster, @coastalwhite, @datenzauberai, @dependabot, @dependabot[bot], @henryharbeck, @marenwestermann, @mcrumiller, @montanarograziano, @nameexhaustion, @orlp, @ritchie46, @siddharth-gulia, @stinodego, @universalmind303 and @wence-

Files

Name	Size
pola-rs/polars-py-1.0.0-alpha.1.zip (md5:1dd52090d8850047655ae9e27e05b192)	4.7 MB

Additional details

Is supplement to: Software: https://github.com/pola-rs/polars/tree/py-1.0.0-alpha.1 (URL)

	All versions	This version
Views	7,679	49
Downloads	826	5
Data volume	3.4 GB	23.4 MB
