Published June 11, 2024 | Version py-1.0.0-alpha.1
Software
Open
pola-rs/polars: Python Polars 1.0.0-alpha.1
Creators
- Ritchie Vink
- Stijn de Gooijer1
- Alexander Beedie
- Marco Edward Gorelli2
- Weijie Guo3
- J van Zundert
- Gert Hulselmans4
- Orson Peters5
- Cory Grinstead
- Marshall
- nameexhaustion
- chielP
- Gijs Burghoorn5
- Matteo Santamaria6
- Itamar Turner-Trauring
- Daniël Heres7
- Josh Magarick
- ibENPC
- Moritz Wilksch8
- Jorge Leitao9
- Karl Genockey
- Mick van Gelderen
- Petros Barbagiannis
- Jonas Haag10
- Ion Koutsouris11
- Oliver Borchert12
- Marc van Heerden
- Liam Brannigan
- Joshua Peek
- 1. @pola-rs
- 2. Quansight
- 3. @alibaba
- 4. @aertslab
- 5. Polars
- 6. University of California, Berkeley
- 7. @coralogix
- 8. @QuantCo
- 9. Munin Data ApS
- 10. forml.eu
- 11. ASML
- 12. @Quantco
Description
💥 Breaking changes
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising for out-of-bounds indices on all `get`/`gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only for `str.json_decode` (#16835)
- Only accept a single column in `set_sorted` (#16800)
- Update group-by iteration to always return tuple keys (#16793)
- Default to `coalesce=False` in left outer join (#16769)
- More removal of deprecated functionality (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Enforce deprecation of keyword arguments as positional (#16755)
- Remove deprecated parameters in `Series.cut`/`qcut` (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Remove deprecated `top_k` parameters (#16599)
- Update some error types to more appropriate variants (#15030)
- More scheduled removal of deprecated functionality (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default of `offset` in `group_by_dynamic` from "negative `every`" to "zero" (#16658)
- Constrain access to globals from `df.sql` in favour of top-level `pl.sql` (#16598)
- Read 2D numpy arrays as `Array[dt, shape]` instead of `List[dt]` (#16710)
- Activate decimal by default (#16709)
- Do not propagate nulls in `clip` bounds (#14413)
- Change `.str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Remove redundant column name when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support decimals by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove default class variable values on DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
⚠️ Deprecations
- Deprecate `LazyFrame.with_context` (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` (#16790)
- Deprecate `arctan2d` (#16786)
🚀 Performance improvements
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- Convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising for out-of-bounds indices on all `get`/`gather` operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Only accept a single column in `set_sorted` (#16800)
- Expose overflowing cast (#16805)
- Update group-by iteration to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Default to `coalesce=False` in left outer join (#16769)
- More removal of deprecated functionality (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- Remove deprecated parameters in `Series.cut`/`qcut` (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Mark `min_periods` as keyword-only for `rolling` methods (#16738)
- Remove deprecated `top_k` parameters (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- More scheduled removal of deprecated functionality (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default of `offset` in `group_by_dynamic` from "negative `every`" to "zero" (#16658)
- Constrain access to globals from `df.sql` in favour of top-level `pl.sql` (#16598)
- Read 2D numpy arrays as `Array[dt, shape]` instead of `List[dt]` (#16710)
- Activate decimal by default (#16709)
- Do not propagate nulls in `clip` bounds (#14413)
- Change `.str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Remove redundant column name when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for `str.to_datetime` (#16634)
- Support decimals by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove default class variable values on DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
- Add `DIV` function support to the SQL interface (#16678)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
🐞 Bug fixes
- Fix `should_rechunk` check (#16852)
- Ensure `read_excel` and `read_ods` return identical frames across all engines when given empty spreadsheet tables (#16802)
- Consistent behaviour when `infer_schema_length=0` for `read_excel` (#16840)
- Standardised additional SQL interface errors (#16829)
- Ensure that split ChunkedArray also flattens chunks (#16837)
- Reduce needless panics in comparisons (#16831)
- Reset if next caller clones inner series (#16812)
- Raise on non-positive json schema inference (#16770)
- Rewrite implementation of `top_k`/`bottom_k` and fix a variety of bugs (#16804)
- Fix comparison of UInt64 with zero (#16799)
- Fix incorrect parquet statistics written for UInt64 values > Int64::MAX (#16766)
- Fix boolean distinct (#16765)
- `DATE_PART` SQL syntax/parsing, improve some error messages (#16761)
- Include `pl.` qualifier for inner dtypes in `to_init_repr` (#16235)
- Column selection wasn't applied when reading CSV with no rows (#16739)
- Panic on empty df / null List(Categorical) (#16730)
- Only flush if operator can flush in streaming outer join (#16723)
- Raise unsupported cat array (#16717)
- Assert SQLInterfaceError is raised (#16713)
- Restrict casting for temporal data types (#14142)
- Handle nested categoricals in `assert_series_equal` when `categorical_as_str=True` (#16700)
- Improve `read_database` check for SQLAlchemy async Session objects (#16680)
- Reduce scope of multi-threaded numpy conversion (#16686)
- Full null on dyn int (#16679)
- Fix filter shape on empty null (#16670)
📖 Documentation
- Update version switcher for 1.0.0 prereleases (#16847)
- Update link from Python API reference to user guide (#16849)
- Update docstring/test/etc usage of `select` and `with_columns` to idiomatic form (#16801)
- Update versioning docs for 1.0.0 (#16757)
- Add docstring example for `DataFrame.limit` (#16753)
- Fix incorrect stated value of `include_nulls` in `DataFrame.update` docstring (#16701)
- Update deprecation docs in the user guide (#14315)
- Add example for index count in `DataFrame.rolling` (#16600)
- Improve docstring of `Expr/Series.map_elements` (#16079)
- Add missing `polars.sql` docs entry and small docstring update (#16656)
🛠️ Other improvements
- Remove inner `Arc` from `FileCacheEntry` (#16870)
- Do not update stable API reference on prerelease (#16846)
- Update links to API references (#16843)
- Prepare update of API reference URLs (#16816)
- Rename `allow_overflow` to `wrap_numerical` (#16807)
- Set `infer_schema_length` as keyword-only for `str.json_decode` (#16835)
- Don't enter streaming engine for groupby-> agg mean/median … (#16810)
- Improve safety of amortized_iter (#16820)
- Remove needless inner type clone (#16718)
- Fix incorrect debug assertion in `ChunkedArray::from_chunks_and_dtype` (#16697)
- Update version resolver for `1.0.0` release (#16705)
- Avoid AWS pinning to outdated crc32c version (#16681)
Thank you to all our contributors for making this release possible! @JulianCologne, @KDruzhkin, @MarcoGorelli, @Object905, @alexander-beedie, @bertiewooster, @coastalwhite, @datenzauberai, @dependabot, @dependabot[bot], @henryharbeck, @marenwestermann, @mcrumiller, @montanarograziano, @nameexhaustion, @orlp, @ritchie46, @siddharth-gulia, @stinodego, @universalmind303 and @wence-
Files
Name | Size | Checksum
---|---|---
pola-rs/polars-py-1.0.0-alpha.1.zip | 4.7 MB | md5:1dd52090d8850047655ae9e27e05b192
Additional details
Related works
- Is supplement to
- Software: https://github.com/pola-rs/polars/tree/py-1.0.0-alpha.1 (URL)