Releases: Eventual-Inc/Daft
v0.1.18
Changes
✨ New Features
- [FEAT] Add support for windows in daft @samster25 (#1386)
- [FEAT] Add debug logging to s3 native apis @samster25 (#1414)
- [FEAT] enable path style for s3 custom endpoints by default @samster25 (#1410)
- [FEAT] Native S3 Lister, support trailing slashes and fix panics when connection is dropped for tokio @samster25 (#1404)
- [FEAT] Native Rust listing of GCS @jaychia (#1392)
- [FEAT] [New Query Planner] Enable new query planner by default. @clarkzinzow (#1398)
- [FEAT] Parameter to set num_parallel_tasks for bulk readers @samster25 (#1399)
- [FEAT] Native S3 Client: allow disabling ssl verification or checking hostnames @samster25 (#1395)
- [FEAT] Improved projection folding. @xcharleslin (#1374)
- [FEAT] bulk parquet pyarrow reader @samster25 (#1396)
- [FEAT] Native Recursive File Lister @samster25 (#1353)
- [FEAT] Implement .dt.year/month/day for timestamp types @jaychia (#1385)
- [FEAT] [New Query Planner] Add support for fsspec filesystems to new query planner. @clarkzinzow (#1357)
- [FEAT] Common subexpression elimination in Projection construction @xcharleslin (#1347)
👾 Bug Fixes
- [BUG] Fix num input partitions in coalesce. @clarkzinzow (#1442)
- [BUG] Fix scheme bug in GCS anonymous mode @jaychia (#1443)
- [BUG] Fix runner check at plan execution time for new query planner @clarkzinzow (#1435)
- [BUG] [Docs] Allow source code discovery to fail silently for pyo3-defined classes when generating docs. @clarkzinzow (#1430)
- [BUG] patch workspace version when building wheels @samster25 (#1418)
- [BUG] Anaconda client don't upload src wheels @samster25 (#1415)
- [BUG] Anaconda client needs wildcard for upload @samster25 (#1413)
- [BUG] Fix gs listing to include 0 sized marker files @jaychia (#1412)
- [BUG] force upload of anaconda nightly wheels @samster25 (#1411)
- [BUG] add test cases for bulk minio reading @samster25 (#1402)
- [BUG] Fixes to S3 Native Lister with correct Error propagation @samster25 (#1401)
- [BUG] Fix public API decorator type annotations. @clarkzinzow (#1397)
- [BUG] Fix partition spec bugs from old query planner @xcharleslin (#1372)
📖 Documentation
- [BUG] [Docs] Allow source code discovery to fail silently for pyo3-defined classes when generating docs. @clarkzinzow (#1430)
- [FEAT] Implement .dt.year/month/day for timestamp types @jaychia (#1385)
🧰 Maintenance
- [CHORE] disable windows pytest after building @samster25 (#1420)
- [CHORE] add caching for pip wheels @samster25 (#1419)
- [CHORE] macos xl runners are 0.32/minute not hour... @samster25 (#1417)
- [CHORE] Centralize pyo3 pickling around `__reduce__` + bincode macro. @clarkzinzow (#1394)
- [CHORE] larger macos runner for builds @samster25 (#1403)
- [CHORE] Add stubs and improve comments for pyo3-exposed abstractions, + driveby type/bug fixes. @clarkzinzow (#1377)
- [CHORE] add retries for broken link checker @samster25 (#1378)
- [CHORE] pin azure-storage-blob due to breaking new version @samster25 (#1373)
- [CHORE] [New Query Planner] Misc. user-facing error tweaks to improve UX. @clarkzinzow (#1358)
v0.1.17
Changes
✨ New Features
- [FEAT] Native Parquet Reader into pyarrow directly @samster25 (#1366)
- [FEAT] Add configurable io thread pool size @samster25 (#1363)
- [FEAT] Add flag to limit number of connections to S3 @samster25 (#1360)
- [FEAT] export jemalloc arm64 flag inside container @samster25 (#1362)
🚀 Performance Improvements
- [PERF] Use owned Stream in Parquet Page Iterator @samster25 (#1365)
- [PERF] enable jemalloc with background threads @samster25 (#1361)
- [PERF] Add microbenchmarks for takes @jaychia (#1350)
- [PERF] Optimize filter on nested growables @jaychia (#1349)
👾 Bug Fixes
- [BUG] Respect `multithreaded_io` flag when reading parquet @samster25 (#1359)
- [BUG] Schema Display should use dtype Display instead of Debug @jaychia (#1355)
- [BUG] propagate parquet io error instead of panicking @samster25 (#1352)
🧰 Maintenance
- [CHORE] [New Query Planner] Add simple `df.explain()` option; change to fixed-point policy for rule batch @clarkzinzow (#1354)
- [CHORE] Add status code to IO integration tests @jaychia (#1356)
- [CHORE] Fix List/FixedSizeList DataType to hold a dtype instead of Field @jaychia (#1351)
- [CHORE] Add Series::full_null/empty/from_arrow to reduce code duplication @jaychia (#1331)
- [CHORE] Add a Growable factory method @jaychia (#1330)
- [CHORE] Add new ListArray @jaychia (#1329)
⬆️ Dependencies
- Bump tokio from 1.29.1 to 1.32.0 @dependabot (#1371)
- Bump tempfile from 3.7.1 to 3.8.0 @dependabot (#1285)
- Bump pyo3 from 0.19.1 to 0.19.2 @dependabot (#1312)
- Bump pytest from 7.4.0 to 7.4.1 @dependabot (#1339)
- Bump actions/checkout from 3 to 4 @dependabot (#1337)
v0.1.16
Changes
✨ New Features
- [FEAT] __repr__ for ResourceRequest @xcharleslin (#1343)
- [FEAT] [New Query Planner] Refactor file globbing logic by exposing `FileInfos` to Python @clarkzinzow (#1307)
- [FEAT] S3 Native List Impl for a directory @samster25 (#1324)
- [FEAT] [New Query Planner] Add support for `DropRepartition` @clarkzinzow (#1302)
- [FEAT] Add all projection optimization rules to new query planner. @xcharleslin (#1288)
- [FEAT] [New Query Planner] Add support for `PushDownLimit` @clarkzinzow (#1300)
👾 Bug Fixes
- [BUG] Fix Table.read_parquet behavior when it encounters arrow_schema @jaychia (#1336)
- [BUG] [New Query Planner] Revert file info partition column names. @clarkzinzow (#1333)
- [BUG] Fix fixed size list array FullNull implementation @jaychia (#1320)
🧰 Maintenance
- [CHORE] install perl before maturin @samster25 (#1345)
- [CHORE] Switch to openssl @samster25 (#1344)
- [CHORE] [New Query Planner] pyo3-agnostic `LogicalPlanBuilder`, op constructor arg orderings @clarkzinzow (#1332)
- [CHORE] factor io config into common code @samster25 (#1335)
- [CHORE] [New Query Planner] Remove `ExpressionsProjection` from builder, move validation into `Op::try_new()` @clarkzinzow (#1327)
- [CHORE] StructArray refactors @jaychia (#1326)
- [CHORE] drop flag for non native compile for daft profiling @samster25 (#1323)
- [CHORE] pin pyarrow to 12 for ray compat tests @samster25 (#1322)
- [CHORE] Move FixedSizeListArray to array/fixed_size_list_array.rs @jaychia (#1319)
- [CHORE] Add fix for list schema inference tests using PyArrow 13.0.0 @jaychia (#1318)
- [CHORE] Implementations of FixedSizeListArray @jaychia (#1281)
⬆️ Dependencies
- Bump ray[data,default] from 2.6.0 to 2.6.3 @dependabot (#1315)
- Bump orjson from 3.9.4 to 3.9.5 @dependabot (#1316)
- Bump aws-actions/configure-aws-credentials from 2 to 3 @dependabot (#1317)
v0.1.15
Changes
✨ New Features
- [FEAT] add row group support to daft parquet reader @samster25 (#1308)
- [FEAT] [New Query Planner] Add logical plan hashing, rule batches, fixed-point policies, early optimizer termination, and optimization cycle detection. @clarkzinzow (#1292)
👾 Bug Fixes
- [BUG] make separate file for new query planner @samster25 (#1309)
🧰 Maintenance
- [CHORE] Refactor Growable traits and downcast for lifetimes @jaychia (#1305)
- [CHORE] Refactor broadcast to use growables @jaychia (#1304)
- [CHORE] Code reduction in growable macros + logical if/else refactor @jaychia (#1301)
- [CHORE] Refactor growables to return a Series instead of concrete arrays @jaychia (#1297)
- [CHORE] Minor cleanup for `logical_plan::Project` @xcharleslin (#1299)
v0.1.14
Changes
✨ New Features
- [FEAT] add flag to use multithreaded io for parquet_read_table @samster25 (#1298)
- [FEAT] Add Retry Mode, connection timeout, and read timeout to S3Config @samster25 (#1293)
- [FEAT] [New Query Planner] Add optimization framework and `PushDownFilter` rule. @clarkzinzow (#1284)
👾 Bug Fixes
- [BUG] Fix semantic merge conflict @xcharleslin (#1286)
🧰 Maintenance
- [CHORE] Move schema construction under LogicalPlan construction @xcharleslin (#1290)
- [CHORE] Implement growables for array types @jaychia (#1287)
- [CHORE] Unify indexmap versions and bump to 2.0.0 @xcharleslin (#1291)
- [CHORE] Refactor Series downcast and LogicalArrayImpl @jaychia (#1289)
- [CHORE] Pass in file size and num rows to Rust query planner @xcharleslin (#1282)
v0.1.13
Changes
✨ New Features
- [FEAT] Add Flag to_arrow to convert large string arrays @samster25 (#1283)
👾 Bug Fixes
- [BUG] try release profile rather than dev-bench for daft profiling @samster25 (#1280)
🧰 Maintenance
- [CHORE] reduce severity of region reroute logs to debug @samster25 (#1279)
v0.1.12
Changes
✨ New Features
- [FEAT] [New Query Planner] All functional tests pass + add to CI. @clarkzinzow (#1274)
- [FEAT] [New Query Planner] Add support for `df.count_rows()`. @clarkzinzow (#1273)
- [FEAT] native google cloud reader @samster25 (#1271)
- [FEAT] [New Query Planner] Groupby support, aggregation fixes, support for remaining aggregation ops @clarkzinzow (#1272)
- [FEAT] [New Query Planner] Support for Ray runner in new query planner. @clarkzinzow (#1265)
- [FEAT] Add Schema.from_pyarrow @jaychia (#1262)
- [FEAT] [New Query Planner] Add support for joins. @clarkzinzow (#1260)
- [FEAT] [New Query Planner] Add support for Explode. @clarkzinzow (#1258)
👾 Bug Fixes
- [BUG] Use manylinux_2_24 for aarch64 linux to be able to publish manylinux2014 @samster25 (#1275)
📖 Documentation
- [FEAT] [New Query Planner] Support for Ray runner in new query planner. @clarkzinzow (#1265)
🧰 Maintenance
- [CHORE] Refactor arrays to share a FromArrow constructor trait @jaychia (#1276)
- [CHORE] Bump rust nightly channel date @jaychia (#1255)
⬆️ Dependencies
- Bump opencv-python from 4.8.0.74 to 4.8.0.76 @dependabot (#1267)
- Bump orjson from 3.9.2 to 3.9.4 @dependabot (#1268)
- Bump image from 0.24.6 to 0.24.7 @dependabot (#1269)
- Bump isbang/compose-action from 1.5.0 to 1.5.1 @dependabot (#1270)
v0.1.11
Changes
✨ New Features
- [FEAT] [New Query Plan] Add support for Projection and Coalesce, enable many tests @clarkzinzow (#1256)
- [FEAT] [New Query Planner] Add support for Concat. @clarkzinzow (#1254)
- [FEAT] [New Query Planner] Add support for tabular writes. @clarkzinzow (#1252)
- [FEAT] Multi-partition aggregate; Coalesce @xcharleslin (#1249)
- [FEAT] [New Query Planner] Add support for Sort, Repartition, and Distinct in new query planner. @clarkzinzow (#1248)
- [FEAT] Add Azure Support for Native Downloader @samster25 (#1250)
- [FEAT] Locally unique semantic IDs for Expressions @xcharleslin (#1243)
- [FEAT] Read parquet tables with int96 coercion option @jaychia (#1231)
- [FEAT] [New Query Plan] Add support for CSV scans, JSON scans, in-memory scans and caching materialized results. @clarkzinzow (#1246)
- [FEAT] Native Downloader add Retry Config parameters @samster25 (#1244)
- [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
- [FEAT] [New Query Planner] Logical --> physical translation, physical plan execution. @clarkzinzow (#1232)
- [FEAT] native parquet correctness checks @samster25 (#1225)
- [FEAT] add session token as input to io config @samster25 (#1224)
🚀 Performance Improvements
- [PERF] Native Parquet Bulk Reader @samster25 (#1233)
👾 Bug Fixes
- [BUG] drop native-tls (openssl) for azure which was a default feature @samster25 (#1251)
- [BUG] Fix decimal byte arrays @jaychia (#1247)
- [BUG] correct type when printing incorrect row count @samster25 (#1226)
- [BUG] try manylinux 2 28 @samster25 (#1214)
- [BUG] downgrade ray to 2.6 @samster25 (#1212)
- [BUG] add explicit target for aarch64 linux @samster25 (#1209)
- [BUG] Fix incorrect sign bug for small decimals @xcharleslin (#1204)
- [BUG] Set SSL paths on linux @samster25 (#1203)
📖 Documentation
- [DOCS] Fix daft.read_parquet link @jaychia (#1228)
- [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
🧰 Maintenance
- [CHORE] Update test to only use store_schema kwarg for pa>=11 @jaychia (#1253)
- [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
- [CHORE] [New Query Planner] Introduce `LogicalPlanBuilder` and `QueryPlanner` interfaces to hide query planner implementations. @clarkzinzow (#1245)
- [CHORE] LogicalPlan: Add display improvements, and Filter @xcharleslin (#1221)
- [CHORE] Add unit tests for int96 timestamps @jaychia (#1229)
- [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
- [CHORE] disable mac test for lack of docker @samster25 (#1223)
- [CHORE] Begin integrating Rust Logical Plan with Dataframe API @xcharleslin (#1207)
- [CHORE] integration tests for nightly platform wheels @samster25 (#1219)
- [CHORE] Remove existing LogicalPlan from all execution concepts @xcharleslin (#1208)
- [CHORE] Add endpoints to simulate rate-limiting on AWS S3 buckets @jaychia (#1220)
- [CHORE] Add pytest marker for integration @jaychia (#1211)
- [CHORE] Add s3 fixtures for retrying logic @jaychia (#1206)
- [CHORE] Add developer flag to use Rust query planner @xcharleslin (#1205)
- [CHORE] Rust Logical plan skeleton @xcharleslin (#1192)
⬆️ Dependencies
- Bump tempfile from 3.7.0 to 3.7.1 @dependabot (#1238)
- Bump ray[data,default] from 2.5.1 to 2.6.1 @dependabot (#1200)
- Bump numpy from 1.25.1 to 1.25.2 @dependabot (#1199)
- Bump tempfile from 3.6.0 to 3.7.0 @dependabot (#1198)
- Bump serde_json from 1.0.103 to 1.0.104 @dependabot (#1197)
- Bump num-traits from 0.2.15 to 0.2.16 @dependabot (#1196)
- Bump serde from 1.0.171 to 1.0.179 @dependabot (#1195)
v0.1.10
Changes
✨ New Features
- [FEAT] Enable feature-flagged native downloader in daft.read_parquet @jaychia (#1190)
- [FEAT] parquet reader refactor, add parquet_stats_reader and parquet_schema_reader (1/2) @samster25 (#1191)
🚀 Performance Improvements
- [PERF] native streaming parquet @samster25 (#1193)
🧰 Maintenance
⬆️ Dependencies
- Bump isbang/compose-action from 1.4.1 to 1.5.0 @dependabot (#1178)
- Bump serde_json from 1.0.100 to 1.0.103 @dependabot (#1168)
- Bump pyo3-log from 0.8.2 to 0.8.3 @dependabot (#1167)
- Bump dyn-clone from 1.0.11 to 1.0.12 @dependabot (#1166)
- Bump numpy from 1.25.0 to 1.25.1 @dependabot (#1164)
- Bump lxml from 4.9.2 to 4.9.3 @dependabot (#1163)
v0.1.9
Changes
🏆 Highlights
- [FEAT] [Tensor] Add support for `Tensor` and `FixedShapeTensor` types. @clarkzinzow (#1073)
✨ New Features
- [FEAT] Consolidate to list namespace @jaychia (#1180)
- [FEAT] Add .image.crop Expression @jaychia (#1175)
- [FEAT] [Tensor] Add support for `Tensor` and `FixedShapeTensor` types. @clarkzinzow (#1073)
- [FEAT] Basic support for Arrow 128-bit Decimal. @xcharleslin (#1129)
- [FEAT] Native Parquet Downloader @samster25 (#1107)
🚀 Performance Improvements
- [PERF] Simple Read Planner and RangeReader for Native Parquet Reader @samster25 (#1172)
👾 Bug Fixes
- [BUG] Fix ownership model of IOClient @samster25 (#1128)
- [BUG] Ownership of Runtime and Clients @samster25 (#1125)
📖 Documentation
- [DOCS] Fix broken link to Ray Datasets docs @jaychia (#1186)
- [FEAT] Consolidate to list namespace @jaychia (#1180)
- [DOCS] Add docs for tensor dtype @jaychia (#1170)
- [DOCS] Add Flyte example @jaychia (#1150)
- [CHORE] Update README.rst typo @jaychia (#1141)
🧰 Maintenance
- [CHORE] Bump cargo version to 0.1.9 @jaychia (#1187)
- [CHORE] Exclude JSON pre-commit fixer for ipynb files @jaychia (#1184)
- [CHORE] New daft-plan crate; trait TreeDisplay @xcharleslin (#1176)
- [CHORE] More Parquet benchmarking @jaychia (#1160)
- [CHORE] Enable Parquet Integration tests for decimal types @samster25 (#1161)
- [CHORE] cache all crates @samster25 (#1158)
- [CHORE] move parquet unit tests under io @samster25 (#1157)
- [CHORE] [CI] use smarter github rust cache action @samster25 (#1156)
- [CHORE] bump profiling timeout @samster25 (#1155)
- [CHORE] Native Parquet Integration Tests @samster25 (#1154)
- [CHORE] Remove use of `dirs_exist_ok` which was only added in Py3.8 @jaychia (#1153)
- [CHORE] Add parquet benchmarking @jaychia (#1151)
- [CHORE] Cleans up IO integration test fixtures for re-use @jaychia (#1152)
- [CHORE] Update README.rst typo @jaychia (#1141)
- [CHORE] No-op test for various parquet files @jaychia (#1130)
- [CHORE] Tidy typing for remaining binary ops: logical, comp @xcharleslin (#1124)
- [CHORE] Use workspace for cargo check @samster25 (#1127)
⬆️ Dependencies
- Bump orjson from 3.9.1 to 3.9.2 @dependabot (#1143)
- Bump pandas from 2.0.2 to 2.0.3 @dependabot (#1142)
- Bump snafu from 0.7.4 to 0.7.5 @dependabot (#1146)
- Bump serde_json from 1.0.99 to 1.0.100 @dependabot (#1147)
- Bump opencv-python from 4.7.0.72 to 4.8.0.74 @dependabot (#1117)
- Bump ray[data,default] from 2.4.0 to 2.5.1 @dependabot (#1074)
- Bump chrono-tz from 0.8.2 to 0.8.3 @dependabot (#1119)
- Bump pyo3 from 0.19.0 to 0.19.1 @dependabot (#1122)
- Bump async-trait from 0.1.68 to 0.1.71 @dependabot (#1126)
- Bump tokio from 1.28.2 to 1.29.1 @dependabot (#1120)