From 98eb85ab0adc12fea3bd12b6dbd64a077fec5159 Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Tue, 7 May 2024 17:55:47 -0400 Subject: [PATCH 1/8] PDEP-10: Change status to rejected --- .../pdeps/0010-required-pyarrow-dependency.md | 43 +++++++++++++++++-- 1 file changed, 40 insertions(+), 3 deletions(-) diff --git a/web/pandas/pdeps/0010-required-pyarrow-dependency.md b/web/pandas/pdeps/0010-required-pyarrow-dependency.md index 4d6e928ce68bd..8fd3b3f3a747d 100644 --- a/web/pandas/pdeps/0010-required-pyarrow-dependency.md +++ b/web/pandas/pdeps/0010-required-pyarrow-dependency.md @@ -1,12 +1,48 @@ # PDEP-10: PyArrow as a required dependency for default string inference implementation -- Created: 17 April 2023 -- Status: Accepted +- Created: 17 April 2023 (updated May 8, 2024) +- Status: Rejected - Discussion: [#52711](https://github.com/pandas-dev/pandas/pull/52711) [#52509](https://github.com/pandas-dev/pandas/issues/52509) - Author: [Matthew Roeschke](https://github.com/mroeschke) [Patrick Hoefler](https://github.com/phofl) -- Revision: 1 +- Revision: 2 + +# Note + +This PDEP was originally accepted on May 8, 2023. However, after reviewing feedback posted +on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of +the core team, have not decided with moving forward with this PDEP for pandas 3.0. + +The primary reasons for rejecting this PDEP are twofold: + +1) Requiring pyarrow as a dependency causes installation problems. + - Pyarrow does not fit or has a hard time fitting in space-constrained environments +such as AWS Lambda and WASM, due to its large size of around ~40 MB for a compiled wheel +(which is larger than pandas' own wheel sizes) + - Installation of pyarrow is not possible on some platforms. 
We provide support for some +less widely used platforms such as Alpine Linux (and there is third party support for pandas in +pyodide, a WASM distribution of pandas), both of which pyarrow does not provide wheels for. + + While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing +of the PDEP, we underestimated the impact this would have on users, and also downstream developers. + +2) Many of the benefits presented in this PDEP can be materialized even with payrrow as an optional dependency. + + For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics + as our current default object string data type, but that allows users to experience faster performance and memory savings + compared to the object strings. + +While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP +does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe +that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the +ecosystem and better performance for users. Furthermore, a lot of the drawbacks, such as the large installation size of pyarrow +and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us +to potentially revisit this decision in the future. + +However, at this point in time, it is clear that we are not ready to require pyarrow +as a dependency in pandas. + ## Abstract @@ -210,6 +246,7 @@ before releasing a new pandas version. 
- 17 April 2023: Initial version - 8 May 2023: Changed proposal to make pyarrow required in pandas 3.0 instead of 2.1 +- 8 May 2024: Changed status to rejected [^1] [^2] From 5e451db7a35967ebfebdaa7e37b175b2926431dd Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Sun, 19 May 2024 17:05:37 -0700 Subject: [PATCH 2/8] Split out into new pdep --- .../pdeps/0015-do-not-require-pyarrow.md | 46 +++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 web/pandas/pdeps/0015-do-not-require-pyarrow.md diff --git a/web/pandas/pdeps/0015-do-not-require-pyarrow.md b/web/pandas/pdeps/0015-do-not-require-pyarrow.md new file mode 100644 index 0000000000000..9ba46b6109a7b --- /dev/null +++ b/web/pandas/pdeps/0015-do-not-require-pyarrow.md @@ -0,0 +1,46 @@ +# PDEP-15: Do not require PyArrow as a required dependency (for pandas 3.0) + +- Created: 8 May 2024 +- Status: Under Discussion +- Discussion: [#58623](https://github.com/pandas-dev/pandas/pull/58623) + [#52711](https://github.com/pandas-dev/pandas/pull/52711) + [#52509](https://github.com/pandas-dev/pandas/issues/52509) + [#54466](https://github.com/pandas-dev/pandas/issues/54466) +- Author: [Thomas Li](https://github.com/lithomas1) +- Revision: 1 + +## Abstract + +This PDEP was supersedes PDEP-10, which stipulated that PyArrow should become a required dependency +for pandas 3.0. After reviewing feedback posted +on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of +the core team, have decided against moving forward with this PDEP for pandas 3.0. + +The primary reasons for rejecting this PDEP are twofold: + +1) Requiring pyarrow as a dependency causes installation problems. 
+ - Pyarrow does not fit or has a hard time fitting in space-constrained environments +such as AWS Lambda and WASM, due to its large size of around ~40 MB for a compiled wheel +(which is larger than pandas' own wheel sizes) + - Installation of pyarrow is not possible on some platforms. We provide support for some +less widely used platforms such as Alpine Linux (and there is third party support for pandas in +pyodide, a WASM distribution of pandas), both of which pyarrow does not provide wheels for. + + While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing +of the PDEP, we underestimated the impact this would have on users, and also downstream developers. + +2) Many of the benefits presented in this PDEP can be materialized even with payrrow as an optional dependency. + + For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics + as our current default object string data type, but that allows users to experience faster performance and memory savings + compared to the object strings (if pyarrow is installed). + +While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP +does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe +that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the +ecosystem and better performance for users. Furthermore, a lot of the drawbacks, such as the large installation size of pyarrow +and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us +to potentially revisit this decision in the future. + +However, at this point in time, it is clear that we are not ready to require pyarrow +as a dependency in pandas. 
From 2af5632a4f2c323aaa1dbf3c60b0ffab0c215576 Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Mon, 20 May 2024 22:28:06 -0700 Subject: [PATCH 3/8] remove pdep-10 changes --- .../pdeps/0010-required-pyarrow-dependency.md | 41 ++----------------- 1 file changed, 4 insertions(+), 37 deletions(-) diff --git a/web/pandas/pdeps/0010-required-pyarrow-dependency.md b/web/pandas/pdeps/0010-required-pyarrow-dependency.md index 8fd3b3f3a747d..d5737f6462bb4 100644 --- a/web/pandas/pdeps/0010-required-pyarrow-dependency.md +++ b/web/pandas/pdeps/0010-required-pyarrow-dependency.md @@ -1,48 +1,16 @@ # PDEP-10: PyArrow as a required dependency for default string inference implementation -- Created: 17 April 2023 (updated May 8, 2024) -- Status: Rejected +- Created: 17 April 2023 +- Status: Accepted - Discussion: [#52711](https://github.com/pandas-dev/pandas/pull/52711) [#52509](https://github.com/pandas-dev/pandas/issues/52509) - Author: [Matthew Roeschke](https://github.com/mroeschke) [Patrick Hoefler](https://github.com/phofl) -- Revision: 2 +- Revision: 1 # Note -This PDEP was originally accepted on May 8, 2023. However, after reviewing feedback posted -on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of -the core team, have not decided with moving forward with this PDEP for pandas 3.0. - -The primary reasons for rejecting this PDEP are twofold: - -1) Requiring pyarrow as a dependency causes installation problems. - - Pyarrow does not fit or has a hard time fitting in space-constrained environments -such as AWS Lambda and WASM, due to its large size of around ~40 MB for a compiled wheel -(which is larger than pandas' own wheel sizes) - - Installation of pyarrow is not possible on some platforms. 
We provide support for some -less widely used platforms such as Alpine Linux (and there is third party support for pandas in -pyodide, a WASM distribution of pandas), both of which pyarrow does not provide wheels for. - - While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing -of the PDEP, we underestimated the impact this would have on users, and also downstream developers. - -2) Many of the benefits presented in this PDEP can be materialized even with payrrow as an optional dependency. - - For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics - as our current default object string data type, but that allows users to experience faster performance and memory savings - compared to the object strings. - -While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP -does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe -that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the -ecosystem and better performance for users. Furthermore, a lot of the drawbacks, such as the large installation size of pyarrow -and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us -to potentially revisit this decision in the future. - -However, at this point in time, it is clear that we are not ready to require pyarrow -as a dependency in pandas. - +This PDEP is superseded by PDEP-15. ## Abstract @@ -246,7 +214,6 @@ before releasing a new pandas version. 
- 17 April 2023: Initial version - 8 May 2023: Changed proposal to make pyarrow required in pandas 3.0 instead of 2.1 -- 8 May 2024: Changed status to rejected [^1] [^2] From 6e4efe5a5eff823d32a1d2d7104d594019966d3f Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Mon, 20 May 2024 22:30:30 -0700 Subject: [PATCH 4/8] Apply suggestions from code review Co-authored-by: Irv Lustig --- web/pandas/pdeps/0015-do-not-require-pyarrow.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/web/pandas/pdeps/0015-do-not-require-pyarrow.md b/web/pandas/pdeps/0015-do-not-require-pyarrow.md index 9ba46b6109a7b..9c3fba0fab3e2 100644 --- a/web/pandas/pdeps/0015-do-not-require-pyarrow.md +++ b/web/pandas/pdeps/0015-do-not-require-pyarrow.md @@ -29,13 +29,13 @@ pyodide, a WASM distribution of pandas), both of which pyarrow does not provide While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing of the PDEP, we underestimated the impact this would have on users, and also downstream developers. -2) Many of the benefits presented in this PDEP can be materialized even with payrrow as an optional dependency. +2) Many of the benefits presented in PDEP-10 can be materialized even with payrrow as an optional dependency. For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics as our current default object string data type, but that allows users to experience faster performance and memory savings compared to the object strings (if pyarrow is installed). -While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP +While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of PDEP-10 does not mean that we are abandoning pyarrow support and integration in pandas. 
We, as the core team, still believe that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the ecosystem and better performance for users. Furthermore, a lot of the drawbacks, such as the large installation size of pyarrow From 45754bff88b33aab5098ab00a8c85b3ad458f0ed Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Mon, 20 May 2024 22:31:15 -0700 Subject: [PATCH 5/8] Apply suggestions from code review Co-authored-by: Irv Lustig --- web/pandas/pdeps/0015-do-not-require-pyarrow.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/web/pandas/pdeps/0015-do-not-require-pyarrow.md b/web/pandas/pdeps/0015-do-not-require-pyarrow.md index 9c3fba0fab3e2..4bfb1a49d30f6 100644 --- a/web/pandas/pdeps/0015-do-not-require-pyarrow.md +++ b/web/pandas/pdeps/0015-do-not-require-pyarrow.md @@ -11,7 +11,7 @@ ## Abstract -This PDEP was supersedes PDEP-10, which stipulated that PyArrow should become a required dependency +This PDEP supersedes PDEP-10, which stipulated that PyArrow should become a required dependency for pandas 3.0. After reviewing feedback posted on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of the core team, have decided against moving forward with this PDEP for pandas 3.0. @@ -26,7 +26,7 @@ such as AWS Lambda and WASM, due to its large size of around ~40 MB for a compil less widely used platforms such as Alpine Linux (and there is third party support for pandas in pyodide, a WASM distribution of pandas), both of which pyarrow does not provide wheels for. - While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing + While both of these reasons are mentioned in the drawbacks section of PDEP-10, at the time of the writing of the PDEP, we underestimated the impact this would have on users, and also downstream developers. 
2) Many of the benefits presented in PDEP-10 can be materialized even with payrrow as an optional dependency. From 783363779b1a87444aa0e4787a6ad817c06d082d Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Tue, 25 Jun 2024 21:18:23 -0700 Subject: [PATCH 6/8] update a little --- .../pdeps/0015-do-not-require-pyarrow.md | 31 ++++++++++++++----- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/web/pandas/pdeps/0015-do-not-require-pyarrow.md b/web/pandas/pdeps/0015-do-not-require-pyarrow.md index 9ba46b6109a7b..945226ce1b0d9 100644 --- a/web/pandas/pdeps/0015-do-not-require-pyarrow.md +++ b/web/pandas/pdeps/0015-do-not-require-pyarrow.md @@ -29,15 +29,32 @@ pyodide, a WASM distribution of pandas), both of which pyarrow does not provide While both of these reasons are mentioned in the drawbacks section of this PDEP, at the time of the writing of the PDEP, we underestimated the impact this would have on users, and also downstream developers. -2) Many of the benefits presented in this PDEP can be materialized even with payrrow as an optional dependency. +2) Many of the benefits presented in PDEP-10 can be materialized for users that have pyarrow installed, without + forcing a pyarrow requirement on other users. - For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics - as our current default object string data type, but that allows users to experience faster performance and memory savings - compared to the object strings (if pyarrow is installed). + In PDEP-10, there are three primary benefits listed: -While we've decided to not move forward with requiring pyarrow in pandas 3.0, the rejection of this PDEP -does not mean that we are abandoning pyarrow support and integration in pandas. 
We, as the core team, still believe -that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the + - First class support for strings. + + - This is covered by PDEP-14, which will enable the usage of a pyarrow backed string dtype by default, + (for users who have pyarrow instaleld) and the use of a Python object based fallback in the case . + + - Support for dtypes not present in pandas (e.g. nested dtypes, decimals) + - Users can already create arrays with these dtypes if they have pyarrow installed, but we cannot infer + arrays to those dtypes by default without pyarrow installed (as there is no Python/numpy equivalent). + + - Interoperability + - The Arrow C Data Interface would allow us to import/export pandas DataFrames to and from other libraries + that support Arrow in a zero-copy manner. + + Support for the Arrow C Data interface in pandas and other libraries in the ecosystem is still very new, though, + (support in pandas itself was only added as of pandas 2.2), and the dataframe interchange protocol, which allows + for dataframe interchange between Python dataframe implementations is currently better supported in downstream + libraries. + +Although this PR recommends not adopting pyarrow as a required dependency in pandas 3.0, this does not mean that we are +abandoning pyarrow support and integration in pandas. Adopting support for pyarrow arrays +and data types in more of pandas will lead to greater interoperability with the ecosystem and better performance for users. Furthermore, a lot of the drawbacks, such as the large installation size of pyarrow and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for them, allowing us to potentially revisit this decision in the future. 
From 1b3bdeea47ec20e84e2cbc176a823c45437d7c09 Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Sun, 28 Jul 2024 12:04:09 -0700 Subject: [PATCH 7/8] minor update --- .../pdeps/0015-do-not-require-pyarrow.md | 52 +++++++++++-------- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/web/pandas/pdeps/0015-do-not-require-pyarrow.md b/web/pandas/pdeps/0015-do-not-require-pyarrow.md index c3e5405281c93..a1de0bf692aa4 100644 --- a/web/pandas/pdeps/0015-do-not-require-pyarrow.md +++ b/web/pandas/pdeps/0015-do-not-require-pyarrow.md @@ -1,7 +1,7 @@ # PDEP-15: Do not require PyArrow as a required dependency (for pandas 3.0) - Created: 8 May 2024 -- Status: Under Discussion +- Status: Under discussion - Discussion: [#58623](https://github.com/pandas-dev/pandas/pull/58623) [#52711](https://github.com/pandas-dev/pandas/pull/52711) [#52509](https://github.com/pandas-dev/pandas/issues/52509) @@ -13,20 +13,21 @@ This PDEP supersedes PDEP-10, which stipulated that PyArrow should become a required dependency for pandas 3.0. After reviewing feedback posted -on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of -the core team, have decided against moving forward with this PDEP for pandas 3.0. +on the feedback issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we've +decided against moving forward with this PDEP for pandas 3.0. The primary reasons for rejecting this PDEP are twofold: 1) Requiring pyarrow as a dependency causes installation problems. + - Pyarrow does not fit or has a hard time fitting in space-constrained environments -such as AWS Lambda and WASM, due to its large size of around ~40 MB for a compiled wheel +such as AWS Lambda, due to its large size of around ~40 MB for a compiled wheel (which is larger than pandas' own wheel sizes) + - Installation of pyarrow is not possible on some platforms. 
We provide support for some -less widely used platforms such as Alpine Linux (and there is third party support for pandas in -pyodide, a WASM distribution of pandas), both of which pyarrow does not provide wheels for. +less widely used platforms such as Alpine Linux, which pyarrow does not provide wheels for. - While both of these reasons are mentioned in the drawbacks section of PDEP-10, at the time of the writing + While installation issues are mentioned in the drawbacks section of PDEP-10, at the time of the writing of the PDEP, we underestimated the impact this would have on users, and also downstream developers. 2) Many of the benefits presented in PDEP-10 can be materialized for users that have pyarrow installed, without @@ -34,30 +35,37 @@ of the PDEP, we underestimated the impact this would have on users, and also dow In PDEP-10, there are three primary benefits listed: - - First class support for strings. + - First class support for strings. - - This is covered by PDEP-14, which will enable the usage of a pyarrow backed string dtype by default, - (for users who have pyarrow instaleld) and the use of a Python object based fallback in the case . + - PDEP-14 enables a new string dtype by default for pandas 3.0, + which will be backed by a pyarrow string dtype by default, + (for users who have pyarrow installed) and use a Python object based fallback for + users that don't have pyarrow installed. This allows all users to experience the usability + benefits of a string dtype by default, and for users with pyarrow to experience the performance + benefits of a pyarrow backed string array. - - Support for dtypes not present in pandas (e.g. nested dtypes, decimals) - - Users can already create arrays with these dtypes if they have pyarrow installed, but we cannot infer - arrays to those dtypes by default without pyarrow installed (as there is no Python/numpy equivalent). + - Support for dtypes not present in pandas. 
+ - There are some types in pyarrow that don't have a corresponding pandas/numpy dtype, for example + the nested pyarrow types (e.g. lists and structs), and decimal types. + - Currently, users can already create arrays with these dtypes if they have pyarrow installed, but we cannot infer + arrays to those dtypes by default, without forcing a pyarrow requirement on users, + as there is no Python/numpy equivalent for these dtypes. - - Interoperability - - The Arrow C Data Interface would allow us to import/export pandas DataFrames to and from other libraries - that support Arrow in a zero-copy manner. + - Interoperability + - The Arrow C Data Interface would allow us to import/export pandas DataFrames to and from other libraries + that support Arrow in a zero-copy manner. - Support for the Arrow C Data interface in pandas and other libraries in the ecosystem is still very new, though, - (support in pandas itself was only added as of pandas 2.2), and the dataframe interchange protocol, which allows - for dataframe interchange between Python dataframe implementations is currently better supported in downstream - libraries. + - Support for the Arrow C Data interface in pandas and other libraries in the ecosystem is still very new, though, + (support in pandas itself was only added as of pandas 2.2), and the dataframe interchange protocol, which allows + for dataframe interchange between Python dataframe implementations is currently better supported in downstream + libraries. Although this PR recommends not adopting pyarrow as a required dependency in pandas 3.0, this does not mean that we are abandoning pyarrow support and integration in pandas. Adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the ecosystem and better performance for users.
Furthermore, a lot of the drawbacks, such as the large installation size of -pyarrow and the lack of support for certain platforms, can be solved, and potential solutions have been proposed for -them, allowing us to potentially revisit this decision in the future. +pyarrow and the lack of support for certain platforms, can be solved (as shown by the recent addition of pyarrow to the pyodide +distributions), allowing us to potentially revisit this decision in the future. However, at this point in time, it is clear that we are not ready to require pyarrow as a dependency in pandas. From e5de753188441884f56948b01a9a5b23a98f5cbe Mon Sep 17 00:00:00 2001 From: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Thu, 29 Aug 2024 23:20:10 -0400 Subject: [PATCH 8/8] small updates --- .../pdeps/0015-do-not-require-pyarrow.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/web/pandas/pdeps/0015-do-not-require-pyarrow.md b/web/pandas/pdeps/0015-do-not-require-pyarrow.md index a1de0bf692aa4..8231eceaacad0 100644 --- a/web/pandas/pdeps/0015-do-not-require-pyarrow.md +++ b/web/pandas/pdeps/0015-do-not-require-pyarrow.md @@ -18,14 +18,23 @@ decided against moving forward with this PDEP for pandas 3.0. The primary reasons for rejecting this PDEP are twofold: -1) Requiring pyarrow as a dependency causes installation problems. +1) Requiring pyarrow as a dependency can cause installation problems for a significant portion of users. - Pyarrow does not fit or has a hard time fitting in space-constrained environments such as AWS Lambda, due to its large size of around ~40 MB for a compiled wheel (which is larger than pandas' own wheel sizes) + - This can also cause problems for downstream libraries that use pandas as a dependency + because, while pandas + pyarrow can potentially fit in an AWS Lambda environment, the combination of + pandas, pyarrow, and the downstream library may not fit.
+ - While it may be possible to work around this issue by using the AWS Lambda Layer from + the [AWS SDK for pandas](https://aws-sdk-pandas.readthedocs.io/en/stable/install.html#aws-lambda-layer), + the primary benefit of pyarrow strings is not enough to force users to make a disruptive change. - Installation of pyarrow is not possible on some platforms. We provide support for some less widely used platforms such as Alpine Linux, which pyarrow does not provide wheels for. + - While pyarrow has made great strides towards supporting most platforms that pandas is installable on + (e.g. the recent addition of pyodide support in pyarrow), we would still have to drop support for some + platforms like musllinux (the feature request is tracked [here](https://github.com/apache/arrow/issues/18036)) if pyarrow were to be required. While installation issues are mentioned in the drawbacks section of PDEP-10, at the time of the writing of the PDEP, we underestimated the impact this would have on users, and also downstream developers. @@ -55,10 +64,9 @@ of the PDEP, we underestimated the impact this would have on users, and also dow - The Arrow C Data Interface would allow us to import/export pandas DataFrames to and from other libraries that support Arrow in a zero-copy manner. - - Support for the Arrow C Data interface in pandas and other libraries in the ecosystem is still very new, though, - (support in pandas itself was only added as of pandas 2.2), and the dataframe interchange protocol, which allows - for dataframe interchange between Python dataframe implementations is currently better supported in downstream - libraries. + - While several libraries have adopted the Arrow C Data Interface (e.g. polars, xgboost, and duckdb), the main + beneficiaries of the Arrow C Data Interface are other dataframe libraries, as most downstream libraries tend to + already support using pandas dataframes as input.
Although this PR recommends not adopting pyarrow as a required dependency in pandas 3.0, this does not mean that we are abandoning pyarrow support and integration in pandas. Adopting support for pyarrow arrays