Commit 685f7ba

FIX-modin-project#7272: Remove HDK engine (modin-project#7275)
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
1 parent 8a62c3b commit 685f7ba

136 files changed, +189 -21208 lines changed

.github/workflows/ci-notebooks.yml

+10 -16
@@ -7,7 +7,6 @@ on:
  - .github/workflows/ci-notebooks.yml
  - setup.cfg
  - setup.py
- - requirements/env_hdk.yml
  - requirements/env_unidist_linux.yml
  concurrency:
  # Cancel other jobs in the same branch. We don't care whether CI passes
@@ -25,16 +24,11 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- execution: [pandas_on_ray, pandas_on_dask, pandas_on_unidist, hdk_on_native]
+ execution: [pandas_on_ray, pandas_on_dask, pandas_on_unidist]
  steps:
  - uses: actions/checkout@v4
  - uses: ./.github/actions/python-only
- if: matrix.execution != 'hdk_on_native' && matrix.execution != 'pandas_on_unidist'
- - uses: ./.github/actions/mamba-env
- with:
- environment-file: requirements/env_hdk.yml
- activate-environment: modin_on_hdk
- if: matrix.execution == 'hdk_on_native'
+ if: matrix.execution != 'pandas_on_unidist'
  - uses: ./.github/actions/mamba-env
  with:
  environment-file: requirements/env_unidist_linux.yml
@@ -49,29 +43,29 @@ jobs:
  # replace modin with . in the tutorial requirements file for `pandas_on_ray` and
  # `pandas_on_dask` since we need Modin built from sources
  - run: sed -i 's/modin/./g' examples/tutorial/jupyter/execution/${{ matrix.execution }}/requirements.txt
- if: matrix.execution != 'hdk_on_native' && matrix.execution != 'pandas_on_unidist'
+ if: matrix.execution != 'pandas_on_unidist'
  # install dependencies required for notebooks execution for `pandas_on_ray` and `pandas_on_dask`
  # Override modin-spreadsheet install for now
  - run: |
  pip install -r examples/tutorial/jupyter/execution/${{ matrix.execution }}/requirements.txt
  pip install git+https://github.com/modin-project/modin-spreadsheet.git@49ffd89f683f54c311867d602c55443fb11bf2a5
- if: matrix.execution != 'hdk_on_native' && matrix.execution != 'pandas_on_unidist'
- # Build Modin from sources for `hdk_on_native` and `pandas_on_unidist`
+ if: matrix.execution != 'pandas_on_unidist'
+ # Build Modin from sources for `pandas_on_unidist`
  - run: pip install -e .
- if: matrix.execution == 'hdk_on_native' || matrix.execution == 'pandas_on_unidist'
+ if: matrix.execution == 'pandas_on_unidist'
  # install test dependencies
  # NOTE: If you are changing the set of packages installed here, make sure that
  # the dev requirements match them.
  - run: pip install pytest pytest-cov black flake8 flake8-print flake8-no-implicit-concat
- if: matrix.execution != 'hdk_on_native' && matrix.execution != 'pandas_on_unidist'
+ if: matrix.execution != 'pandas_on_unidist'
  - run: pip install flake8-print jupyter nbformat nbconvert
- if: matrix.execution == 'hdk_on_native' || matrix.execution == 'pandas_on_unidist'
+ if: matrix.execution == 'pandas_on_unidist'
  - run: pip list
- if: matrix.execution != 'hdk_on_native' && matrix.execution != 'pandas_on_unidist'
+ if: matrix.execution != 'pandas_on_unidist'
  - run: |
  conda info
  conda list
- if: matrix.execution == 'hdk_on_native' || matrix.execution == 'pandas_on_unidist'
+ if: matrix.execution == 'pandas_on_unidist'
  # setup kernel configuration for `pandas_on_unidist` execution with mpi backend
  - run: python examples/tutorial/jupyter/execution/${{ matrix.execution }}/setup_kernel.py
  if: matrix.execution == 'pandas_on_unidist'

.github/workflows/ci-required.yml

-13
@@ -90,19 +90,6 @@ jobs:
  modin/experimental/pandas/__init__.py
  - run: python scripts/doc_checker.py modin/core/storage_formats/base
  - run: python scripts/doc_checker.py modin/core/storage_formats/pandas
- - run: |
- python scripts/doc_checker.py \
- modin/experimental/core/execution/native/implementations/hdk_on_native/dataframe \
- modin/experimental/core/execution/native/implementations/hdk_on_native/io \
- modin/experimental/core/execution/native/implementations/hdk_on_native/partitioning \
- modin/experimental/core/execution/native/implementations/hdk_on_native/calcite_algebra.py \
- modin/experimental/core/execution/native/implementations/hdk_on_native/calcite_builder.py \
- modin/experimental/core/execution/native/implementations/hdk_on_native/calcite_serializer.py \
- modin/experimental/core/execution/native/implementations/hdk_on_native/df_algebra.py \
- modin/experimental/core/execution/native/implementations/hdk_on_native/expr.py \
- modin/experimental/core/execution/native/implementations/hdk_on_native/hdk_worker.py \
- - run: python scripts/doc_checker.py modin/experimental/core/storage_formats/hdk
- - run: python scripts/doc_checker.py modin/experimental/core/execution/native/implementations/hdk_on_native/interchange/dataframe_protocol
  - run: python scripts/doc_checker.py modin/experimental/batch/pipeline.py
  - run: python scripts/doc_checker.py modin/logging

.github/workflows/ci.yml

+1 -72
@@ -150,62 +150,6 @@ jobs:
  runner: python -m pytest --execution=${{ matrix.execution }}
  - uses: ./.github/actions/upload-coverage

- test-hdk:
- needs: [lint-flake8]
- runs-on: ubuntu-latest
- defaults:
- run:
- shell: bash -l {0}
- env:
- MODIN_EXPERIMENTAL: "True"
- MODIN_ENGINE: "native"
- MODIN_STORAGE_FORMAT: "hdk"
- name: Test HDK storage format, Python 3.9
- services:
- moto:
- image: motoserver/moto
- ports:
- - 5000:5000
- env:
- AWS_ACCESS_KEY_ID: foobar_key
- AWS_SECRET_ACCESS_KEY: foobar_secret
- steps:
- - uses: actions/checkout@v4
- - uses: ./.github/actions/mamba-env
- with:
- environment-file: requirements/env_hdk.yml
- activate-environment: modin_on_hdk
- - name: Install HDF5
- run: sudo apt update && sudo apt install -y libhdf5-dev
- - run: python -m pytest modin/tests/core/storage_formats/hdk/test_internals.py
- - run: python -m pytest modin/tests/experimental/hdk_on_native/test_init.py
- - run: python -m pytest modin/tests/experimental/hdk_on_native/test_dataframe.py
- - run: python -m pytest modin/tests/experimental/hdk_on_native/test_utils.py
- - run: python -m pytest modin/tests/pandas/test_io.py --verbose
- - run: python -m pytest modin/tests/interchange/dataframe_protocol/test_general.py
- - run: python -m pytest modin/tests/interchange/dataframe_protocol/hdk
- - run: python -m pytest modin/tests/experimental/test_sql.py
- - run: python -m pytest modin/tests/pandas/test_concat.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_binary.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_reduce.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_join_sort.py
- - run: python -m pytest modin/tests/pandas/test_general.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_indexing.py
- - run: python -m pytest modin/tests/pandas/test_series.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_map_metadata.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_window.py
- - run: python -m pytest modin/tests/pandas/dataframe/test_default.py
- - run: python examples/docker/modin-hdk/census-hdk.py examples/data/census_1k.csv -no-ml
- - run: python examples/docker/modin-hdk/nyc-taxi-hdk.py examples/data/nyc-taxi_1k.csv
- - run: |
- python examples/docker/modin-hdk/plasticc-hdk.py \
- examples/data/plasticc_training_set_1k.csv \
- examples/data/plasticc_test_set_1k.csv \
- examples/data/plasticc_training_set_metadata_1k.csv \
- examples/data/plasticc_test_set_metadata_1k.csv \
- -no-ml
- - uses: ./.github/actions/upload-coverage
-
  test-asv-benchmarks:
  if: github.event_name == 'pull_request'
  needs: [lint-flake8]
@@ -249,18 +193,6 @@ jobs:
  # check pure pandas
  MODIN_ASV_USE_IMPL=pandas asv run --quick --dry-run --python=same --strict --show-stderr --launch-method=spawn \
  -b ^benchmarks -b ^io | tee benchmarks.log
-
- # TODO: Remove manual environment creation after fix https://github.com/airspeed-velocity/asv/issues/1310
- conda deactivate
- mamba env create -f ../requirements/env_hdk.yml
- conda activate modin_on_hdk
- pip install asv==0.5.1
- pip install ..
-
- # check Modin on HDK
- MODIN_ENGINE=native MODIN_STORAGE_FORMAT=hdk MODIN_EXPERIMENTAL=true asv run --quick --dry-run --python=same --strict --show-stderr \
- --launch-method=forkserver --python=same --config asv.conf.hdk.json \
- -b ^hdk | tee benchmarks.log
  else
  echo "Benchmarks did not run, no changes detected"
  fi
@@ -374,7 +306,6 @@ jobs:
  - run: |
  mpiexec -n 1 -genv AWS_ACCESS_KEY_ID foobar_key -genv AWS_SECRET_ACCESS_KEY foobar_secret \
  python -m pytest modin/tests/experimental/test_io_exp.py
- - run: mpiexec -n 1 python -m pytest modin/tests/experimental/test_sql.py
  - run: mpiexec -n 1 python -m pytest modin/tests/interchange/dataframe_protocol/test_general.py
  - run: mpiexec -n 1 python -m pytest modin/tests/interchange/dataframe_protocol/pandas/test_protocol.py
  - run: |
@@ -495,8 +426,6 @@ jobs:
  if: matrix.engine == 'python' || matrix.test_task == 'group_4'
  - run: python -m pytest modin/tests/experimental/test_io_exp.py
  if: matrix.engine == 'python' || matrix.test_task == 'group_4'
- - run: python -m pytest modin/tests/experimental/test_sql.py
- if: matrix.os == 'ubuntu' && (matrix.engine == 'python' || matrix.test_task == 'group_4')
  - run: python -m pytest modin/tests/interchange/dataframe_protocol/test_general.py
  if: matrix.engine == 'python' || matrix.test_task == 'group_4'
  - run: python -m pytest modin/tests/interchange/dataframe_protocol/pandas/test_protocol.py
@@ -703,7 +632,7 @@ jobs:
  - run: python -m pytest modin/tests/experimental/spreadsheet/test_general.py

  merge-coverage-artifacts:
- needs: [test-internals, test-api-and-no-engine, test-defaults, test-hdk, test-all-unidist, test-all, test-experimental, test-sanity]
+ needs: [test-internals, test-api-and-no-engine, test-defaults, test-all-unidist, test-all, test-experimental, test-sanity]
  if: always() # we need to run it regardless of some job being skipped, like in PR
  runs-on: ubuntu-latest
  defaults:

.github/workflows/codeql/codeql-config.yml

-2
@@ -2,5 +2,3 @@ name: "Modin CodeQL config"

  paths:
  - modin/**
- paths-ignore:
- - modin/tests/experimental/hdk_on_native/** # TODO: fix unhashable list error, see #5227

CODEOWNERS

-5
@@ -1,8 +1,3 @@
  # These owners will be the default owners for everything in
  # the repo unless a later match takes precedence,
  * @modin-project/modin-core @devin-petersohn @mvashishtha @RehanSD @YarShev @vnlitvinov @anmyachev @dchigarev
-
- # These owners will review everything in the HDK engine component
- # of Modin.
- /modin/experimental/core/storage_formats/hdk/** @modin-project/modin-hdk @aregm @gshimansky @ienkovich @Garra1980 @YarShev @vnlitvinov @anmyachev @dchigarev @AndreyPavlenko
- /modin/experimental/core/execution/native/implementations/hdk_on_native/** @modin-project/modin-hdk @aregm @gshimansky @ienkovich @Garra1980 @YarShev @vnlitvinov @anmyachev @dchigarev @AndreyPavlenko

README.md

+4 -12
@@ -85,8 +85,8 @@ Modin automatically detects which engine(s) you have installed and uses that for
  #### From conda-forge

  Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all`
- will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask),
- [MPI through unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk).
+ will install Modin and three engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask) and
+ [MPI through unidist](https://github.com/modin-project/unidist).

  ```bash
  conda install -c conda-forge modin-all
@@ -98,7 +98,6 @@ Each engine can also be installed individually (and also as a combination of sev
  conda install -c conda-forge modin-ray # Install Modin dependencies and Ray.
  conda install -c conda-forge modin-dask # Install Modin dependencies and Dask.
  conda install -c conda-forge modin-mpi # Install Modin dependencies and MPI through unidist.
- conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK.
  ```

  **Note:** Since Modin 0.30.0 we use a reduced set of Ray dependencies: `ray-core` instead of `ray-default`.
@@ -118,13 +117,13 @@ conda install -n base conda-libmamba-solver
  and then use it during istallation either like:

  ```bash
- conda install -c conda-forge modin-ray modin-hdk --experimental-solver=libmamba
+ conda install -c conda-forge modin-ray --experimental-solver=libmamba
  ```

  or starting from conda 22.11 and libmamba solver 22.12 versions:

  ```bash
- conda install -c conda-forge modin-ray modin-hdk --solver=libmamba
+ conda install -c conda-forge modin-ray --solver=libmamba
  ```

  #### Choosing a Compute Engine
@@ -158,8 +157,6 @@ modin_cfg.Engine.put('unidist') # Modin will use Unidist
  unidist_cfg.Backend.put('mpi') # Unidist will use MPI backend
  ```

- Check [this Modin docs section](https://modin.readthedocs.io/en/latest/development/using_hdk.html) for HDK engine setup.
-
  _Note: You should not change the engine after your first operation with Modin as it will result in undefined behavior._

  #### Which engine should I use?
@@ -168,11 +165,6 @@ On Linux, MacOS, and Windows you can install and use either Ray, Dask or MPI thr
  to use either of these engines as Modin abstracts away all of the complexity, so feel
  free to pick either!

- On Linux you also can choose [HDK](https://modin.readthedocs.io/en/latest/development/using_hdk.html), which is an experimental
- engine based on [HDK](https://github.com/intel-ai/hdk) and included in the
- [Intel® Distribution of Modin](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html),
- which is a part of [Intel® oneAPI AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html).
-
  ### Pandas API Coverage

  <p align="center">

asv_bench/asv.conf.hdk.json

-60
This file was deleted.

asv_bench/benchmarks/benchmarks.py

-3
@@ -36,7 +36,6 @@
      random_columns,
      random_string,
      translator_groupby_ngroups,
-     trigger_import,
  )


@@ -675,7 +674,6 @@ class TimeIndexing:

      def setup(self, shape, indexer_type):
          self.df = generate_dataframe("int", *shape, RAND_LOW, RAND_HIGH)
-         trigger_import(self.df)

          self.indexer = self.indexer_getters[indexer_type](self.df)
          if isinstance(self.indexer, (IMPL.Series, IMPL.DataFrame)):
@@ -701,7 +699,6 @@ class TimeIndexingColumns:

      def setup(self, shape):
          self.df = generate_dataframe("int", *shape, RAND_LOW, RAND_HIGH)
-         trigger_import(self.df)
          self.numeric_indexer = [0, 1]
          self.labels_indexer = self.df.columns[self.numeric_indexer].tolist()

asv_bench/benchmarks/hdk/__init__.py

-14
This file was deleted.
