initial version of the cpknextgen library
Co-authored-by: Stanislav Král <stanislav.kral@diribet.cz>
Co-authored-by: Helena Paulasová <helena.paulasova@diribet.cz>
Co-authored-by: Vlastimil Dolejš <vlastimil.dolejs@diribet.cz>
3 people committed Dec 5, 2023
0 parents commit 4aa598f
Showing 22 changed files with 20,054 additions and 0 deletions.
59 changes: 59 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,59 @@
name: Release

on:
  push:
    tags:
      - '**'

jobs:
  package:
    name: Build dist packages
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v4

      - name: Setup python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install build tools
        run: python -m pip install --upgrade pip build

      - name: Build package
        run: python -m build

      - name: Create GitHub release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          tag: ${{ github.ref_name }}
        run: |
          gh release create "$tag" \
            --repo="$GITHUB_REPOSITORY" \
            --title="${tag#v}"
      - name: Store the distribution packages
        uses: actions/upload-artifact@v3
        with:
          name: dist-packages
          path: dist/

  pypi-publish:
    name: Upload packages to PyPI
    needs: package
    runs-on: ubuntu-latest
    environment:
      name: release
      url: https://pypi.org/p/cpknextgen
    permissions:
      id-token: write
    steps:
      - name: Download all the dist packages
        uses: actions/download-artifact@v3
        with:
          name: dist-packages
          path: dist/

      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
36 changes: 36 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,36 @@
name: Test
on:
  push:
  pull_request:

concurrency:
  group: test-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    name: Test with ${{ matrix.py }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        py:
          - "3.11"
          - "3.10"
          - "3.9"
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Setup python for test ${{ matrix.py }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.py }}

      - name: Install tox
        run: python -m pip install "tox-gh>=4"

      - name: Run test suite
        run: tox run
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
# ignore idea files
**/.idea
**/*.iml

# ignore local python env
env/
venv/

dist/

.tox
48 changes: 48 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,48 @@
# Contributing guidelines

This project uses [`tox`](https://pypi.org/project/tox/) to simplify virtual environment management and testing.

## Dependencies
Dependencies are defined in `pyproject.toml` using version ranges.
We want to define broad version ranges so that users of this library won't run into dependency conflicts.

### Pinned dependencies
Pinned versions of dependencies are defined in `requirements.txt` and `requirements-test.txt`.
These are used for local development and testing on CI.
These files are automatically generated by the `pip-compile` tox environment.

To re-generate the requirements files, run:
```shell
tox run -e pip-compile
```

## Running tests
You can run tests for all supported Python versions using:
```shell
tox run
```

Or for a specific Python version:
```shell
tox run -e py311
```

Or manually invoke pytest:
```shell
pytest tests
```

## Build
Build tools are defined in `pyproject.toml`.
The backend tool is [`hatchling`](https://hatch.pypa.io/latest/), which is responsible for building the distribution package.
The CLI tool is [`build`](https://pypa-build.readthedocs.io/en/stable/index.html).
The package name specified in the `.toml` config must match the directory name `src/[package_name]`,
provided that `[package_name]` contains an `__init__.py`.

You can run the build manually:
1. Install build: `python -m pip install build`
2. Run `python -m build` from the project root - this will create a wheel in the `dist/` folder.

Or using tox:
1. Run `tox run`
2. The wheel is created in the `.tox/.pkg/dist` folder.
19 changes: 19 additions & 0 deletions LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2023 Diribet Thesaurus s.r.o.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.
51 changes: 51 additions & 0 deletions README.md
@@ -0,0 +1,51 @@
# C<sub>pk</sub> NextGen
[![Build](https://github.com/diribet/cpknextgen/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/diribet/cpknextgen/actions/workflows/test.yml)
[![PyPI - Version](https://img.shields.io/pypi/v/cpknextgen)](https://pypi.org/project/cpknextgen/)

## Process Capability using Normal distribution mixture
This procedure allows you to compute the $c_p$ and $c_{pk}$ process capability indices, utilizing an innovative method for estimating the center and quantiles of the process, including their uncertainty. The advantage of this approach is its consistency in comparing processes and their changes over time, without relying heavily on anecdotal evidence or having to categorize the “process type”.
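For background, the classical normal-based definitions of these indices are shown below. The method described here estimates the process center and quantiles (with their uncertainty) from a normal distribution mixture rather than from a plain sample mean and standard deviation, so these formulas are for reference only.

$$c_p = \frac{USL - LSL}{6\sigma}, \qquad c_{pk} = \min\left(\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right)$$

where $USL$ and $LSL$ are the upper and lower tolerance limits, and $\mu$ and $\sigma$ are the process center and spread.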

## Usage
Install the library
```shell
python -m pip install cpknextgen
```

Import the `evaluate` function, which is the main entry point:

```python
from cpknextgen import evaluate
```
The function takes a 1-dimensional list/NumPy array of process data, the tolerances (as mentioned above), and some other parameters for the Gaussian mixture. Refer to the docstring for further details. It calculates point and interval estimates of the indices, as well as graphical results: the empirical CDF and a sample estimate of the CDF.
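A minimal usage sketch is shown below. Only the `evaluate` entry point (exported alongside `EvaluationResult`) is confirmed here; the keyword-argument names are illustrative assumptions, so check the docstring for the actual signature.

```python
import numpy as np

from cpknextgen import evaluate

# 1-dimensional process data (synthetic, for illustration only)
data = np.random.default_rng(seed=1).normal(loc=10.0, scale=0.2, size=200)

# NOTE: the keyword names below are assumptions for illustration;
# see the evaluate() docstring for the real parameter names.
result = evaluate(data, lower_tolerance=9.5, upper_tolerance=10.5)

# point and interval estimates of the indices plus the CDF data
print(result)
```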

## Methodology
The method leverages past performance data and experience. The calculation, as outlined in the figure below, can either include or exclude a prior, where the prior is a dataset from past performance used as Bayesian-type information.

![method_illustration](https://github.com/diribet/cpknextgen/assets/1448805/28878d30-307a-40b5-8b04-388eb6f05d96)

The algorithm is designed for continuous process observation, meaning it estimates the resulting indices' values, with uncertainty, at each point of the process. It can predict what the resulting indices' values will be once the process is complete for a given production period (e.g., a shift, a day, a week, etc.).

### Calculation Without Prior
Calculation without a prior is equivalent to estimating the indices on the prior itself, and the resulting information can then be used to calculate the indices on another dataset with this prior. It is especially recommended for “closed” production periods, such as calculating the process capability for a recently concluded shift.

The data is often accompanied by varying amounts of contextual information, most notably the tolerance limits and the extreme limits. These extreme limits are dictated by physical restrictions or plausibility limits and are not mandatory. Any data outside these limits are treated as outliers and ignored.
To calculate $c_{pk}$, at least one tolerance limit is necessary. Both tolerance limits are needed for a proper calculation of $c_p$. If no tolerance limits are provided, the algorithm only estimates the quantiles, giving the process center and width, without a tolerance interval for comparison.

Before distribution estimation, a shape-based data transformation takes place (an illustrative sketch follows the list). This involves the following steps:
1. Logarithmic or logit transformations based on extreme limits, when they exist.
2. Applying a Yeo-Johnson transformation.
3. Scaling the tolerance interval to a +/-1 interval. In cases where one or both tolerances are missing, they are estimated as "tolerance quantiles" from the data.
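A minimal sketch of this transformation chain, built from standard SciPy/scikit-learn pieces, is shown below. It is not the library's internal implementation; it assumes both extreme limits and both tolerances are known and that the tolerances lie inside the extreme limits.

```python
import numpy as np
from scipy.special import logit
from sklearn.preprocessing import PowerTransformer


def transform_for_estimation(data, lower_extreme, upper_extreme, lower_tol, upper_tol):
    """Illustrative pipeline: logit transform, Yeo-Johnson, tolerance scaling."""
    data = np.asarray(data, dtype=float)

    # Treat values outside the extreme (plausibility) limits as outliers
    data = data[(data > lower_extreme) & (data < upper_extreme)]

    def to_unit(x):
        return (x - lower_extreme) / (upper_extreme - lower_extreme)

    # Step 1: with both extreme limits known, map the bounded data onto an
    # unbounded scale via a logit transform (a single limit would suggest log)
    unbounded = logit(to_unit(data))

    # Step 2: Yeo-Johnson transformation toward normality
    yj = PowerTransformer(method="yeo-johnson")
    transformed = yj.fit_transform(unbounded.reshape(-1, 1)).ravel()

    # Step 3: rescale so the transformed tolerance interval maps onto [-1, +1]
    tol = yj.transform(logit(to_unit(np.array([lower_tol, upper_tol]))).reshape(-1, 1)).ravel()
    center, half_width = tol.mean(), (tol[1] - tol[0]) / 2
    return (transformed - center) / half_width
```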

### Calculation With Prior (NOT IMPLEMENTED!)
The data transformation method is derived from the prior. The extent to which the prior is used in distribution estimation varies, depending on the amount of information available at the time of estimation. With limited information, e.g., after the first hour of an 8-hour shift, there is a higher reliance on the past shape of the process from the prior. As the shift progresses, indices will be estimated purely from the information from the ongoing production period.

This balance is controlled by the "Basic sample size" and the "Process Length" parameters. Regardless of the size of the prior, the algorithm ensures the amount of information derived from it corresponds to these two parameters. Hence, it is advisable to use a "sufficiently large" prior dataset that includes all reasonable process variants.

### Special Cases
There are two types of special cases that limit the calculation. In the first scenario, no calculation proceeds if there's only one data point or if all data points in the set have the same value. In the second scenario, the calculation proceeds, but it does not produce a prior that can be used for another dataset, e.g., when the lower limit/tolerance isn't given, and all data are above the upper tolerance.
These special cases are currently under review, and we look forward to sharing updated methodologies to handle them in the future.
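As an illustration of the first scenario only (this is not the library's own check), a caller might guard against degenerate input before evaluation:

```python
import numpy as np


def has_enough_variation(data) -> bool:
    """Illustrative guard: evaluation needs at least two distinct data points."""
    values = np.asarray(data, dtype=float)
    return values.size > 1 and np.unique(values).size > 1
```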

### Conclusion
![prior_illustration](https://github.com/diribet/cpknextgen/assets/1448805/c56a1653-d0bf-42ce-8970-bb714be48e98)

This novel method for computing process capability indices offers a more consistent and data-driven approach. Feedback and contributions are encouraged as we continue to refine and extend this methodology. Please refer to the figure above for a graphical representation of the process.
45 changes: 45 additions & 0 deletions pyproject.toml
@@ -0,0 +1,45 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "cpknextgen"
version = "1.0.0"
authors = [
{ name = "Stanislav Král", email = "stanislav.kral@diribet.cz" },
{ name = "Jaroslav Staněk", email = "jaroslav.stanek@diribet.cz" },
{ name = "Helena Paulasová", email = "helena.paulasova@diribet.cz" },
{ name = "Vlastimil Dolejš", email = "vlastimil.dolejs@diribet.cz" },
]
description = "Cpk NextGen"
license = { text = "MIT License" }
keywords = ["Cp", "Cpk", "Process capability"]
readme = "README.md"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Intended Audience :: Manufacturing",
]
requires-python = ">=3.9"
dependencies = [
'numpy>=1.21,<1.27',
'scipy>=1.10,<1.15',
'numba>=0.55.0',
'scikit-learn>=1.0'
]

[project.urls]
Homepage = "https://github.com/diribet/cpknextgen"
Issues = "https://github.com/diribet/cpknextgen/issues"

[project.optional-dependencies]
test = [
"pytest>=7",
"pytest-sugar>=0.9"
]

[tool.pytest.ini_options]
pythonpath = [
"src"
]
44 changes: 44 additions & 0 deletions requirements-test.txt
@@ -0,0 +1,44 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# pip-compile --extra=test --output-file=requirements-test.txt pyproject.toml
#
colorama==0.4.6
# via pytest
iniconfig==2.0.0
# via pytest
joblib==1.3.1
# via scikit-learn
llvmlite==0.40.1
# via numba
numba==0.57.1
# via cpknextgen (pyproject.toml)
numpy==1.24.4
# via
# cpknextgen (pyproject.toml)
# numba
# scikit-learn
# scipy
packaging==23.1
# via
# pytest
# pytest-sugar
pluggy==1.2.0
# via pytest
pytest==7.4.0
# via
# cpknextgen (pyproject.toml)
# pytest-sugar
pytest-sugar==0.9.7
# via cpknextgen (pyproject.toml)
scikit-learn==1.3.0
# via cpknextgen (pyproject.toml)
scipy==1.11.1
# via
# cpknextgen (pyproject.toml)
# scikit-learn
termcolor==2.3.0
# via pytest-sugar
threadpoolctl==3.2.0
# via scikit-learn
26 changes: 26 additions & 0 deletions requirements.txt
@@ -0,0 +1,26 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# pip-compile --output-file=requirements.txt pyproject.toml
#
joblib==1.3.1
# via scikit-learn
llvmlite==0.40.1
# via numba
numba==0.57.1
# via cpknextgen (pyproject.toml)
numpy==1.24.4
# via
# cpknextgen (pyproject.toml)
# numba
# scikit-learn
# scipy
scikit-learn==1.3.0
# via cpknextgen (pyproject.toml)
scipy==1.11.1
# via
# cpknextgen (pyproject.toml)
# scikit-learn
threadpoolctl==3.2.0
# via scikit-learn
1 change: 1 addition & 0 deletions src/cpknextgen/__init__.py
@@ -0,0 +1 @@
from .__main__ import evaluate, EvaluationResult
