initial version of the cpknextgen library
Co-authored-by: Stanislav Král <stanislav.kral@diribet.cz>
Co-authored-by: Helena Paulasová <helena.paulasova@diribet.cz>
Co-authored-by: Vlastimil Dolejš <vlastimil.dolejs@diribet.cz>
3 people committed Dec 5, 2023
0 parents commit 4aa598f
Showing 22 changed files with 20,054 additions and 0 deletions.
59 changes: 59 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,59 @@
name: Release

on:
  push:
    tags:
      - '**'

jobs:
  package:
    name: Build dist packages
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v4

      - name: Setup python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install build tools
        run: python -m pip install --upgrade pip build

      - name: Build package
        run: python -m build

      - name: Create GitHub release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          tag: ${{ github.ref_name }}
        run: |
          gh release create "$tag" \
            --repo="$GITHUB_REPOSITORY" \
            --title="${tag#v}"
      - name: Store the distribution packages
        uses: actions/upload-artifact@v3
        with:
          name: dist-packages
          path: dist/

  pypi-publish:
    name: Upload packages to PyPI
    needs: package
    runs-on: ubuntu-latest
    environment:
      name: release
      url: https://pypi.org/p/cpknextgen
    permissions:
      id-token: write
    steps:
      - name: Download all the dist packages
        uses: actions/download-artifact@v3
        with:
          name: dist-packages
          path: dist/

      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
36 changes: 36 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,36 @@
name: Test
on:
  push:
  pull_request:

concurrency:
  group: test-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    name: Test with ${{ matrix.py }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        py:
          - "3.11"
          - "3.10"
          - "3.9"
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Setup python for test ${{ matrix.py }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.py }}

      - name: Install tox
        run: python -m pip install "tox-gh>=4"

      - name: Run test suite
        run: tox run
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
# ignore idea files
**/.idea
**/*.iml

# ignore local python env
env/
venv/

dist/

.tox
48 changes: 48 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,48 @@
# Contributing guidelines

This project uses [`tox`](https://pypi.org/project/tox/) to simplify virtual environment management and testing.

## Dependencies
Dependencies are defined in `pyproject.toml` using version ranges.
We want to define broad version ranges so that users of this library won't run into dependency conflicts.

### Pinned dependencies
Pinned versions of dependencies are defined in `requirements.txt` and `requirements-test.txt`.
These are used for local development and testing on CI.
These files are automatically generated by the `pip-compile` tox environment.

To re-generate the requirements files, run:
```shell
tox run -e pip-compile
```

## Running tests
You can run tests for all supported Python versions using:
```shell
tox run
```

Or for a specific Python version:
```shell
tox run -e py311
```

Or manually invoke pytest:
```shell
pytest tests
```

## Build
Build tools are defined in `pyproject.toml`.
The backend tool is [`hatchling`](https://hatch.pypa.io/latest/), which is responsible for building the distribution package.
The CLI tool is [`build`](https://pypa-build.readthedocs.io/en/stable/index.html).
The package name specified in the `.toml` config must match the directory name `src/[package_name]`,
provided that `[package_name]` contains an `__init__.py`.

You can run the build manually:
1. Install build: `python -m pip install build`
2. Run `python -m build` from the project root - this will create a wheel in the `dist/` folder.

Or using tox:
1. Run `tox run`
2. The wheel is created in the `.tox/.pkg/dist` folder.
19 changes: 19 additions & 0 deletions LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2023 Diribet Thesaurus s.r.o.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.
51 changes: 51 additions & 0 deletions README.md
@@ -0,0 +1,51 @@
# C<sub>pk</sub> NextGen
[![Build](https://github.com/diribet/cpknextgen/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/diribet/cpknextgen/actions/workflows/test.yml)
[![PyPI - Version](https://img.shields.io/pypi/v/cpknextgen)](https://pypi.org/project/cpknextgen/)

## Process Capability using Normal distribution mixture
This procedure allows you to compute the $c_p$ and $c_{pk}$ process capability indices, utilizing an innovative method for estimating the center and quantiles of the process, including their uncertainty. The advantage of this approach is its consistency in comparing processes and their changes over time, without relying heavily on anecdotal evidence or having to categorize the “process type”.
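For background, the classical normal-based definitions of these indices are shown below. The method described here estimates the process center and quantiles (with their uncertainty) from a normal distribution mixture rather than from a plain sample mean and standard deviation, so these formulas are for reference only.

$$c_p = \frac{USL - LSL}{6\sigma}, \qquad c_{pk} = \min\left(\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right)$$

where $USL$ and $LSL$ are the upper and lower tolerance limits, and $\mu$ and $\sigma$ are the process center and spread.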

## Usage
Install the library
```shell
python -m pip install cpknextgen
```

Import the `evaluate` function, which is the main entry point:

```python
from cpknextgen import evaluate
```
The function takes a 1-dimensional list/NumPy array of process data, the tolerances (as mentioned above), and some other parameters for the Gaussian mixture. Refer to the docstring for further details. It calculates point and interval estimates of the indices, as well as graphical results: the empirical CDF and a sample estimate of the CDF.
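A minimal usage sketch is shown below. Only the `evaluate` entry point (exported alongside `EvaluationResult`) is confirmed here; the keyword-argument names are illustrative assumptions, so check the docstring for the actual signature.

```python
import numpy as np

from cpknextgen import evaluate

# 1-dimensional process data (synthetic, for illustration only)
data = np.random.default_rng(seed=1).normal(loc=10.0, scale=0.2, size=200)

# NOTE: the keyword names below are assumptions for illustration;
# see the evaluate() docstring for the real parameter names.
result = evaluate(data, lower_tolerance=9.5, upper_tolerance=10.5)

# point and interval estimates of the indices plus the CDF data
print(result)
```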

## Methodology
The method leverages past performance data and experience. The calculation, as outlined in the figure below, can either include or exclude a prior, where the prior is a dataset from past performance used as Bayesian-type information.

![method_illustration](https://github.com/diribet/cpknextgen/assets/1448805/28878d30-307a-40b5-8b04-388eb6f05d96)

The algorithm is designed for continuous process observation, meaning it estimates the resulting indices' values, with uncertainty, at each point of the process. It can predict what the resulting indices' values will be once the process is complete for a given production period (e.g., a shift, a day, a week, etc.).

### Calculation Without Prior
Calculation without a prior is equivalent to estimating the indices on the prior itself, and the resulting information can then be used to calculate the indices on another dataset with this prior. It is especially recommended for “closed” production periods, such as calculating the process capability for a recently concluded shift.

The data is often accompanied by varying amounts of contextual information, most notably the tolerance limits and the extreme limits. These extreme limits are dictated by physical restrictions or plausibility limits and are not mandatory. Any data outside these limits are treated as outliers and ignored.
To calculate $c_{pk}$, at least one tolerance limit is necessary. Both tolerance limits are needed for a proper calculation of $c_p$. If no tolerance limits are provided, the algorithm only estimates the quantiles, giving the process center and width, without a tolerance interval for comparison.

Before distribution estimation, a shape-based data transformation takes place (an illustrative sketch follows the list). This involves the following steps:
1. Logarithmic or logit transformations based on extreme limits, when they exist.
2. Applying a Yeo-Johnson transformation.
3. Scaling the tolerance interval to a +/-1 interval. In cases where one or both tolerances are missing, they are estimated as "tolerance quantiles" from the data.
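A minimal sketch of this transformation chain, built from standard SciPy/scikit-learn pieces, is shown below. It is not the library's internal implementation; it assumes both extreme limits and both tolerances are known and that the tolerances lie inside the extreme limits.

```python
import numpy as np
from scipy.special import logit
from sklearn.preprocessing import PowerTransformer


def transform_for_estimation(data, lower_extreme, upper_extreme, lower_tol, upper_tol):
    """Illustrative pipeline: logit transform, Yeo-Johnson, tolerance scaling."""
    data = np.asarray(data, dtype=float)

    # Treat values outside the extreme (plausibility) limits as outliers
    data = data[(data > lower_extreme) & (data < upper_extreme)]

    def to_unit(x):
        return (x - lower_extreme) / (upper_extreme - lower_extreme)

    # Step 1: with both extreme limits known, map the bounded data onto an
    # unbounded scale via a logit transform (a single limit would suggest log)
    unbounded = logit(to_unit(data))

    # Step 2: Yeo-Johnson transformation toward normality
    yj = PowerTransformer(method="yeo-johnson")
    transformed = yj.fit_transform(unbounded.reshape(-1, 1)).ravel()

    # Step 3: rescale so the transformed tolerance interval maps onto [-1, +1]
    tol = yj.transform(logit(to_unit(np.array([lower_tol, upper_tol]))).reshape(-1, 1)).ravel()
    center, half_width = tol.mean(), (tol[1] - tol[0]) / 2
    return (transformed - center) / half_width
```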

### Calculation With Prior (NOT IMPLEMENTED!)
The data transformation method is derived from the prior. The extent to which the prior is used in distribution estimation varies, depending on the amount of information available at the time of estimation. With limited information, e.g., after the first hour of an 8-hour shift, there is a higher reliance on the past shape of the process from the prior. As the shift progresses, indices will be estimated purely from the information from the ongoing production period.

This balance is controlled by the "Basic sample size" and the "Process Length" parameters. Regardless of the size of the prior, the algorithm ensures the amount of information derived from it corresponds to these two parameters. Hence, it is advisable to use a "sufficiently large" prior dataset that includes all reasonable process variants.

### Special Cases
There are two types of special cases that limit the calculation. In the first scenario, no calculation proceeds if there's only one data point or if all data points in the set have the same value. In the second scenario, the calculation proceeds, but it does not produce a prior that can be used for another dataset, e.g., when the lower limit/tolerance isn't given, and all data are above the upper tolerance.
These special cases are currently under review, and we look forward to sharing updated methodologies to handle them in the future.
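As an illustration of the first scenario only (this is not the library's own check), a caller might guard against degenerate input before evaluation:

```python
import numpy as np


def has_enough_variation(data) -> bool:
    """Illustrative guard: evaluation needs at least two distinct data points."""
    values = np.asarray(data, dtype=float)
    return values.size > 1 and np.unique(values).size > 1
```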

### Conclusion
![prior_illustration](https://github.com/diribet/cpknextgen/assets/1448805/c56a1653-d0bf-42ce-8970-bb714be48e98)

This novel method for computing process capability indices offers a more consistent and data-driven approach. Feedback and contributions are encouraged as we continue to refine and extend this methodology. Please refer to the figure above for a graphical representation of the process.
45 changes: 45 additions & 0 deletions pyproject.toml
@@ -0,0 +1,45 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "cpknextgen"
version = "1.0.0"
authors = [
{ name = "Stanislav Král", email = "stanislav.kral@diribet.cz" },
{ name = "Jaroslav Staněk", email = "jaroslav.stanek@diribet.cz" },
{ name = "Helena Paulasová", email = "helena.paulasova@diribet.cz" },
{ name = "Vlastimil Dolejš", email = "vlastimil.dolejs@diribet.cz" },
]
description = "Cpk NextGen"
license = { text = "MIT License" }
keywords = ["Cp", "Cpk", "Process capability"]
readme = "README.md"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Intended Audience :: Manufacturing",
]
requires-python = ">=3.9"
dependencies = [
'numpy>=1.21,<1.27',
'scipy>=1.10,<1.15',
'numba>=0.55.0',
'scikit-learn>=1.0'
]

[project.urls]
Homepage = "https://github.com/diribet/cpknextgen"
Issues = "https://github.com/diribet/cpknextgen/issues"

[project.optional-dependencies]
test = [
"pytest>=7",
"pytest-sugar>=0.9"
]

[tool.pytest.ini_options]
pythonpath = [
"src"
]
44 changes: 44 additions & 0 deletions requirements-test.txt
@@ -0,0 +1,44 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# pip-compile --extra=test --output-file=requirements-test.txt pyproject.toml
#
colorama==0.4.6
# via pytest
iniconfig==2.0.0
# via pytest
joblib==1.3.1
# via scikit-learn
llvmlite==0.40.1
# via numba
numba==0.57.1
# via cpknextgen (pyproject.toml)
numpy==1.24.4
# via
# cpknextgen (pyproject.toml)
# numba
# scikit-learn
# scipy
packaging==23.1
# via
# pytest
# pytest-sugar
pluggy==1.2.0
# via pytest
pytest==7.4.0
# via
# cpknextgen (pyproject.toml)
# pytest-sugar
pytest-sugar==0.9.7
# via cpknextgen (pyproject.toml)
scikit-learn==1.3.0
# via cpknextgen (pyproject.toml)
scipy==1.11.1
# via
# cpknextgen (pyproject.toml)
# scikit-learn
termcolor==2.3.0
# via pytest-sugar
threadpoolctl==3.2.0
# via scikit-learn
26 changes: 26 additions & 0 deletions requirements.txt
@@ -0,0 +1,26 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# pip-compile --output-file=requirements.txt pyproject.toml
#
joblib==1.3.1
# via scikit-learn
llvmlite==0.40.1
# via numba
numba==0.57.1
# via cpknextgen (pyproject.toml)
numpy==1.24.4
# via
# cpknextgen (pyproject.toml)
# numba
# scikit-learn
# scipy
scikit-learn==1.3.0
# via cpknextgen (pyproject.toml)
scipy==1.11.1
# via
# cpknextgen (pyproject.toml)
# scikit-learn
threadpoolctl==3.2.0
# via scikit-learn
1 change: 1 addition & 0 deletions src/cpknextgen/__init__.py
@@ -0,0 +1 @@
from .__main__ import evaluate, EvaluationResult
