Merge pull request #56 from statisticsnorway/update-documentation
Update documentation
mmwinther authored Sep 5, 2022
2 parents a38f7b7 + f2b3a2d commit 3c472b8
Showing 5 changed files with 249 additions and 137 deletions.
83 changes: 64 additions & 19 deletions README.md
@@ -4,40 +4,85 @@

Document datasets in Statistics Norway

## Usage example
## Usage

1. Run `from datadoc import main; main("./path/to/your/dataset")` to run Datadoc on a dataset of your choosing.
1. Complete the metadata fields as desired
1. Click `Lagre` (Save) to write a metadata document alongside your dataset
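
As a standalone snippet, the steps above look like this (a sketch: the import requires the `ssb-datadoc` package to be installed, and the dataset path is a placeholder):

```python
# Sketch of the usage described above. The import requires the
# ssb-datadoc package; guard it so the snippet is safe to run anywhere.
try:
    from datadoc import main
except ImportError:
    main = None  # ssb-datadoc is not installed in this environment

if main is not None:
    # Placeholder path: point this at your own dataset
    main("./path/to/your/dataset.parquet")
```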

### If the datadoc package is not installed
![DataDoc in use](./doc/change-language-example.gif)

1. Clone this repo to your Jupyter instance (or local machine)
1. Open the `DataDoc.ipynb` Notebook and run the cell to see the example dataset
### From Jupyter

![DataDoc in use](./doc/change-language-example.gif)
1. Open <https://jupyter.dapla-staging.ssb.no> or another Jupyter Lab environment
1. Run `pip install ssb-datadoc` in the terminal
1. Upload a dataset to your Jupyter server (e.g. <https://github.com/statisticsnorway/datadoc/blob/master/klargjorte_data/person_data_v1.parquet>)
1. Run `from datadoc import main; main("./person_data_v1.parquet")` in a notebook
1. Datadoc will open in a new tab

## Contributing

### Prerequisites

- Python >3.8 (3.10 is preferred)
- Poetry, install via `curl -sSL https://install.python-poetry.org | python3 -`

### Dependency Management

Poetry is used for dependency management.
Poetry is used for dependency management. [Poe the Poet](https://github.com/nat-n/poethepoet) is used to run tasks within Poetry's virtualenv. After cloning this project, first install the necessary dependencies, then run the tests to verify everything works.

#### Install all dependencies

```shell
poetry install
```

### Add dependencies

#### Main

```shell
poetry add <python package name>
```

#### Dev

```shell
poetry add --group dev <python package name>
```

### Run tests

```shell
poetry run poe test
```

### Run project locally

To install all required dependencies in a virtual environment run `poetry install`. To add a new dependency to the project run `poetry add <package name>`.
To run the project locally:

```shell
poetry run poe datadoc "gs://ssb-staging-dapla-felles-data-delt/datadoc/klargjorte_data/person_data_v1.parquet"
```

### Run project locally in Jupyter

To run the project locally in Jupyter run:

```bash
poetry install
poetry run jupyter lab
```

```shell
poetry run poe jupyter
```

A Jupyter instance should open in your browser. Once there, open the `*.ipynb` file. Before running it, select the correct interpreter via `Kernel > Change Kernel > datadoc`.
A Jupyter instance should open in your browser. Open and run the cells in the `.ipynb` file to demo datadoc.

### Run tests
### Bump version

```shell
poetry run poe bump-patch-version
```

> :warning: Run this on the default branch
This command will:

1. Increment version strings in files
1. Commit the changes
1. Tag the commit with the new version
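
The version increment in step 1 amounts to a string rewrite along these lines (an illustrative sketch only; the real logic lives behind the `bump-patch-version` poe task):

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a 'major.minor.patch' version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

print(bump_patch("0.3.1"))  # 0.3.2
```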

1. Install dev dependencies (see [Dependency Management](#dependency-management))
1. Run `poetry shell` to open a shell in the Virtual Environment for the project
1. Run `pytest` in the root of the project
Then just run `git push origin --tags` to push the changes and trigger the release process.
14 changes: 7 additions & 7 deletions datadoc/app.py
@@ -72,20 +72,20 @@ def main(dataset_path: str = None):
    if dataset_path is None:
        # Get the supplied command line argument
        parser = argparse.ArgumentParser()
        parser.add_argument("--dataset-path", help="Specify the path to a dataset")
        args = parser.parse_args()
        # Use example dataset if nothing specified
        dataset = (
            args.dataset_path
            or f"{os.path.dirname(__file__)}/../klargjorte_data/person_data_v1.parquet"
        )
        parser.add_argument(
            "dataset_path",
            help="Specify the path to a dataset",
            nargs="?",
            default=f"{os.path.dirname(__file__)}/../klargjorte_data/person_data_v1.parquet",
        )
        dataset = parser.parse_args().dataset_path
    else:
        dataset = dataset_path

    state.metadata = DataDocMetadata(dataset)

    if running_in_notebook():
        logging.basicConfig(level=logging.WARNING)
        logging.basicConfig(level=logging.INFO)
        from jupyter_dash import JupyterDash

        JupyterDash.infer_jupyter_proxy_config()
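
The change above replaces an optional `--dataset-path` flag with an optional positional argument. This standalone sketch reproduces the `nargs="?"` pattern with a shortened default path:

```python
import argparse

# An optional positional argument: nargs="?" means zero or one value,
# and the default is used when nothing is supplied.
parser = argparse.ArgumentParser()
parser.add_argument(
    "dataset_path",
    help="Specify the path to a dataset",
    nargs="?",
    default="klargjorte_data/person_data_v1.parquet",
)

print(parser.parse_args([]).dataset_path)  # the default path
print(parser.parse_args(["my_data.parquet"]).dataset_path)  # my_data.parquet
```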
Binary file modified doc/change-language-example.gif
