JTK as class, new Action for target-branch, minor fixes
- add Actions for wrong-target-branch and modified PR template and CONTRIBUTING accordingly
- DeprecationWarning fix for LazyLoader
- refactor JTKTest as class object and deprecate previous functions
- make compare_inter_vs_intra_group fully compatible with CLR/ALR
- periods in get_jtk is now a keyword argument with a default value
Bribak committed Dec 20, 2024
1 parent d2f5d55 commit 87ea2fc
Showing 9 changed files with 4,346 additions and 4,344 deletions.
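
The commit message describes moving the JTK helpers into a class and deprecating the earlier free functions. A minimal sketch of that general pattern follows. It assumes nothing about glycowork's actual API: `PeriodicityTest`, its `statistic` method, and the `legacy_stat` wrapper are hypothetical names, and the computation is a placeholder rather than the JTK statistic.

```python
import warnings


class PeriodicityTest:
    """Hypothetical test object that bundles state the old functions passed around."""

    def __init__(self, periods=(12, 24)):
        self.periods = tuple(periods)

    def statistic(self, values):
        # Placeholder computation (not JTK): count pairs (i < j) with values[j] > values[i].
        return sum(1 for i, a in enumerate(values) for b in values[i + 1:] if b > a)


def legacy_stat(values, periods=(12, 24)):
    """Deprecated free function kept as a thin wrapper around the class."""
    warnings.warn(
        "legacy_stat is deprecated; use PeriodicityTest.statistic instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return PeriodicityTest(periods).statistic(values)
```

Keeping the old name as a wrapper lets existing callers keep working while the warning nudges them toward the class.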
8 changes: 8 additions & 0 deletions .github/pull_request_template.md
@@ -1,3 +1,11 @@
+# ⚠️ IMPORTANT: Branch Strategy
+This repository follows a strict branch strategy:
+- `master` branch is ONLY for PyPI release mirroring
+- All development PRs MUST target the `dev` branch
+- If your PR targets `master`, it will be flagged and you'll be asked to retarget to `dev`
+
+---
+
 ## Description of Changes

 ...
68 changes: 68 additions & 0 deletions .github/workflows/pr_branch_check.yml
@@ -0,0 +1,68 @@
+name: PR Branch Check
+
+on:
+  pull_request_target:
+    types: [opened, edited, synchronize, reopened]
+    branches:
+      - master
+
+jobs:
+  check-target-branch:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    steps:
+      - name: Check target branch
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const pr = context.payload.pull_request;
+            // Skip if PR is from maintainer
+            const maintainer = 'Bribak';
+            if (pr.user.login === maintainer) {
+              return;
+            }
+            // Check if PR is targeting master
+            if (pr.base.ref === 'master') {
+              const warning = `⚠️ **Important Notice About Your Pull Request**
+            Thank you for your contribution! However, I noticed that this PR is targeting the \`master\` branch.
+            In this repository:
+            - The \`master\` branch is reserved for PyPI release mirroring only
+            - All development PRs should target the \`dev\` branch
+            **Action Required:**
+            1. Please update your PR to target the \`dev\` branch instead
+            2. If you created your branch from \`master\`, you may need to:
+               - Create a new branch from \`dev\`
+               - Cherry-pick or reapply your changes
+               - Update your PR or create a new one
+            For more information, please check our CONTRIBUTING.md guide.
+            Let us know if you need any help with this process!`;
+              // Add comment to PR
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: pr.number,
+                body: warning
+              });
+              // Add 'wrong-target-branch' label if it exists
+              try {
+                await github.rest.issues.addLabels({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: pr.number,
+                  labels: ['wrong-target-branch']
+                });
+              } catch (error) {
+                console.log('Label could not be added (might not exist)');
+              }
+            }
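
The decision logic in the workflow script above can be restated as a small Python function, purely for illustration. `PullRequest` and `needs_retarget_warning` are hypothetical names that do not exist in the repository; the Action itself runs the JavaScript shown in the diff via `actions/github-script`.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    author: str    # login of the account that opened the PR
    base_ref: str  # branch the PR wants to merge into


def needs_retarget_warning(pr, maintainer="Bribak", protected="master"):
    """Return True when the PR should receive the warning comment and label."""
    if pr.author == maintainer:
        # The maintainer is exempt, mirroring the early return in the script.
        return False
    return pr.base_ref == protected
```

For example, `needs_retarget_warning(PullRequest("contributor", "master"))` is true, while the same call with `base_ref="dev"` or `author="Bribak"` is false. The login comparison is case-sensitive here, as it is with `===` in the script.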
8,088 changes: 4,039 additions & 4,049 deletions 03_motif.ipynb

Large diffs are not rendered by default.

11 changes: 9 additions & 2 deletions CHANGELOG.md
@@ -32,9 +32,13 @@
 - Changed `resources.open_text` to `resources.files` to prevent `DeprecationWarning` from `importlib` (0c94995)
 - `lectin_specificity` now uses our custom `DataFrameSerializer` and is stored as a .json file rather than a .pkl file, to improve long-term stability across versions (034b6ad)

+##### Fixed 🐛
+- Fixed DeprecationWarning in all data-loading functions that used `importlib.resources.open_text` or `.content`
+
 #### stats
 ##### Added ✨
 - Added the "random_state" keyword argument to `clr_transformation` to allow users to provide a reproducible RNG seed (b94744e)
+- Added the `JTKTest` class object

 ##### Changed 🔄
 - For `replace_outliers_winsorization`, in small datasets, the 5% limit is dynamically changed to include at least one datapoint (23d6456)
@@ -43,6 +47,7 @@

 ##### Deprecated ⚠️
 - Deprecated `hlm`, `fast_two_sum`, `two_sum`, `expansion_sum`, and `update_cf_for_m_n`, which will all be done in-line instead (e1afe33)
+- Deprecated `jtkdist`, `jtkinit`, `jtkstat`, `jtkx`, which will all be done by the new `JTKTest`

 ##### Fixed 🐛
 - Fixed DeprecationWarning in `calculate_permanova_stat` for calling nonzero on 0d arrays (23d6456)
@@ -133,6 +138,7 @@
 ##### Changed 🔄
 - `get_glycanova` will now raise a ValueError if fewer than three groups are provided in the input data (f76535e)
 - Improved console drawing quality controlled by `display_svg_with_matplotlib` and image quality in Excel cells using `plot_glycans_excel` (a64f694)
+- The "periods" argument in `get_jtk` is now a keyword argument and has a default value of [12, 24]

 ##### Fixed 🐛
 - Fixed a FutureWarning in `get_lectin_array` by avoiding DataFrame.groupby with axis=1 (f76535e)
@@ -144,6 +150,7 @@
 - Fixed an issue where variance-filtered rows could cause problems in `get_differential_expression` if "monte_carlo = True" (ef3da9c)
 - Fixed an issue in `get_differential_expression` if "sets = True" that caused indexing issues under certain conditions (ef3da9c)
 - Ensured that "effect_size_variance = True" in `get_differential_expression` always formats variances correctly (ef3da9c)
+- Ensured that the combination of "grouped_BH = True", "paired = False", and CLR/ALR in `get_differential_expression` works even when negative values are present

 #### regex
 ##### Fixed 🐛
@@ -175,12 +182,12 @@
 #### evolution
 ##### Fixed 🐛
 - Fixed DeprecationWarning in `distance_from_embeddings` to prevent DataFrameGroupBy.apply from operating on the grouping columns (94646ad)
-- Fixed an issue in `distance_from_metric` where networks were indexed incorrectly based on presented DataFrame order
+- Fixed an issue in `distance_from_metric` where networks were indexed incorrectly based on presented DataFrame order (d2f5d55)

 #### biosynthesis
 ##### Changed 🔄
 - Made sure in `network_alignment` that only nodes that are virtual in all aligned networks stay virtual (918d18f)
 - `choose_leaves_to_extend` will now correctly return no leaf node glycan if the target composition cannot be reached from any of the leaf nodes in a network (918d18f)

 ##### Fixed 🐛
-- Fixed an issue in `find_shared_virtuals` in which no shared nodes were found because of graph comparisons
+- Fixed an issue in `find_shared_virtuals` in which no shared nodes were found because of graph comparisons (d2f5d55)
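
The changelog entry for `get_jtk` says that "periods" became a keyword argument with a default of [12, 24]. The sketch below shows one way to give an argument such a default. It is not the real `get_jtk` signature: `analyze_rhythm` and its placeholder body are hypothetical, and the `None` sentinel is a choice made for this sketch (it avoids sharing one mutable list between calls), not something the diff shows.

```python
def analyze_rhythm(timepoints, periods=None):
    """Hypothetical function; only the handling of `periods` is the point."""
    # A None sentinel means every call gets its own fresh default list.
    if periods is None:
        periods = [12, 24]
    # Placeholder result: keep the candidate periods that fit in the sampled time span.
    span = max(timepoints) - min(timepoints)
    return [p for p in periods if p <= span]
```

Callers that relied on the old positional form can simply omit the argument, for example `analyze_rhythm([0, 6, 12, 18, 24])`, or override it explicitly with `periods=[8, 16]`.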
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -15,8 +15,8 @@ nbdev_install_hooks

 ### Did you write a patch that fixes a bug?

-* Open a new GitHub pull request with the patch.
-* Ensure that your PR includes a test that fails without your patch, and pass with it.
+* Open a new GitHub pull request with the patch, based on the current dev branch and targeted to merge into the dev branch.
+* Ensure that your PR includes a test that fails without your patch, and passes with it.
 * Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.

 ## PR submission guidelines
6 changes: 3 additions & 3 deletions glycowork/glycan_data/loader.py
@@ -50,15 +50,15 @@ def __getattr__(self, name):
     if name not in self._datasets:
       filename = f"{self.prefix}{name}.csv"
       try:
-        with resources.open_text(self.package + '.' + self.directory, filename) as f:
+        with resources.files(f"{self.package}.{self.directory}").joinpath(filename).open(encoding = 'utf-8-sig') as f:
           self._datasets[name] = pd.read_csv(f)
       except FileNotFoundError:
         raise AttributeError(f"No dataset named {name} available under {self.directory} with prefix {self.prefix}.")
     return self._datasets[name]

   def __dir__(self):
-    files = resources.contents(self.package + '.' + self.directory)
-    dataset_names = [file[len(self.prefix):-4] for file in files if file.startswith(self.prefix) and file.endswith('.csv')]
+    files = resources.files(f"{self.package}.{self.directory}").iterdir()
+    dataset_names = [file.name[len(self.prefix):-4] for file in files if file.name.startswith(self.prefix) and file.name.endswith('.csv')]
     return dataset_names
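
The change above swaps `resources.open_text` and `resources.contents` for the `importlib.resources.files` API (Python 3.9+). The following self-contained sketch exercises the same two calls, `joinpath(...).open(...)` and `iterdir()`. It is not glycowork's loader: `LazyCsvLoader`, the throwaway `demo_data` package, and the file `v1_glycans.csv` are hypothetical, and the standard `csv` module stands in for pandas so the example has no third-party dependencies.

```python
import csv
import sys
import tempfile
from importlib import resources
from pathlib import Path


class LazyCsvLoader:
    """Load CSV files shipped inside a package on first attribute access."""

    def __init__(self, package, prefix=""):
        self.package = package
        self.prefix = prefix
        self._datasets = {}

    def __getattr__(self, name):
        # Called only when normal lookup fails; refuse private names to avoid recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._datasets:
            resource = resources.files(self.package).joinpath(f"{self.prefix}{name}.csv")
            try:
                # utf-8-sig drops a leading byte-order mark if the file has one.
                with resource.open(encoding="utf-8-sig") as f:
                    self._datasets[name] = list(csv.DictReader(f))
            except FileNotFoundError:
                raise AttributeError(f"No dataset named {name}") from None
        return self._datasets[name]

    def __dir__(self):
        entries = resources.files(self.package).iterdir()
        return [e.name[len(self.prefix):-4] for e in entries
                if e.name.startswith(self.prefix) and e.name.endswith(".csv")]


# Build a throwaway package on disk so the sketch runs anywhere.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_data"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
# The CSV starts with a UTF-8 byte-order mark to show why utf-8-sig matters.
(pkg / "v1_glycans.csv").write_bytes(b"\xef\xbb\xbfname,count\nMan,3\n")
sys.path.insert(0, str(tmp))

loader = LazyCsvLoader("demo_data", prefix="v1_")
```

Because `iterdir()` yields path-like entries rather than strings, the filter has to go through `.name`, which is the same adjustment the diff makes in `__dir__`.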

