JTK as class, new Action for target-branch, minor fixes

- Add an Action that flags PRs targeting the wrong branch (`wrong-target-branch`), and update the PR template and CONTRIBUTING accordingly
- Fix a DeprecationWarning in LazyLoader
- Refactor the JTK test as the `JTKTest` class and deprecate the previous functions
- Make `compare_inter_vs_intra_group` fully compatible with CLR/ALR
- `periods` in `get_jtk` is now a keyword argument with a default value
Glycocalex · Dec 20, 2024 · 87ea2fc
1 parent d2f5d55
Showing 9 changed files with 4,346 additions and 4,344 deletions.
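The CLR/ALR compatibility item (and the matching `get_differential_expression` changelog entry about negative values) matters because log-ratio transformed data routinely contain negative values. The minimal stdlib sketch below uses the textbook definitions, not glycowork's own `clr_transformation` or ALR code, which may add pseudocounts, scaling, or other steps. It shows that CLR values of any composition sum to zero, so unless all parts are equal, at least one entry is negative.

```python
import math

def clr(values):
    """Centered log-ratio: log of each part minus the mean of all logs."""
    logs = [math.log(v) for v in values]  # inputs must be strictly positive
    mean_log = sum(logs) / len(logs)       # log of the geometric mean
    return [lv - mean_log for lv in logs]

def alr(values, ref=-1):
    """Additive log-ratio: log of each part over a reference part, which is dropped."""
    ref_index = ref % len(values)
    return [math.log(v / values[ref_index]) for i, v in enumerate(values) if i != ref_index]

abundances = [1.0, 2.0, 4.0]  # illustrative relative abundances
print(clr(abundances))  # entries sum to ~0, so the smallest is negative
print(alr(abundances))  # both parts are below the reference, so both are negative
```

Any downstream step that implicitly assumes non-negative input, such as taking another logarithm or a square root of the transformed values, has to be written with signed values in mind to work on CLR/ALR output.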
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -1,3 +1,11 @@
+# ⚠️ IMPORTANT: Branch Strategy
+This repository follows a strict branch strategy:
+- `master` branch is ONLY for PyPI release mirroring
+- All development PRs MUST target the `dev` branch
+- If your PR targets `master`, it will be flagged and you'll be asked to retarget to `dev`
+
+---
+
 ## Description of Changes
 
 ...

diff --git a/.github/workflows/pr_branch_check.yml b/.github/workflows/pr_branch_check.yml
@@ -0,0 +1,68 @@
+name: PR Branch Check
+
+on:
+  pull_request_target:
+    types: [opened, edited, synchronize, reopened]
+    branches:
+      - master
+
+jobs:
+  check-target-branch:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    steps:
+      - name: Check target branch
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const pr = context.payload.pull_request;
+            
+            // Skip if PR is from maintainer
+            const maintainer = 'Bribak';
+            if (pr.user.login === maintainer) {
+              return;
+            }
+            
+            // Check if PR is targeting master
+            if (pr.base.ref === 'master') {
+              const warning = `⚠️ **Important Notice About Your Pull Request**
+            
+            Thank you for your contribution! However, I noticed that this PR is targeting the \`master\` branch.
+            
+            In this repository:
+            - The \`master\` branch is reserved for PyPI release mirroring only
+            - All development PRs should target the \`dev\` branch
+            
+            **Action Required:**
+            1. Please update your PR to target the \`dev\` branch instead
+            2. If you created your branch from \`master\`, you may need to:
+               - Create a new branch from \`dev\`
+               - Cherry-pick or reapply your changes
+               - Update your PR or create a new one
+            
+            For more information, please check our CONTRIBUTING.md guide.
+            
+            Let us know if you need any help with this process!`;
+              
+              // Add comment to PR
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: pr.number,
+                body: warning
+              });
+              
+              // Add 'wrong-target-branch' label if it exists
+              try {
+                await github.rest.issues.addLabels({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: pr.number,
+                  labels: ['wrong-target-branch']
+                });
+              } catch (error) {
+                console.log('Label could not be added (might not exist)');
+              }
+            }
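The control flow of the `github-script` step can be restated as a small, testable Python function. This sketch is not part of the commit. `plan_branch_check` is a hypothetical name, and it only builds the REST calls rather than sending them. The endpoint paths are the GitHub REST routes that Octokit's `issues.createComment` and `issues.addLabels` wrap.

```python
from typing import Any

MAINTAINER = "Bribak"          # same login the workflow skips
LABEL = "wrong-target-branch"  # same label the workflow tries to add

def plan_branch_check(payload: dict[str, Any], owner: str, repo: str, body: str) -> list[tuple[str, str, dict]]:
    """Return the REST calls the workflow would make for one pull_request_target event."""
    pr = payload["pull_request"]
    # PRs opened by the maintainer are never flagged.
    if pr["user"]["login"] == MAINTAINER:
        return []
    # Only PRs whose base branch is master are flagged.
    if pr["base"]["ref"] != "master":
        return []
    issue_path = f"/repos/{owner}/{repo}/issues/{pr['number']}"
    return [
        # issues.createComment -> POST /repos/{owner}/{repo}/issues/{number}/comments
        ("POST", f"{issue_path}/comments", {"body": body}),
        # issues.addLabels -> POST /repos/{owner}/{repo}/issues/{number}/labels
        # (the workflow wraps this call in try/catch, so a failure here is only logged)
        ("POST", f"{issue_path}/labels", {"labels": [LABEL]}),
    ]

# Usage with a trimmed-down, made-up event payload:
event = {"pull_request": {"number": 7, "user": {"login": "contributor"}, "base": {"ref": "master"}}}
for method, path, data in plan_branch_check(event, "owner", "repo", "Please retarget to dev."):
    print(method, path, data)
```

Because the `branches` filter under `pull_request_target` already limits the workflow to PRs whose base is `master`, the explicit `base.ref` check acts as a safeguard rather than the primary filter.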
diff --git a/03_motif.ipynb b/03_motif.ipynb
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -32,9 +32,13 @@
 - Changed `resources.open_text` to `resources.files` to prevent `DeprecationWarning` from `importlib` (0c94995)
 - `lectin_specificity` now uses our custom `DataFrameSerializer` and is stored as a .json file rather than a .pkl file, to improve long-term stability across versions (034b6ad)
 
+##### Fixed 🐛
+- Fixed DeprecationWarning in all data-loading functions that used `importlib.resources.open_text` or `importlib.resources.contents`
+
 #### stats
 ##### Added ✨
 - Added the "random_state" keyword argument to `clr_transformation` to allow users to provide a reproducible RNG seed (b94744e)
+- Added the `JTKTest` class object
 
 ##### Changed 🔄
 - For `replace_outliers_winsorization`, in small datasets, the 5% limit is dynamically changed to include at least one datapoint (23d6456)
@@ -43,6 +47,7 @@
 
 ##### Deprecated ⚠️
 - Deprecated `hlm`, `fast_two_sum`, `two_sum`, `expansion_sum`, and `update_cf_for_m_n`, which will all be done in-line instead (e1afe33)
+- Deprecated `jtkdist`, `jtkinit`, `jtkstat`, and `jtkx`, which will all be handled by the new `JTKTest` class instead
 
 ##### Fixed 🐛
 - Fixed DeprecationWarning in `calculate_permanova_stat` for calling nonzero on 0d arrays (23d6456)
@@ -133,6 +138,7 @@
 ##### Changed 🔄
 - `get_glycanova` will now raise a ValueError if fewer than three groups are provided in the input data (f76535e)
 - Improved console drawing quality controlled by `display_svg_with_matplotlib` and image quality in Excel cells using `plot_glycans_excel` (a64f694)
+- The "periods" argument in `get_jtk` is now a keyword argument and has a default value of [12, 24]
 
 ##### Fixed 🐛
 - Fixed a FutureWarning in `get_lectin_array` by avoiding DataFrame.groupby with axis=1 (f76535e)
@@ -144,6 +150,7 @@
 - Fixed an issue where variance-filtered rows could cause problems in `get_differential_expression` if "monte_carlo = True" (ef3da9c)
 - Fixed an issue in `get_differential_expression` if "sets = True" that caused indexing issues under certain conditions (ef3da9c)
 - Ensured that "effect_size_variance = True" in `get_differential_expression` always formats variances correctly (ef3da9c)
+- Ensured that the combination of "grouped_BH = True", "paired = False", and CLR/ALR in `get_differential_expression` works even when negative values are present
 
 #### regex
 ##### Fixed 🐛
@@ -175,12 +182,12 @@
 #### evolution
 ##### Fixed 🐛
 - Fixed DeprecationWarning in `distance_from_embeddings` to prevent DataFrameGroupBy.apply from operating on the grouping columns (94646ad)
-- Fixed an issue in `distance_from_metric` where networks were indexed incorrectly based on presented DataFrame order
+- Fixed an issue in `distance_from_metric` where networks were indexed incorrectly based on presented DataFrame order (d2f5d55)
 
 #### biosynthesis
 ##### Changed 🔄
 - Made sure in `network_alignment` that only nodes that are virtual in all aligned networks stay virtual (918d18f)
 - `choose_leaves_to_extend` will now correctly return no leaf node glycan if the target composition cannot be reached from any of the leaf nodes in a network (918d18f)
 
 ##### Fixed 🐛
-- Fixed an issue in `find_shared_virtuals` in which no shared nodes were found because of graph comparisons
+- Fixed an issue in `find_shared_virtuals` in which no shared nodes were found because of graph comparisons (d2f5d55)
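The `JTKTest` entries describe a common refactoring pattern: helper functions are folded into a class, and the old functions stay importable for a while but emit a `DeprecationWarning`. The actual `JTKTest` API is not shown in this diff, so every name below (`PeriodicityTest`, `statistic`, `legacy_statistic`) is hypothetical, and the statistic is a placeholder. The sketch only illustrates the forwarding-and-warning mechanism and a `periods` keyword with a default. It uses an immutable tuple because a mutable list default is shared across calls if anything mutates it in place.

```python
import warnings

class PeriodicityTest:
    """Hypothetical class that absorbs several former helper functions."""

    def __init__(self, periods=(12, 24)):
        # Tuple default: cannot be mutated and accidentally shared between calls.
        self.periods = tuple(periods)

    def statistic(self, values):
        # Placeholder computation; a real test would implement its own statistic here.
        return sum(values) / len(values)

def legacy_statistic(values, periods=(12, 24)):
    """Deprecated function kept for backward compatibility; forwards to the class."""
    warnings.warn(
        "legacy_statistic is deprecated; use PeriodicityTest(...).statistic instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return PeriodicityTest(periods=periods).statistic(values)

# Usage: DeprecationWarning is hidden by default outside __main__, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(legacy_statistic([1, 2, 3]))
    print([str(w.message) for w in caught])
```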
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -15,8 +15,8 @@ nbdev_install_hooks
 
 ### Did you write a patch that fixes a bug?
 
-* Open a new GitHub pull request with the patch.
-* Ensure that your PR includes a test that fails without your patch, and pass with it.
+* Open a new GitHub pull request with the patch, branched from the current `dev` branch and targeting `dev`.
+* Ensure that your PR includes a test that fails without your patch, and passes with it.
 * Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.
 
 ## PR submission guidelines

diff --git a/glycowork/glycan_data/loader.py b/glycowork/glycan_data/loader.py
@@ -50,15 +50,15 @@ def __getattr__(self, name):
     if name not in self._datasets:
       filename = f"{self.prefix}{name}.csv"
       try:
-        with resources.open_text(self.package + '.' + self.directory, filename) as f:
+        with resources.files(f"{self.package}.{self.directory}").joinpath(filename).open(encoding = 'utf-8-sig') as f:
           self._datasets[name] = pd.read_csv(f)
       except FileNotFoundError:
         raise AttributeError(f"No dataset named {name} available under {self.directory} with prefix {self.prefix}.")
     return self._datasets[name]
 
   def __dir__(self):
-    files = resources.contents(self.package + '.' + self.directory)
-    dataset_names = [file[len(self.prefix):-4] for file in files if file.startswith(self.prefix) and file.endswith('.csv')]
+    files = resources.files(f"{self.package}.{self.directory}").iterdir()
+    dataset_names = [file.name[len(self.prefix):-4] for file in files if file.name.startswith(self.prefix) and file.name.endswith('.csv')]
     return dataset_names
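The two replacements in `loader.py` map the deprecated `importlib.resources` calls onto the `files()` API. `open_text(package, name)` becomes `files(package).joinpath(name).open(...)`, and `contents(package)` becomes `files(package).iterdir()`, whose entries expose `.name`. Below is a self-contained sketch of a lazy CSV loader built on that API. `LazyCSVLoader`, `demo_pkg`, and the sample file are hypothetical. The sketch also reads rows with the standard-library `csv` module instead of `pd.read_csv`, which keeps it free of a pandas dependency.

```python
import csv
import sys
import tempfile
from importlib import invalidate_caches, resources
from pathlib import Path

class LazyCSVLoader:
    """Loads <prefix><name>.csv from a package subdirectory on first attribute access."""

    def __init__(self, package, directory, prefix=""):
        self.package = package
        self.directory = directory
        self.prefix = prefix
        self._datasets = {}  # cache: dataset name -> list of row dicts

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for dataset names.
        if name not in self._datasets:
            filename = f"{self.prefix}{name}.csv"
            try:
                # files() replaces open_text(); utf-8-sig strips a leading byte-order mark.
                with resources.files(f"{self.package}.{self.directory}").joinpath(filename).open(encoding="utf-8-sig", newline="") as f:
                    self._datasets[name] = list(csv.DictReader(f))
            except FileNotFoundError:
                raise AttributeError(f"No dataset named {name} available under {self.directory} with prefix {self.prefix}.") from None
        return self._datasets[name]

    def __dir__(self):
        # iterdir() replaces contents(); strip the prefix and the ".csv" suffix.
        files = resources.files(f"{self.package}.{self.directory}").iterdir()
        return [f.name[len(self.prefix):-4] for f in files if f.name.startswith(self.prefix) and f.name.endswith(".csv")]

# Demo: build a throwaway package on disk and load from it.
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(data_dir / "__init__.py").write_text("")
# Written with a BOM to show that utf-8-sig removes it from the first header.
(data_dir / "glycan_lengths.csv").write_text("glycan,length\nGal(b1-4)GlcNAc,2\n", encoding="utf-8-sig")
sys.path.insert(0, str(root))
invalidate_caches()

loader = LazyCSVLoader("demo_pkg", "data", prefix="glycan_")
print(dir(loader))        # ['lengths']
print(loader.lengths[0])  # {'glycan': 'Gal(b1-4)GlcNAc', 'length': '2'}
```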