-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coordinating matrix DWPC implementation with Scripps folks #58
Comments
Are times EST? If EST I will do my best to join but I am likely to be a few minutes late. Still, the call is long enough that I should hit most of it 😄 |
Ah yes, enter the call via https://zoom.us/j/936884443. The call is scheduled from 1:00-1:50 PM EST, however, I'm free afterwards if anyone wants to have break out discussions. |
I can talk for most of the 1:00-1:50pm window. |
Hi everyone! I've had a chance to look through the discussions and algorithms posted and it looks like we have both come to similar conclusions independently. It seems an exact DWPC solution can be found via matrix multiplication when only one metanode is repeated in a metapath, up to 3 times (are you able to accurately calculate CrCrCrCtD?), or when two metanodes are repeated in either a non-overlapping or entirely-overlapping fashion. The focus of the To this end, things that are quite expensive in terms of computation time, like building the adjacency matrices and weighting by degree, are performed only one time. This way, only the matrix multiplication needs performed to retrieve any one DWPC. This results in a computation time of about 1-2 seconds for the densest matrices, with the majority computing in tens of milliseconds ( Finally, I've devised a method to quickly estimate a solution when the DWPC for a metapath cannot be calculated exactly via matrices. I have tested this estimation method in comparison the features originally extracted by @dhimmel using neo4j, and the results produced are almost identical (I just added a few old test notebooks to our repo so you can take a look). You can also see that using the DWWC for the same metapaths completely over-fits the model, with coefficients for metapaths containing a CtD edge being overvalued. I'm happy to talk and answer any questions you may have about our implementation ahead of our meeting on Wednesday, otherwise we can discuss it at that time. |
Our current implementation will return results, i.e. this wouldn't raise a
We'll set up a chat for @mmayers12 and @kkloste to come to consensus here.
Those are important use cases! We're mostly pursuing unsupervised approaches like given a query set of nodes of the same type, can we detect a metapath-target_node combination that best arrives at the query set. We're also interested in translating node sets to different types, e.g. symptoms to pathways.
@zietzm is building the adjancy matrices and degree normalization expensive in our implementation? I thought they were pretty quick. But yes, one exciting aspect of the matrix approaches is the ability to cache lot's of the intermediate computations.
@mmayers12 we'll make sure to include you in relevant discussions here. Since I think we have a pretty good idea of the next steps regarding matrix DWPC, we'll probably devote most of our wednesday meeting to less certain or more general discussion. |
Yes, the algorithm we implement provably works for any number of repetitions (of a single node), even with other metanodes between, e.g.
Just to make sure we're on the same page --
Our algorithmic framework can handle all three, but currently the only implementation that's finished is for the non-overlapping case. |
This is the same algorithm we've used, however, it is not perfect. Take for instance the example of
Yes, that's exactly what I meant. Through my testing, I have not been able to find an exact solution to the partially-overlapping case. Looking at For example, if you look at the over-counting due to vising So now we have a method that overestimates the error and a method that underestimates it. In order to get a fast approximation for DWPC in these cases, we take an element-wise average of the two errors matrices (sum and element-wise max) to get something that's fairly close to the true DWPC while still calculating the result in under 5 seconds. If you have a way that comes to an exact answer every time, I'd be really excited to see it! Eventually, I would like to be able to look at metapaths longer than 4 steps, however, with that comes more complicated cases of repeated and overlapping metanodes.
In my experience, the larger ones take 10-20 seconds to build. When your multiplication finishes on the ms time-scale, you don't want to have to build them every time you extract DWPC for a new metapath.
Sounds good to me! The rest of the lab is working more on the data side of things, and I know it can be a bit tedious to go over the subtilies matrix operations with those who aren't working on it daily. |
Regarding metapaths that repeat the same node more than three times --- Ach, I didn't see it before in our analysis, thank you for pointing this out to me. I had believed I proved our method would work for any number of repetitions. We had handled other problems, like the
This is exactly the problem we ran into a couple weeks ago. Accounting for loops boils down to the inclusion-exclusion principle: we want to subtract the set of walks with loops in
This approach can generalize to more repeated nodes, but it gets messier. We haven't performed the generalization yet. |
Several individuals in the Su Lab at Scripps Research Institute, specifically @veleritas, @NuriaQueralt and @mmayers12, are working on extensions to Rephetio and Hetionet. We're going to catch up on a call Wednesday July 12, 2017 from 1-2 PM. As an aside, @cgreene @zietzm @danich1 @kkloste, feel free to join this call, even if just as background audio. Since we want to focus the call on more high level planning, @mmayers12 suggested we take a look at his work on implementing matrix DWPC beforehand.
You can see the current progress in the
mmayers12/hetnet-ml
repo and specifically inmatrix_tools.py
. It looks like @mmayers12 has arrived to several of the same decisions and algorithms we've implemented here. For example, the DWWC and DWPC distinction.For reference, the work in hetmech currently consists of three primary contributors:
scipy.sparse
API.It would be great if @mmayers12, @kkloste, @zietzm and I could all get on the same page regarding how the
mmayers12/hetnet-ml
andgreenelab/hetmech
implementation differ. My hope is that @mmayers12 will be able to contribute his advances to this repo. Specifically, we should compare our matrix-DWPC algorithms which are discussed specifically in #52, #47, #45 (merged), and #20 (outdated). Our current implementation is indegree_weight.py
. In short, @kkloste has figured out how to exclude duplicate-node paths for some metapaths but not all.There is also the larger question of whether DWPC is even needed, or whether DWWC is sufficient as comparison to permuted-DWWCs could correct for duplicate-node paths. Finally, Thinklab is shutting down and transitioning to a read-only state 😞, hence the new URL of https://think-lab.github.io.
The text was updated successfully, but these errors were encountered: