Updated pseudocode #2
base: test-branch
@@ -6,8 +6,8 @@ TCGA VCF file Subset into breast, ovarian, colo-rectal:
Rows:
If chrom in range of genes in pathway
If pos in range of genes in pathway
Include the row in new file
This might be old, but here's where you need to test chrom and pos together, with an and clause, since the location ranges you're looking for are defined by the combination of chrom and pos.
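A minimal sketch of the combined chrom-and-pos check, assuming the pathway genes are available as (chrom, start, end) tuples with 1-based inclusive coordinates and that CHROM and POS are the first two tab-separated VCF columns. GENE_RANGES, in_pathway, and filter_vcf_lines are hypothetical names, and the coordinates are placeholders, not real gene boundaries. Chromosome names must match the file's own convention (for example "chr1" versus "1").

```python
# Placeholder pathway gene ranges: (chrom, start, end), 1-based inclusive.
# These are NOT real gene coordinates; substitute the pathway's actual genes.
GENE_RANGES = [
    ("chr1", 1000, 2000),
    ("chr2", 5000, 6000),
]

def in_pathway(chrom, pos, ranges=GENE_RANGES):
    """Return True if (chrom, pos) lies inside any gene range.

    chrom and pos are tested together with `and`, so a position only
    matches a range on the same chromosome.
    """
    return any(chrom == r_chrom and start <= pos <= end
               for r_chrom, start, end in ranges)

def filter_vcf_lines(lines, ranges=GENE_RANGES):
    """Yield VCF header lines unchanged, plus data rows inside the pathway."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM header
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if in_pathway(chrom, pos, ranges):
            yield line
```

For example, in_pathway("chr2", 1500) is False even though 1500 falls inside the chr1 range; checking pos on its own would wrongly keep that row.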
Loop through columns in tissue type subset (above)
If genotype is not an empty field
Add a row of TCGA-ID (from column), and also Chrom, Pos, Ref, Alt, Genotype Info to the final file
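The column loop above could look roughly like this, assuming a standard VCF layout (nine fixed columns, then one sample column per individual headed by its TCGA ID). Which values count as an "empty" genotype is an assumption here (EMPTY_GENOTYPES), as are the function names per_individual_rows and to_tsv.

```python
import csv
import io

# Assumed VCF layout: 9 fixed columns (CHROM, POS, ID, REF, ALT, QUAL,
# FILTER, INFO, FORMAT) followed by one column per sample, headed by its TCGA ID.
FIXED_COLUMNS = 9
# Assumption: these values mean "no genotype"; adjust to match the files.
EMPTY_GENOTYPES = {"", ".", "./."}

def per_individual_rows(vcf_lines):
    """Yield (TCGA-ID, Chrom, Pos, Ref, Alt, Genotype Info) for every
    sample column whose genotype field is not empty."""
    sample_ids = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            sample_ids = fields[FIXED_COLUMNS:]  # the #CHROM header row
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        for tcga_id, genotype in zip(sample_ids, fields[FIXED_COLUMNS:]):
            if genotype not in EMPTY_GENOTYPES:
                yield (tcga_id, chrom, pos, ref, alt, genotype)

def to_tsv(rows):
    """Render rows as TSV text under the final-file column headers."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["TCGA-ID", "Chrom", "Pos", "Ref", "Alt", "Genotype Info"])
    writer.writerows(rows)
    return buf.getvalue()
```

Each (individual, variant) pair becomes one row, so a variant carried by three individuals appears three times.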
Final file: TSV file indexed by individual, where each variant has its own row (with columns TCGA-ID, Chrom, Pos, Ref, Alt, Genotype Info)
Final file: TSV file organized by individual, where each variant has its own row (with columns TCGA-ID, Chrom, Pos, Ref, Alt, Genotype Info)
Here's what I was confused about earlier: the difference between the TSV file and the final file. If the final file is what tells you which individuals have which variants, then you don't want the TCGA ID and genotype info in the TSV file (above). Instead, the TSV file should be a non-redundant file that lists each unique variant in the cohort exactly once.
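One way to picture the split this comment describes: the per-individual table keeps TCGA ID and genotype, while the variant table drops both and keeps each (chrom, pos, ref, alt) combination once. This sketch assumes rows shaped like the final-file columns (TCGA-ID, Chrom, Pos, Ref, Alt, Genotype Info); unique_variants is a hypothetical name.

```python
def unique_variants(per_individual_rows):
    """Collapse (TCGA-ID, Chrom, Pos, Ref, Alt, Genotype Info) rows into a
    non-redundant list of (Chrom, Pos, Ref, Alt), in first-seen order."""
    seen = set()
    result = []
    for _tcga_id, chrom, pos, ref, alt, _genotype in per_individual_rows:
        key = (chrom, pos, ref, alt)  # the variant is defined by all four together
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

A variant shared by several individuals therefore appears once here but once per carrier in the per-individual file.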
Test to determine success:
Sort the TCGA IDs in the final file in alphabetical order
Good!
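The sorting test above could be turned into a per-individual tally, sketched here under the assumption that the TCGA ID is the first column of each row. The resulting counts could then be compared by hand with the non-empty genotype fields in that individual's VCF column; variants_per_individual is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter

def variants_per_individual(rows):
    """Sort rows alphabetically by TCGA-ID (column 0) and return
    {TCGA-ID: number of variant rows}, keys in alphabetical order."""
    ordered = sorted(rows, key=itemgetter(0))  # stable: keeps variant order within an ID
    return {tcga_id: sum(1 for _ in group)
            for tcga_id, group in groupby(ordered, key=itemgetter(0))}
```

groupby only merges adjacent rows, which is why the sort has to happen first.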
@@ -35,11 +38,13 @@ VCF File with set of unique variants:
Loop through rows in product of above tissue type subset
If genotype information is not blank
If chrom, pos, ref, and alt (combination of them all) are unique
If chrom, pos, ref, and alt (combination of them all) have not been added to the final vcf file
This will help you make sure you got all the variants you wanted. You might also think about how you could test that you didn't get any variants you don't want. One way to do this would be to collect variant annotations, for example from OpenCRAVAT, and make sure none of them point to genes outside the pathway.
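A sketch of the "no unexpected genes" check, assuming the annotations have already been parsed into (chrom, pos, ref, alt, gene) tuples. The exact columns of an OpenCRAVAT export are not assumed here, and unexpected_genes and the example gene set are hypothetical.

```python
from collections import Counter

def unexpected_genes(annotations, pathway_genes):
    """Count annotated variants per gene for genes outside the pathway.

    annotations: iterable of (chrom, pos, ref, alt, gene) tuples (assumed layout).
    pathway_genes: set of gene symbols expected in the pathway.
    An empty Counter means every annotation maps to an expected gene.
    """
    return Counter(gene for _chrom, _pos, _ref, _alt, gene in annotations
                   if gene not in pathway_genes)
```

A non-empty result points straight at the genes whose variants slipped through the range filter.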
Test to determine success:
In the tissue type subset, go through line by line and count the number of non-blank genotype fields
Compare this count with the number of lines in the final vcf file of unique variants; ideally the counts should be the same
Good!
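A sketch of the counting test, assuming a standard VCF layout and treating "" and "." as blank genotypes (both assumptions). It reports two numbers because they answer different questions: the total of non-blank fields is the number of (individual, variant) pairs, which should match the per-individual file, while the number of rows with at least one non-blank genotype should match the unique-variant file, provided each input row is a distinct chrom/pos/ref/alt combination. genotype_counts and count_data_lines are hypothetical names.

```python
# Assumed VCF layout: 9 fixed columns, then one genotype column per sample.
FIXED_COLUMNS = 9
# Assumption: these values count as a blank genotype field.
BLANK = {"", "."}

def genotype_counts(vcf_lines):
    """Return (non_blank_fields, rows_with_any_genotype) for a VCF subset."""
    non_blank_fields = 0
    rows_with_any = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta lines and the #CHROM header
        genotypes = line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
        n = sum(1 for g in genotypes if g not in BLANK)
        non_blank_fields += n
        if n:
            rows_with_any += 1
    return non_blank_fields, rows_with_any

def count_data_lines(vcf_lines):
    """Number of non-header lines, e.g. in the final unique-variant VCF."""
    return sum(1 for line in vcf_lines if not line.startswith("#"))
```

Comparing rows_with_any against count_data_lines on the final VCF checks the unique-variant file; comparing non_blank_fields against the data rows of the per-individual TSV checks that file.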
@melissacline