
Commit

Added LaTeX to articles, checked package compilation
kfarleigh committed Mar 13, 2024
1 parent e31d39c commit 470d83f
Showing 5 changed files with 168 additions and 12 deletions.
2 changes: 1 addition & 1 deletion docs/pkgdown.yml
@@ -6,7 +6,7 @@ articles:
PopGenHelpR_createQmatrix: PopGenHelpR_createQmatrix.html
PopGenHelpR_heterozygosity: PopGenHelpR_heterozygosity.html
PopGenHelpR_vignette: PopGenHelpR_vignette.html
last_built: 2024-02-26T21:07Z
last_built: 2024-03-13T16:48Z
urls:
reference: https://kfarleigh.github.io/PopGenHelpR/reference
article: https://kfarleigh.github.io/PopGenHelpR/articles
2 changes: 1 addition & 1 deletion docs/search.json

Large diffs are not rendered by default.

9 changes: 9 additions & 0 deletions docs/sitemap.xml
@@ -3,6 +3,15 @@
<url>
<loc>https://kfarleigh.github.io/PopGenHelpR/404.html</loc>
</url>
<url>
<loc>https://kfarleigh.github.io/PopGenHelpR/articles/articles/PopGenHelpR_benchmarking.html</loc>
</url>
<url>
<loc>https://kfarleigh.github.io/PopGenHelpR/articles/articles/PopGenHelpR_createQmatrix.html</loc>
</url>
<url>
<loc>https://kfarleigh.github.io/PopGenHelpR/articles/articles/PopGenHelpR_heterozygosity.html</loc>
</url>
<url>
<loc>https://kfarleigh.github.io/PopGenHelpR/articles/index.html</loc>
</url>
43 changes: 43 additions & 0 deletions vignettes/articles/PopGenHelpR_createQmatrix.Rmd
@@ -35,6 +35,49 @@ Qmat <- Q(sNMFobject, K = K, run = 1)

All you need to do now is append the sample names to the q-matrix as the first column (you can do this with `cbind` or in any text editor). Then you can use it in `PopGenHelpR`. Be careful that the order of the rows in the q-matrix matches the order of the sample names you are appending.

#### Example of formatting the q-matrix for `PopGenHelpR`

Here we show you how to format a q-matrix generated with the `Q` function from LEA for use in `PopGenHelpR`.

First, we will create a matrix like one we might expect from LEA, along with fake sample names. **Please note that this is only a toy example and is not real data.**

```{r Qmat gen, echo=TRUE, eval=TRUE}
# Create fake matrix
Qmat <- t(matrix(data = c(0.25, 0.4, 0.35), nrow = 3, ncol = 3))
Fake_inds <- c("FS_1", "FS_2", "FS_3")
```

Cool! We have data, so can we use it in `PopGenHelpR`? No, because `Ancestry_barchart` and `Piechart_map` need a data.frame or CSV; these functions also need the first column to be the individual names. `PopGenHelpR` uses the individual names as a key to link the q-matrix data with populations and coordinates.

Let's add the individual names!

```{r Add inds, echo=TRUE, eval=TRUE}
# Add the names
Qmat_wnames <- cbind(Fake_inds, Qmat)
```

So can we use `Qmat_wnames` now? Not yet. `Qmat_wnames` is still a matrix, and a matrix can hold only one data type, so `cbind` converted everything, including our numbers, to character. We need the cluster contributions (columns 2 through 4 here) to be numeric. We will fix this by converting the matrix to a data.frame and then using the `sapply` function.

```{r Format qmat, echo=TRUE, eval=TRUE}
# Check the structure of the Qmat_wnames
str(Qmat_wnames)
Qmat_df <- as.data.frame(Qmat_wnames)
Qmat_df[2:4] <- sapply(Qmat_df[2:4], as.numeric)
# Check again
str(Qmat_df)
```


Notice that our cluster contribution columns are now numeric and that our `Qmat_df` object is a data.frame.


Now we can use it in `PopGenHelpR` with a population assignment file/data.frame to generate figures.
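Because `PopGenHelpR` uses the individual names as a key, matching by name is safer than relying on row order. The base-R sketch below illustrates that idea with `merge()`; the `Pops` table, its `Pop` column, and the `Ind` and `K1`–`K3` column names are hypothetical examples, not `PopGenHelpR`'s required format or its internal code.

```r
# Toy q-matrix with individual names in the first column (not real data)
Qmat_df <- data.frame(Ind = c("FS_1", "FS_2", "FS_3"),
                      K1 = c(0.25, 0.25, 0.25),
                      K2 = c(0.40, 0.40, 0.40),
                      K3 = c(0.35, 0.35, 0.35),
                      stringsAsFactors = FALSE)

# Hypothetical population assignments, deliberately in a different row order
Pops <- data.frame(Ind = c("FS_3", "FS_1", "FS_2"),
                   Pop = c("Pop_B", "Pop_A", "Pop_A"),
                   stringsAsFactors = FALSE)

# merge() matches rows on the shared "Ind" column, not on their position
Linked <- merge(Qmat_df, Pops, by = "Ind")
print(Linked)
```

By default `merge()` keeps only names found in both tables, so a misspelled name silently drops that individual; comparing `nrow(Linked)` with `nrow(Qmat_df)` is a quick check.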


### ADMIXTURE

ADMIXTURE is a little more complex because it is not associated with an R package, but it is nice because it gives us the q-matrix automatically. See [this tutorial](https://speciationgenomics.github.io/ADMIXTURE/) for more details.
124 changes: 114 additions & 10 deletions vignettes/articles/PopGenHelpR_heterozygosity.Rmd
@@ -54,9 +54,30 @@ All_Het <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, stat

### Expected heterozygosity (*H~e~*)

`PopGenHelpR` estimates *H~e~* per locus and population following the equations provided by the Hardy-Weinberg equation. Briefly, the equation estimates *H~e~* as one minus the squared frequency of each allele (p^2 and q^2, respectively), thus giving us the expected frequency of heterozygous genotypes (2pq) at a locus. The overall measure of *H~e~* is calculated as the average of the per locus estimates.
`PopGenHelpR` estimates *H~e~* per locus and population using the Hardy-Weinberg equation. Briefly, *H~e~* is one minus the squared frequencies of the two alleles ($p^2$ and $q^2$, respectively), which gives the expected frequency of heterozygous genotypes ($2pq$) at a locus. The overall measure of *H~e~* is calculated as the average of the per-locus estimates.

We use *H~e~* as a null model to test against and determine if Hardy-Weinberg equilibrium is being violated.
The equation per locus is below, where *p* is the frequency of the reference allele and *q* is the frequency of the alternate allele:


$$
H_e = 1-p^2-q^2
$$


Thus, the overall *H~e~* is the average of the per-locus values, where *K* is the number of SNPs and $p_k$ and $q_k$ are the allele frequencies at SNP *k*:


$$
H_e = \frac{\sum_{k=1}^{K}(1-p_k^2-q_k^2)}{K}
$$
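A minimal base-R sketch of these two equations follows. It is an illustration, not `PopGenHelpR`'s implementation, and it assumes biallelic SNPs stored in a hypothetical matrix `geno` (individuals in rows, SNPs in columns, genotypes coded as the number of alternate alleles: 0, 1, or 2; `NA` = missing), with all individuals treated as one population.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)

# Allele frequencies per SNP, using only the individuals genotyped at that SNP
q <- colSums(geno, na.rm = TRUE) / (2 * colSums(!is.na(geno)))  # alternate
p <- 1 - q                                                      # reference

# Per-locus expected heterozygosity, then the average over the K SNPs
He_locus <- 1 - p^2 - q^2
He_overall <- mean(He_locus)
He_locus
He_overall
```

For this toy matrix the per-locus values are 0.46875, 0.46875, and 0.5, and `1 - p^2 - q^2` equals `2 * p * q` at every locus, as expected for biallelic SNPs.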


#### How do we use *H~e~*?

We use *H~e~* as a null model to test against and determine whether Hardy-Weinberg equilibrium is being violated. Violations could indicate mutation, non-random mating, gene flow, genetic drift in a finite population, natural selection, or any combination of these.


#### How do we calculate *H~e~* in `PopGenHelpR`?

You can calculate *H~e~* in `PopGenHelpR` using the command below.

@@ -68,8 +89,37 @@ He <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic

`PopGenHelpR` estimates *H~o~* per locus and population following the equations of Nei (1987). Briefly, the equations estimate *H~o~* as one minus the proportion of homozygotes in the population at each locus, thus giving us the proportion of heterozygotes at a locus. The overall measure of *H~o~* is calculated as the average of the per locus estimates.

The equation per locus is below:


$$
H_o = 1- \frac{Number\; of\; homozygotes}{Number\; of\; samples}
$$


Thus, the overall measure of *H~o~* is below, where *K* is the number of SNPs:


$$
H_o = \frac{\sum_{k = 1}^K{1- \frac{Number\; of\; homozygotes}{Number\; of\; samples}}}{K}
$$


The formal equation of *H~o~* from Nei (1987) is below, where $P_{kii}$ is the proportion of homozygote *i* in sample *k* and $n_p$ is the number of samples:


$$
H_o = 1-\sum_{k}\sum_{i}\frac{P_{kii}}{n_p}
$$
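The simplified per-locus and overall equations can be sketched in base R as below. This is an illustration, not `PopGenHelpR`'s implementation; it assumes a hypothetical matrix `geno` of biallelic SNPs (individuals in rows, SNPs in columns, coded 0/1/2 alternate alleles, `NA` = missing) and treats all individuals as one population, in which case Nei's formula with $n_p = 1$ reduces to the same per-locus quantity.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)

# Number of individuals genotyped, and number of homozygotes, at each SNP
n_typed <- colSums(!is.na(geno))
n_hom <- colSums(geno != 1, na.rm = TRUE)

# Per-locus observed heterozygosity, then the average over the K SNPs
Ho_locus <- 1 - n_hom / n_typed
Ho_overall <- mean(Ho_locus)
Ho_locus
Ho_overall
```

Missing genotypes are dropped per SNP, so each locus is scored only over the individuals actually genotyped there.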


#### How do we use *H~o~*?

We use *H~o~* as a measure of genetic diversity and compare it with *H~e~* to detect departures from Hardy-Weinberg expectations, such as inbreeding (*H~o~* < *H~e~*) or heterozygote advantage (*H~o~* > *H~e~*).


#### How do we calculate *H~o~* in `PopGenHelpR`?

You can calculate *H~o~* in `PopGenHelpR` using the command below.

```{r Obs Het, echo=TRUE, eval=FALSE}
@@ -82,9 +132,19 @@ Ho <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic

### Proportion of heterozygous loci (*PHt*)

The proportion of heterozygous loci (*PHt*) is calculated as the number of heterozygous loci divided by the number of genotyped loci in each individual.
The proportion of heterozygous loci (*PHt*) is calculated as the number of heterozygous SNPs divided by the number of genotyped SNPs in each individual.


*PHt* is useful to evaluate the diversity within each individual.
$$
PHt = \frac{Number\; of\; heterozygous\; SNPs}{Number\; of\; genotyped\; SNPs}
$$
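A minimal base-R sketch of *PHt* follows, assuming a hypothetical matrix `geno` of biallelic SNPs (individuals in rows, SNPs in columns, coded 0/1/2 alternate alleles, `NA` = missing). It illustrates the equation and is not `PopGenHelpR`'s own code.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)

# Count heterozygous and genotyped SNPs within each individual (row)
n_het <- rowSums(geno == 1, na.rm = TRUE)
n_typed <- rowSums(!is.na(geno))
PHt <- n_het / n_typed
PHt
```

The third individual is genotyped at only two SNPs, so its denominator is 2 rather than 3.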

#### How do we use *PHt*?

*PHt* is helpful for evaluating the diversity within each individual and comparing it with other samples. Individual heterozygosity is also commonly used to investigate inbreeding (Miller et al., 2014) and in heterozygosity-fitness correlations (HFC), which assume that heterozygosity correlates positively with fitness. Under that assumption, higher heterozygosity (less inbreeding) indicates higher fitness.


#### How do we calculate *PHt* in `PopGenHelpR`?

You can calculate *PHt* in `PopGenHelpR` using the command below.

@@ -94,21 +154,43 @@ PHt <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statisti

### Proportion of heterozygous loci standardized by the average expected heterozygosity (*Hs~exp~*)

The proportion of heterozygous loci standardized by the average expected heterozygosity (*Hs~exp~*) is calculated as *PHt* divided by the mean expected heterozygosity (*H~e~*) for each individual.
The proportion of heterozygous loci standardized by the average expected heterozygosity (*Hs~exp~*) is calculated as *PHt* divided by the mean expected heterozygosity (*H~e~*) for each individual. Please see the equation below.


$$
Hs_{exp} = \frac{PHt}{H_e}
$$
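A minimal base-R sketch of *Hs~exp~* follows. It assumes, as our own interpretation rather than `PopGenHelpR`'s documented behavior, that the denominator for each individual is the mean *H~e~* of the SNPs that individual was genotyped at. `geno` is a hypothetical matrix of biallelic SNPs (individuals in rows, SNPs in columns, coded 0/1/2 alternate alleles, `NA` = missing), with all individuals treated as one population.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)
typed <- !is.na(geno)

# Per-locus expected heterozygosity, 1 - p^2 - q^2
q <- colSums(geno, na.rm = TRUE) / (2 * colSums(typed))
He_locus <- 1 - (1 - q)^2 - q^2

# PHt for each individual
PHt <- rowSums(geno == 1, na.rm = TRUE) / rowSums(typed)

# Assumption: standardize by the mean He over the SNPs each individual was genotyped at
mean_He <- apply(typed, 1, function(keep) mean(He_locus[keep]))
Hs_exp <- PHt / mean_He
Hs_exp
```

Restricting the mean to each individual's genotyped SNPs is what lets individuals typed at different markers be compared on the same scale.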


*Hs~exp~* was introduced by Coltman et al. (1999) to evaluate individual heterozygosity across individuals who were genotyped with different markers; this allows us to compare individual heterozygosity on the same scale.
#### How do we use *Hs~exp~*?

*Hs~exp~* was introduced by Coltman et al. (1999) to evaluate individual heterozygosity across individuals who were genotyped with different markers; this allows us to compare individual heterozygosity on the same scale and to assess inbreeding. Like *PHt*, higher *Hs~exp~* indicates less inbreeding.

#### How do we calculate *Hs~exp~* in `PopGenHelpR`?

You can calculate *Hs~exp~* in `PopGenHelpR` using the command below.


```{r Hsexp, echo=TRUE, eval=FALSE}
Hs_exp <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Hs_exp")
```


### Proportion of heterozygous loci standardized by the average observed heterozygosity (*Hs~obs~*)

The proportion of heterozygous loci standardized by the average observed heterozygosity (*Hs~obs~*) is calculated as *PHt* divided by the mean observed heterozygosity (*H~o~*) for each individual.
The proportion of heterozygous loci standardized by the average observed heterozygosity (*Hs~obs~*) is calculated as *PHt* divided by the mean observed heterozygosity (*H~o~*) for each individual. Please see the equation below.


$$
Hs_{obs} = \frac{PHt}{H_o}
$$
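The same idea with *H~o~* in the denominator is sketched below. Again, the per-individual mean over genotyped SNPs is our assumption, not confirmed `PopGenHelpR` behavior, and `geno` is a hypothetical 0/1/2 matrix of biallelic SNPs (`NA` = missing) with all individuals treated as one population.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)
typed <- !is.na(geno)

# Per-locus observed heterozygosity and per-individual PHt
Ho_locus <- 1 - colSums(geno != 1, na.rm = TRUE) / colSums(typed)
PHt <- rowSums(geno == 1, na.rm = TRUE) / rowSums(typed)

# Assumption: standardize by the mean Ho over the SNPs each individual was genotyped at
mean_Ho <- apply(typed, 1, function(keep) mean(Ho_locus[keep]))
Hs_obs <- PHt / mean_Ho
Hs_obs
```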


*Hs~obs~* was introduced by Coltman et al. (1999) to evaluate individual heterozygosity across individuals who were genotyped with different markers; this allows us to compare individual heterozygosity on the same scale.
#### How do we use *Hs~obs~*?

*Hs~obs~* was introduced by Coltman et al. (1999) to evaluate individual heterozygosity across individuals who were genotyped with different markers; this allows us to compare individual heterozygosity on the same scale and to assess inbreeding. Like *PHt*, higher *Hs~obs~* indicates less inbreeding.

#### How do we calculate *Hs~obs~* in `PopGenHelpR`?

You can calculate *Hs~obs~* in `PopGenHelpR` using the command below.

@@ -119,10 +201,20 @@ Hs_obs <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, stati

### Internal relatedness (*IR*)

Internal relatedness (*IR*) is calculated as two times the number of homozygous loci minus the sum of the frequency of the ith allele, divided by two times the number of loci minus the sum of the frequency of the ith allele (see equation 2.1 in Amos et al., 2001)
The equation for internal relatedness (*IR*) is more complex and quite a mouthful (or should we say sentence-full?). Please see the equation below. *IR* is calculated as two times the number of homozygous loci (*H*) minus the sum of the frequencies ($f_i$) of the alleles in the individual's genotype, divided by two times the number of loci (*N*) minus that same sum (see equation 2.1 in Amos et al., 2001).


$$
IR = \frac{(2H-\sum{f_i})}{(2N-\sum{f_i})}
$$
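A minimal base-R sketch of the *IR* equation follows. It is not `PopGenHelpR`'s implementation; it assumes a hypothetical matrix `geno` of biallelic SNPs (individuals in rows, SNPs in columns, coded 0/1/2 alternate alleles, `NA` = missing) and estimates the allele frequencies $f_i$ from the same toy sample.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)
typed <- !is.na(geno)

# Allele frequencies estimated from the sample
q <- colSums(geno, na.rm = TRUE) / (2 * colSums(typed))  # alternate
p <- 1 - q                                              # reference

H <- rowSums(geno != 1, na.rm = TRUE)  # homozygous loci per individual
N <- rowSums(typed)                    # genotyped loci per individual

# Sum of f_i over both allele copies at each locus:
# (2 - g) reference copies with frequency p plus g alternate copies with frequency q
f_sum <- rowSums(sweep(2 - geno, 2, p, `*`) + sweep(geno, 2, q, `*`),
                 na.rm = TRUE)

IR <- (2 * H - f_sum) / (2 * N - f_sum)
IR
```

An individual homozygous at every genotyped locus gets *IR* = 1 (the third toy individual), while more heterozygous individuals fall below zero.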


#### How do we use *IR*?

*IR* was developed by Amos et al. (2001) to measure the diversity within individuals. Negative *IR* values suggest that individuals are outbred (tend to be more heterozygous), while positive values suggest that individuals are inbred (tend to be more homozygous).

#### How do we calculate *IR* in `PopGenHelpR`?

You can calculate *IR* in `PopGenHelpR` using the command below.

```{r IR, echo=TRUE, eval=FALSE}
@@ -131,10 +223,20 @@ IR <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic

### Homozygosity by locus (*HL*)

Homozygosity by locus (*HL*) is calculated as the expected heterozygosity of loci in homozygosis divided by the sum of the expected heterozygosity of loci in homozygosis and the expected heterozygosity of loci in heterozygosis (see Aparicio et al., 2006).
Homozygosity by locus (*HL*) is calculated as the sum of the expected heterozygosities of the loci an individual carries in homozygosis ($E_h$), divided by that same sum plus the sum of the expected heterozygosities of the loci it carries in heterozygosis ($E_j$; see Aparicio et al., 2006). Please see the equation below.


$$
HL = \frac{\sum{E_h}}{\sum{E_h} + \sum{E_j}}
$$
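A minimal base-R sketch of *HL* follows, using each SNP's expected heterozygosity ($1 - p^2 - q^2$) as its weight *E*. It is an illustration, not `PopGenHelpR`'s implementation, and assumes a hypothetical matrix `geno` of biallelic SNPs (individuals in rows, SNPs in columns, coded 0/1/2 alternate alleles, `NA` = missing) with all individuals treated as one population.

```r
# Hypothetical toy genotypes (not real data): rows = individuals, columns = SNPs,
# values = number of alternate alleles (0, 1, 2); NA = missing genotype
geno <- matrix(c(0, 1, 2,
                 1, 1, 0,
                 2, 0, NA,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)
typed <- !is.na(geno)

# Expected heterozygosity E of each SNP
q <- colSums(geno, na.rm = TRUE) / (2 * colSums(typed))
He_locus <- 1 - (1 - q)^2 - q^2

hom <- geno != 1  # TRUE = homozygous, FALSE = heterozygous, NA = missing

# Sum E over each individual's homozygous (E_h) and heterozygous (E_j) SNPs
E_h <- rowSums(sweep(hom, 2, He_locus, `*`), na.rm = TRUE)
E_j <- rowSums(sweep(!hom, 2, He_locus, `*`), na.rm = TRUE)
HL <- E_h / (E_h + E_j)
HL
```

Because each SNP is weighted by its *H~e~*, homozygosity at a highly variable locus raises *HL* more than homozygosity at a nearly fixed one.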


#### How do we use *HL*?

*HL* was proposed by Aparicio et al. (2006) to improve on *IR* by weighting the contribution of each locus to the index according to its allelic variability. Like *IR*, *HL* is useful for evaluating the diversity within an individual. *HL* ranges from 0, when all loci are heterozygous, to 1, when all loci are homozygous.

#### How do we calculate *HL* in `PopGenHelpR`?

You can calculate *HL* in `PopGenHelpR` using the command below.

```{r HL, echo=TRUE, eval=FALSE}
@@ -150,6 +252,8 @@ Amos W., Worthington Wilmer J., Fullard K., Burg T. M., Croxall J. P., Bloch D.,
Aparicio J. M., Ortego J., Cordero P. J. 2006. What should we weigh to estimate heterozygosity, alleles or loci? Molecular Ecology. 15: 4659-4665.

Coltman D. W., Pilkington J. G., Smith J. A., Pemberton J. M. 1999. Parasite-mediated selection against inbred Soay sheep in a free-living, island population. Evolution. 53: 1259-1267.

Miller J. M., Malenfant R. M., David P., Davis C. S., Poissant J., Hogg J. T., ... Coltman D. 2014. Estimating genome-wide heterozygosity: effects of demographic history and marker type. Heredity. 112: 240-247.

Nei M. 1987. Molecular evolutionary genetics. Columbia University Press.
