Centrality scores
It is important to note that our aim is to produce an estimate, here referred to as a "score". These scores cannot be read as an approximate distance of a genomic region from the nucleus center; instead, they provide information on the ranking of the regions of interest from center to periphery.
Different approaches can be used to estimate centrality by analyzing GPSeq sequencing data (b-c) and combining the conditions of an experiment (a).
In this section, a centrality estimate is denoted by a label with two parts: one part provides the estimate type and the other provides the way of combining multiple conditions.
As previously stated, a GPSeq experiment is an ensemble of multiple digestion conditions. Thus, a centrality estimate should make use of the data coming from multiple conditions to approximate a region's position in a more accurate way. We envision three different approaches for combining multiple conditions in one estimate:
- The two-points approach takes into account only the first and the last condition.
- The fixed approach compares each condition with the first one.
- The global approach compares each condition with the previous one.
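The three approaches differ only in which pairs of conditions are compared. A minimal sketch of that pairing, assuming conditions are indexed from 0 in increasing order; the function name `condition_pairs` and the approach labels are hypothetical and chosen only for illustration:

```python
def condition_pairs(n_conditions, approach):
    """Return (reference, target) index pairs of conditions to compare.

    Conditions are indexed 0..n_conditions-1 in increasing order.
    The approach labels are hypothetical names used by this sketch.
    """
    if n_conditions < 2:
        raise ValueError("at least two conditions are needed")
    if approach == "two_points":
        # Only the first and the last condition.
        return [(0, n_conditions - 1)]
    if approach == "fixed":
        # Each condition against the first one.
        return [(0, i) for i in range(1, n_conditions)]
    if approach == "global":
        # Each condition against the previous one.
        return [(i - 1, i) for i in range(1, n_conditions)]
    raise ValueError(f"unknown approach: {approach!r}")
```

With four conditions, the fixed approach yields the pairs `(0, 1)`, `(0, 2)`, `(0, 3)`, while the global approach yields `(0, 1)`, `(1, 2)`, `(2, 3)`.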
Centrality estimates can be defined based on the probability of restriction or with slightly different definitions.
Reasoning that a central region should have more reads in higher conditions compared to the initial one, we can estimate the centrality of a region as the ratio between its probabilities of restriction in two conditions.
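As a rough illustration, the ratio can be computed from a list of per-condition probabilities of restriction for a single region. This sketch assumes the probabilities are already available and non-zero, and it assumes that, when several pairs of conditions are compared, the pairwise ratios are summed; the function name, the approach labels and the summation are assumptions of this example, not definitions taken from the text above.

```python
def probability_ratio(prob, approach="two_points"):
    """Combine ratios of restriction probabilities across conditions.

    prob: probabilities of restriction of one region, ordered by condition.
    approach: "two_points", "fixed" or "global" (hypothetical labels).
    Summing the pairwise ratios is an assumption of this sketch.
    """
    n = len(prob)
    if n < 2:
        raise ValueError("at least two conditions are needed")
    if approach == "two_points":
        pairs = [(0, n - 1)]
    elif approach == "fixed":
        pairs = [(0, i) for i in range(1, n)]
    elif approach == "global":
        pairs = [(i - 1, i) for i in range(1, n)]
    else:
        raise ValueError(f"unknown approach: {approach!r}")
    # Each pair is (reference, target); the ratio is target over reference.
    return sum(prob[j] / prob[i] for i, j in pairs)
```

For example, with `prob = [0.125, 0.25, 0.5]` the two-points ratio is `4.0`, the fixed sum is `6.0` and the global sum is `4.0`.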
Given the variety of behaviours that the probability of restriction can assume at increasing conditions, we define the same measure as before on the cumulative probability.
We define the cumulative probability as:
Then:
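A minimal sketch of this variant, assuming the cumulative probability at a condition is the running sum of the per-condition probabilities of restriction up to that condition. That reading, the function names, and the restriction to the two-points case are assumptions made for illustration.

```python
from itertools import accumulate


def cumulative_probability(prob):
    """Running sum of restriction probabilities, ordered by condition."""
    return list(accumulate(prob))


def cumulative_ratio_two_points(prob):
    """Ratio of the cumulative probability at the last and first condition."""
    if len(prob) < 2:
        raise ValueError("at least two conditions are needed")
    cum = cumulative_probability(prob)
    return cum[-1] / cum[0]
```

With `prob = [0.125, 0.25, 0.5]`, the cumulative probabilities are `[0.125, 0.375, 0.875]` and the two-points ratio is `7.0`.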
Another way to define the cumulative probability is to use the ratio of the cumulative restriction counts in a region, across conditions, over the sum of the total number of reads in those conditions.
We define the cumulative probability as:
Note that the number of cutsites in the region depends on the condition only when the cutsite domain is considered as separate (see above). Otherwise, it is constant across conditions and can be taken out of the sum.
Then:
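The count-based definition can be sketched as follows, assuming the simpler case in which the number of cutsites in the region is constant across conditions and is used as a normalization factor. The function name, the parameter names and the exact normalization are assumptions of this example.

```python
from itertools import accumulate


def cumulative_probability_from_counts(region_counts, total_reads, n_cutsites):
    """Cumulative restriction counts over cumulative total reads.

    region_counts[i]: restriction counts in the region in condition i.
    total_reads[i]:   total number of reads in condition i.
    n_cutsites:       cutsites in the region, assumed constant across
                      conditions (an assumption of this sketch).
    """
    if len(region_counts) != len(total_reads):
        raise ValueError("one count and one total per condition are needed")
    cum_counts = accumulate(region_counts)
    cum_totals = accumulate(total_reads)
    return [c / (n_cutsites * t) for c, t in zip(cum_counts, cum_totals)]
```

For instance, `cumulative_probability_from_counts([10, 30, 60], [1000, 1000, 2000], 2)` accumulates the counts to `10, 40, 100` and the totals to `1000, 2000, 4000`, so the two-points ratio between the last and first value is `2.5`.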
Centrality estimates can be defined based on how variable the absolute count of restriction events is per unit of measure (single/grouped-cutsites).
To estimate variance changes, we use the logarithm of its ratio.
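A minimal sketch of the log-ratio of variances between two conditions, given the restriction counts per unit of measure in each. The direction of the ratio (later condition over earlier one), the natural logarithm, and the use of the population variance are assumptions of this example.

```python
import math
import statistics


def log_variance_ratio(counts_first, counts_last):
    """Log of the ratio between the count variances of two conditions.

    counts_first, counts_last: restriction counts per unit of measure
    (e.g. per cutsite) in the earlier and in the later condition.
    """
    var_first = statistics.pvariance(counts_first)
    var_last = statistics.pvariance(counts_last)
    if var_first <= 0 or var_last <= 0:
        raise ValueError("variances must be positive to take the log-ratio")
    return math.log(var_last / var_first)
```

With counts `[1, 2, 3, 4]` in the earlier condition and `[2, 4, 6, 8]` in the later one, the variances are `1.25` and `5`, so the estimate is `log(4)`.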
In this case, we use the Fano factor as a measure of variability.
We focus on how it changes between increasing conditions.
In this case, we use the coefficient of variation as a measure of variability.
We focus on how it changes between increasing conditions.
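Both measures can be sketched with the standard definitions: the Fano factor is the variance divided by the mean, and the coefficient of variation is the standard deviation divided by the mean. Comparing two conditions through a log-ratio, as done for the variance, is an assumption here, as are the function names and the use of population statistics.

```python
import math
import statistics


def fano_factor(counts):
    """Variance over mean of the restriction counts."""
    return statistics.pvariance(counts) / statistics.mean(counts)


def coefficient_of_variation(counts):
    """Standard deviation over mean of the restriction counts."""
    return statistics.pstdev(counts) / statistics.mean(counts)


def log_ratio(measure, counts_first, counts_last):
    """Log-ratio of a variability measure between two conditions.

    measure: a function such as fano_factor or coefficient_of_variation.
    The later-over-earlier direction is an assumption of this sketch.
    """
    return math.log(measure(counts_last) / measure(counts_first))
```

The two measures respond differently to scaling: doubling every count, from `[1, 2, 3, 4]` to `[2, 4, 6, 8]`, doubles the Fano factor (log-ratio `log(2)`) but leaves the coefficient of variation unchanged (log-ratio approximately `0`).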
GPSeqC v2.3.3 is published under the MIT License - Copyright (c) 2017-18 Gabriele Girelli