This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Centrality scores

Gabriele Girelli edited this page Oct 29, 2018 · 2 revisions

Note that our aim is to produce an estimate, referred to here as a "score". These scores should not be read as an approximate distance of a genomic region from the nucleus center. Instead, they provide information on how the regions of interest rank from center to periphery.

Different approaches can be used to estimate centrality by analyzing GPSeq sequencing data (b-c) and combining the conditions of an experiment (a).

In this section, we denote a centrality estimate as C_xy, where x indicates the estimate type and y indicates how multiple conditions are combined.

a) Combining multiple conditions in one estimate

As previously stated, a GPSeq experiment is an ensemble of multiple digestion conditions. Thus, a centrality estimate should make use of the data coming from multiple conditions to approximate a region's position in a more accurate way. We envision three different approaches for combining multiple conditions in one estimate:

  1. The two-points (2p) approach takes into account only the first (D_1) and the last (D_n) conditions.
  2. The fixed (f) approach compares each condition D_i with the first one, D_1 (for i = 2, ..., n).
  3. The global (g) approach compares each condition D_i with the previous one, D_{i-1} (for i = 2, ..., n).
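
The three combining schemes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular tool: the function names and the generic `compare` argument are hypothetical, and summing the pairwise comparisons in the fixed and global cases is an assumption made here, since the text above only says which conditions are compared.

```python
from typing import Callable, Sequence

# A comparison takes (later_value, earlier_value) and returns a number.
Compare = Callable[[float, float], float]


def combine_2p(values: Sequence[float], compare: Compare) -> float:
    """Two-points: compare only the last condition D_n with the first D_1."""
    return compare(values[-1], values[0])


def combine_f(values: Sequence[float], compare: Compare) -> float:
    """Fixed: compare each D_i (i >= 2) with D_1.

    Summing the comparisons is an assumption of this sketch.
    """
    first = values[0]
    return sum(compare(v, first) for v in values[1:])


def combine_g(values: Sequence[float], compare: Compare) -> float:
    """Global: compare each D_i (i >= 2) with the previous D_{i-1}.

    Summing the comparisons is an assumption of this sketch.
    """
    return sum(compare(cur, prev) for prev, cur in zip(values, values[1:]))


def ratio(later: float, earlier: float) -> float:
    """Example comparison: ratio of the later value to the earlier one."""
    return later / earlier


# Hypothetical per-condition values for one region, ordered D_1..D_n.
values = [1.0, 2.0, 4.0]
print(combine_2p(values, ratio))  # 4/1 = 4.0
print(combine_f(values, ratio))   # 2/1 + 4/1 = 6.0
print(combine_g(values, ratio))   # 2/1 + 4/2 = 4.0
```

Keeping the comparison separate from the combining scheme means the same three functions can be reused with a ratio, a log-ratio, or a difference.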

b) Probability-based

Centrality estimates can be defined based on the probability of restriction, or on closely related quantities.

b.I) Probability of restriction (p)

Reasoning that a central chromosome should have more reads in later conditions than in the initial one, we can estimate the centrality of a region w as the ratio between its probabilities of restriction in two conditions.

Cp2p

Cpf

Cpg

b.II) Cumulative probability of restriction (cr)

Given the variety of behaviours that NR can show across increasing conditions, we define the same measures as before on the cumulative probability.

We define the cumulative probability as:

PC

Then:

Ccr2p

Ccrf

Ccrg

b.III) Ratio of cumulative restriction counts (rc)

Another way to define the cumulative probability is to use the ratio of the cumulative restriction counts in a region, across conditions, to the sum of the total number of reads in those conditions.

We define the cumulative probability as:

Pr

Note that Ns depends on the condition D_i only when the cutsite domain is considered as separate (see above). Otherwise, it is constant across conditions and can be taken out of the sum.
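
As a rough illustration of this ratio, the sketch below accumulates counts across conditions. It rests on assumptions that are not spelled out in this section: Ns is treated as a per-condition factor that multiplies the total number of reads in the denominator, and the function name and example numbers are hypothetical.

```python
from itertools import accumulate


def cumulative_ratio(region_reads, total_reads, n_s):
    """Cumulative restriction ratio for each condition D_1..D_n.

    region_reads: restriction counts in the region, one per condition.
    total_reads:  total number of reads, one per condition.
    n_s:          Ns factor, one per condition (assumed to multiply
                  the total reads inside the sum).
    """
    numerators = accumulate(region_reads)
    denominators = accumulate(t * s for t, s in zip(total_reads, n_s))
    return [num / den for num, den in zip(numerators, denominators)]


# Hypothetical counts for one region over three conditions.
region_reads = [10, 30, 60]
total_reads = [1000, 1000, 2000]
n_s = [5, 5, 5]  # constant across conditions in this example

pr = cumulative_ratio(region_reads, total_reads, n_s)
print(pr)

# With a constant Ns, the factor can be taken out of the sum:
factored = [
    (num / den) / n_s[0]
    for num, den in zip(accumulate(region_reads), accumulate(total_reads))
]
print(factored)
```

Both printed lists agree up to floating-point rounding, which is the sense in which a constant Ns can be taken out of the sum.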

Then:

Crc2p

Crcf

Crcg

c) Variability-based

Centrality estimates can be defined based on how variable the absolute count of restriction events is per unit of measure (single/grouped-cutsites).

c.I) Variance (v)

To estimate changes in variance, we use the logarithm of the ratio of variances between conditions.

Cv2p

Cvf
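
A minimal sketch of the log-ratio of variances for the two-points case is given below. The function name and the example counts are hypothetical, the ratio is assumed to be taken as later condition over first condition, and the population variance is used for simplicity (the sample variance would work the same way).

```python
import math
import statistics


def log_variance_ratio(counts_later, counts_first):
    """Logarithm of the ratio of variances (later over first, by assumption).

    Each argument holds the restriction counts per unit of measure
    (e.g. per cutsite) of one region in one condition.
    """
    var_later = statistics.pvariance(counts_later)
    var_first = statistics.pvariance(counts_first)
    return math.log(var_later / var_first)


# Hypothetical per-cutsite counts for one region.
counts_first = [1, 2, 3]  # condition D_1
counts_last = [2, 4, 6]   # condition D_n, every count doubled

# Doubling every count multiplies the variance by 4,
# so the score is approximately log(4).
print(log_variance_ratio(counts_last, counts_first))
```

The logarithm makes the score symmetric around zero: it is positive when variance grows between the two conditions and negative when it shrinks.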

c.II) Fano factor (ff)

In this case, we use the Fano factor F as a measure of variability.

Fdef

We focus on the difference in F (ΔF) between increasing conditions:

Cff2p

Cfff
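
For reference, the Fano factor is the variance divided by the mean. The sketch below computes it and its difference between two conditions. The function names and counts are hypothetical, the difference is assumed to be taken as later minus first, and the population variance is used.

```python
import statistics


def fano_factor(counts):
    """Fano factor: variance divided by mean (mean must be non-zero)."""
    return statistics.pvariance(counts) / statistics.mean(counts)


def delta_fano(counts_later, counts_first):
    """Difference in Fano factor, later condition minus first (by assumption)."""
    return fano_factor(counts_later) - fano_factor(counts_first)


# Hypothetical per-cutsite counts for one region.
counts_first = [2, 4, 6]   # variance 8/3, mean 4, F = 2/3
counts_last = [1, 4, 10]   # variance 14, mean 5, F = 2.8

print(fano_factor(counts_first))
print(fano_factor(counts_last))
print(delta_fano(counts_last, counts_first))
```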

c.III) Coefficient of variation (cv)

In this case, we use the coefficient of variation c_v as a measure of variability.

cvdef

We focus on the difference in c_v (Δc_v) between increasing conditions:

Ccv2p

Ccvf
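
For reference, the coefficient of variation is the standard deviation divided by the mean. The sketch below mirrors the Fano factor case. The function names and counts are hypothetical, the difference is assumed to be taken as later minus first, and the population standard deviation is used.

```python
import statistics


def coefficient_of_variation(counts):
    """Coefficient of variation: standard deviation over mean (mean non-zero)."""
    return statistics.pstdev(counts) / statistics.mean(counts)


def delta_cv(counts_later, counts_first):
    """Difference in c_v, later condition minus first (by assumption)."""
    return coefficient_of_variation(counts_later) - coefficient_of_variation(counts_first)


# Hypothetical per-cutsite counts for one region.
counts_first = [2, 4, 6]
counts_last = [1, 4, 10]

print(coefficient_of_variation(counts_first))
print(coefficient_of_variation(counts_last))
print(delta_cv(counts_last, counts_first))
```

Unlike the Fano factor, c_v is dimensionless, so it does not change when every count is multiplied by the same constant.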