Centrality scores

Note that our aim is to produce an estimate, here referred to as a "score". These scores should not be read as approximate distances of a genomic region from the nucleus center; instead, they provide information on the ranking of the regions of interest from center to periphery.

Different approaches can be used to estimate centrality by analyzing GPSeq sequencing data (b-c) and combining the conditions of an experiment (a).

In this section, we denote a centrality estimate as C_xy, where x indicates the estimate type and y indicates the way of combining multiple conditions.

a) Combining multiple conditions in one estimate

As previously stated, a GPSeq experiment is an ensemble of multiple digestion conditions. Thus, a centrality estimate should make use of the data coming from multiple conditions to approximate a region's position in a more accurate way. We envision three different approaches for combining multiple conditions in one estimate:

The two-points (2p) approach takes into account only the first and last conditions.
The fixed (f) approach compares each condition with the first.
The global (g) approach compares each condition with the previous one.
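As a minimal sketch of the three approaches, the snippet below lists which pairs of conditions each one compares. The function name `condition_pairs` and the 0-based indexing are illustrative assumptions, not part of the GPSeq tooling.

```python
def condition_pairs(n_conditions, approach):
    """Return (reference, compared) 0-based index pairs for one approach.

    Hypothetical helper: "2p", "f" and "g" follow the labels used in the text.
    """
    if n_conditions < 2:
        raise ValueError("at least two conditions are needed")
    last = n_conditions - 1
    if approach == "2p":
        # Two-points: only the first and the last condition.
        return [(0, last)]
    if approach == "f":
        # Fixed: every later condition against the first one.
        return [(0, i) for i in range(1, n_conditions)]
    if approach == "g":
        # Global: every condition against the one right before it.
        return [(i - 1, i) for i in range(1, n_conditions)]
    raise ValueError(f"unknown approach: {approach!r}")
```

For an experiment with four conditions, `condition_pairs(4, "g")` yields the consecutive pairs (0, 1), (1, 2) and (2, 3).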

b) Probability-based

Centrality estimates can be defined on the probability of restriction itself, or on closely related quantities such as its cumulative counterpart.

b.I) Probability of restriction (p)

Reasoning that a central region should have more reads in later conditions than in the initial one, we can estimate the centrality of a region as the ratio between its probabilities of restriction in two conditions.

C_p2p

C_pf

C_pg
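A minimal sketch of how the three ratio-based estimates could be computed from per-condition probabilities of restriction. It assumes that the two-points estimate is the ratio of the last to the first condition, and that the fixed and global estimates sum their per-pair ratios; the summation and the function name `ratio_score` are assumptions made for illustration.

```python
def ratio_score(probs, approach):
    """Combine per-condition probabilities of restriction into one score.

    probs: one probability per condition, ordered from first to last.
    Assumption: per-pair ratios are aggregated by summation for "f" and "g".
    """
    if len(probs) < 2:
        raise ValueError("at least two conditions are needed")
    first = probs[0]
    if approach == "2p":
        # Last condition over first condition.
        return probs[-1] / first
    if approach == "f":
        # Each later condition over the first one.
        return sum(p / first for p in probs[1:])
    if approach == "g":
        # Each condition over the previous one.
        return sum(cur / prev for prev, cur in zip(probs, probs[1:]))
    raise ValueError(f"unknown approach: {approach!r}")


# Toy values for one region across three conditions (not real data).
probs = [0.125, 0.25, 0.5]
scores = {a: ratio_score(probs, a) for a in ("2p", "f", "g")}
```

Under these assumptions, a region whose probability of restriction grows with the conditions gets a larger score than one whose probability stays flat, which matches the ranking purpose described above.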

b.II) Cumulative probability of restriction (cr)

Given the variety of behaviours that the probability of restriction can assume across increasing conditions, we define the same measures as before on the cumulative probability.

We define the cumulative probability as:

Then:

C_cr2p

C_crf

C_crg
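The cumulative variant can be sketched in the same way. The snippet assumes that the cumulative probability at a condition is the running sum of the per-condition probabilities up to and including it, and that the three estimates are then built from ratios exactly as in b.I; both points, and the names `cumulative_probs` and `cumulative_scores`, are assumptions for illustration.

```python
from itertools import accumulate


def cumulative_probs(probs):
    """Running sum of per-condition probabilities (assumed definition)."""
    return list(accumulate(probs))


def cumulative_scores(probs):
    """Two-points, fixed and global scores on the cumulative probabilities."""
    cum = cumulative_probs(probs)
    if len(cum) < 2:
        raise ValueError("at least two conditions are needed")
    return {
        "2p": cum[-1] / cum[0],
        "f": sum(c / cum[0] for c in cum[1:]),
        "g": sum(cur / prev for prev, cur in zip(cum, cum[1:])),
    }


# Toy values for one region across three conditions (not real data).
scores = cumulative_scores([0.125, 0.25, 0.5])
```

Because a running sum never decreases for non-negative inputs, the per-pair ratios stay at or above 1, which smooths out regions whose per-condition probability fluctuates.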

b.III) Ratio of cumulative restriction counts ()

Another way to define the cumulative probability is to take the ratio of the cumulative restriction counts in a region, across conditions, over the sum of the total number of reads in those conditions.

We define the cumulative probability as:

Note that the number of cutsites in the region depends on the condition D_i only when the cutsite domain is considered as separate (see above). Otherwise, it is constant across conditions and can be taken out of the sum.
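A minimal sketch of the ratio described in this subsection: the reads of a region are accumulated across conditions and divided by the accumulated total reads of the same conditions. The function name `cumulative_count_ratio` and the input layout are assumptions, and any normalisation by the number of cutsites is deliberately left out.

```python
from itertools import accumulate


def cumulative_count_ratio(region_reads, total_reads):
    """Cumulative region reads over cumulative total reads, per condition.

    region_reads: reads falling in the region, one value per condition.
    total_reads: total reads sequenced in each condition, same order.
    Cutsite-number normalisation is not included in this sketch.
    """
    if len(region_reads) != len(total_reads):
        raise ValueError("one value per condition is needed in both inputs")
    cum_region = accumulate(region_reads)
    cum_total = accumulate(total_reads)
    return [r / t for r, t in zip(cum_region, cum_total)]


# Toy counts for one region across three conditions (not real data).
ratios = cumulative_count_ratio([10, 30, 60], [1000, 2000, 3000])
```

The resulting per-condition values can then be combined with the two-points, fixed or global approach in the same way as the other probability-based estimates.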