7 Burden normalization

Several normalization types are provided so that non-B burden can be compared across different scopes, for example between genes or between non-B types. By default, burden is normalized by both gene (query region) length and library size.

The non-B burden is introduced to evaluate and compare the prevalence of non-B structures. It is estimated by counting the number of non-B forming regions of each non-B type in each gene. To make non-B burdens comparable across genes or non-B types, NBBC provides several normalization metrics: raw motif counts (no normalization), normalization by region length, normalization by motif library size, and normalization by both region length and library size. The default unit of non-B burden in NBBC is counts per kilobase per million (CPKM), that is, the number of non-B motifs per kilobase of query region length per million motifs in the non-B library. CPKM therefore normalizes for both the total query region length and the motif library size.
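The counting step (non-B motifs of each type falling in each gene) can be illustrated with a minimal sketch. It assumes that genes and motifs are half-open intervals `[start, end)` on named chromosomes, and that a motif counts toward a gene if the two share at least one base. The `Interval` type, the function names, and the non-B type labels are hypothetical and are not part of NBBC's API. NBBC's actual overlap rule may differ.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple, Dict


class Interval(NamedTuple):
    """Hypothetical genomic interval: 0-based start (inclusive), end (exclusive)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_motifs_per_gene(
    genes: Dict[str, Interval],
    motifs: Iterable[Tuple[str, Interval]],
) -> Counter:
    """Count motifs of each non-B type overlapping each gene.

    `motifs` holds (nonb_type, interval) pairs. Returns a Counter keyed by
    (gene_name, nonb_type). A naive O(genes x motifs) scan, for illustration only.
    """
    motifs = list(motifs)
    counts: Counter = Counter()
    for gene_name, gene in genes.items():
        for nonb_type, motif in motifs:
            if overlaps(gene, motif):
                counts[(gene_name, nonb_type)] += 1
    return counts


# Illustrative input, not real data.
genes = {
    "GENE_A": Interval("chr1", 1000, 5000),
    "GENE_B": Interval("chr1", 8000, 9000),
}
motifs = [
    ("G4", Interval("chr1", 1200, 1230)),
    ("Z-DNA", Interval("chr1", 4990, 5010)),  # straddles the end of GENE_A
    ("G4", Interval("chr1", 8500, 8520)),
    ("Z-DNA", Interval("chr2", 1000, 1020)),  # different chromosome, no overlap
]
counts = count_motifs_per_gene(genes, motifs)
```

For genome-scale inputs, an interval tree or a sorted sweep would replace the nested loop. The per-(gene, type) counts produced here are the raw, unnormalized burden.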

Specifically, the non-B burden normalized by both region length and motif library size (CPKM) is defined as:

\(\mathrm{CPKM} = \frac{\text{Counts of non-B motifs overlapping query regions} \times 10^{3} \times 10^{6}}{\text{Total non-B library size} \times \text{Total query region length}}\)
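A minimal sketch of the four burden metrics follows, applying the definitions above to a single motif count. It assumes that the query region length is the total length in base pairs (hence the factor \(10^{3}\) for "per kilobase") and that the library size is the total number of motifs in the non-B library (hence \(10^{6}\) for "per million"). The function name `nonb_burden` and its `method` values are hypothetical and are not NBBC's option names.

```python
def nonb_burden(
    count: int,
    region_length_bp: int,
    library_size: int,
    method: str = "cpkm",
) -> float:
    """Return the non-B burden for one query region set (hypothetical helper).

    count            -- non-B motifs overlapping the query regions
    region_length_bp -- total query region length in base pairs
    library_size     -- total number of motifs in the non-B library
    method           -- "raw", "length", "library", or "cpkm" (default)
    """
    if method == "raw":
        # No normalization: the motif count itself.
        return float(count)
    if region_length_bp <= 0 or library_size <= 0:
        raise ValueError("region length and library size must be positive")
    if method == "length":
        # Motifs per kilobase of query region.
        return count * 1e3 / region_length_bp
    if method == "library":
        # Motifs per million motifs in the library.
        return count * 1e6 / library_size
    if method == "cpkm":
        # Motifs per kilobase of query region per million library motifs.
        return count * 1e3 * 1e6 / (library_size * region_length_bp)
    raise ValueError(f"unknown method: {method!r}")


# Worked example with illustrative numbers: 12 motifs in 4,000 bp of query
# regions, drawn from a library of 500,000 motifs.
for m in ("raw", "length", "library", "cpkm"):
    print(m, nonb_burden(12, 4000, 500_000, m))
```

In this example, the length-normalized burden is 12 × 10³ / 4000 = 3 motifs per kb. Rescaling by 10⁶ / 500,000 = 2 gives a CPKM of 6. CPKM is simply the length-normalized burden expressed per million library motifs.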