14 Case 1: Single gene
Case 1: the fundamental query of non-B burden for a single gene and corresponding
Use cases: The goal of this use case is to demonstrate the fundamental query of a single gene for non-B burden analyses.
Example: Non-B DNA motifs affect mutation rate and facilitate genome instability. BRCA1 is one of the genes most commonly affected in hereditary breast and ovarian cancer. It encodes a key DNA-repair protein, and its functional loss leaves some cells highly vulnerable to DNA damage, including damage that triggers cancer. Triple-negative/basal-like tumors often accompany BRCA1 mutations and are aggressive, with a poorer prognosis.
Result: In NBBC, BRCA1 shows its highest burden from triplex-forming structures (H-DNA; burden CPKM = 0.84), and short tandem repeats (STR) are the second-highest source of burden (burden CPKM = 0.65) (A-B). H-DNA is a triple-helix secondary structure formed by homopurine-homopyrimidine sequences with a minimum length of 12 nucleotides. The G content and length of a DNA sequence can affect the formation of non-B DNA structures, including H-DNA motifs.
To further assess the quality of the motifs from their composition, we use the “motif screen” module to find motifs that combine a high %G with a long motif length.
Our cluster analyses of motif features revealed two triplex-forming mirror repeat motifs on chromosome 17 that are relatively long and have a relatively high %G compared with the other motifs (C). The two sequences are “AAAGAGAGAGAGAGAGCAAGAGAGAGAGAGAGAAA” (BRCA-MR, length=35, G%=40%) and “TGTGTGTGCGCGTGTGCGTGTGTGT” (BRCA-MR, length=25, G%=48%). The app can also output the regions flanking each motif. For instance, the full triplex-forming region for the first motif is “TCTTGGGAAAAAAAA-AAAGAGAGAGAGAGAGCAAGAGAGAGAGAGAGAAA-GACACCCCAGTGAAG” (left: TCTTGGGAAAAAAAA; right: GACACCCCAGTGAAG).
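The length and %G values reported for the two motifs can be recomputed directly from the sequences. The following is a minimal Python sketch and not NBBC's own implementation. The function name `g_percent`, the variable names, and the hyphen used to join a motif to its flanks are illustrative assumptions.

```python
def g_percent(seq: str) -> float:
    """Return the percentage of G bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * seq.count("G") / len(seq)


# The two mirror repeat motifs reported for BRCA1 in this case study.
motifs = [
    "AAAGAGAGAGAGAGAGCAAGAGAGAGAGAGAGAAA",
    "TGTGTGTGCGCGTGTGCGTGTGTGT",
]

# (length, %G) for each motif, rounded to one decimal place.
summary = [(len(m), round(g_percent(m), 1)) for m in motifs]
print(summary)  # [(35, 40.0), (25, 48.0)]

# Flanks reported for the first motif (15 nt on each side).
left_flank = "TCTTGGGAAAAAAAA"
right_flank = "GACACCCCAGTGAAG"

# Join the left flank, the motif, and the right flank into the full region.
# The "-" separator is an assumption made here for display only.
full_region = "-".join([left_flank, motifs[0], right_flank])
print(full_region)
```

The computed values agree with the figures quoted above: 14 of 35 bases in the first motif are G (40%), and 12 of 25 in the second (48%).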
