Attaining the 2nd Chargaff Rule by Tandem Duplications
- Creators
- Jain, Siddharth
- Raviv, Netanel
- Bruck, Jehoshua
Abstract
In 1950, Erwin Chargaff observed experimentally that in DNA the count of A equals the count of T and the count of C equals the count of G. This observation played a crucial role in the discovery of the double-stranded helix structure by Watson and Crick. However, the same symmetry was also observed in single-stranded DNA, a phenomenon termed the 2nd Chargaff Rule. This symmetry has been verified experimentally, up to a small error, in the genomes of several different species, not only for mononucleotides but also for reverse complement pairs of larger lengths. While the symmetry in double-stranded DNA is related to base pairing and replication mechanisms, the function and source of the symmetry in single-stranded DNA remain a mystery. In this work, we define a sequence generation model based on reverse complement tandem duplications. We show that this model generates sequences that satisfy the 2nd Chargaff Rule even when the duplication lengths are very small compared to the length of the sequences. We also provide estimates of the number of generations this model needs to generate sequences that satisfy the 2nd Chargaff Rule. We provide theoretical bounds on the disruption in symmetry for different values of duplication lengths under this model. Moreover, we experimentally compare the disruption in symmetry incurred by our model with what is observed in human genome data.
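The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on assumptions that the abstract does not spell out: a reverse complement tandem duplication is taken to mean copying a substring and inserting its reverse complement immediately after it, the duplication position and length are drawn uniformly at random, and "disruption" is measured as the normalized count difference between each k-mer and its reverse complement. The function names (`rc_tandem_duplicate`, `symmetry_disruption`, `simulate`) are hypothetical.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Return the reverse complement of a DNA string over {A, C, G, T}."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def rc_tandem_duplicate(seq, start, length):
    """Insert the reverse complement of seq[start:start+length] right after it.

    Assumption: this is one plausible reading of a "reverse complement
    tandem duplication"; the report itself gives the formal definition.
    """
    end = start + length
    return seq[:end] + reverse_complement(seq[start:end]) + seq[end:]


def symmetry_disruption(seq, k=1):
    """Normalized count mismatch between k-mers and their reverse complements.

    Returns 0.0 for perfect symmetry and 1.0 when no k-mer has its reverse
    complement present. This metric is an illustrative choice, not
    necessarily the one used in the report.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Include reverse complements that never occur so each pair is seen twice.
    kmers = set(counts) | {reverse_complement(m) for m in counts}
    diff = sum(abs(counts[m] - counts[reverse_complement(m)]) for m in kmers)
    return diff / (2 * total)


def simulate(seed_seq, steps, max_len, rng):
    """Apply `steps` random duplications of length at most `max_len`."""
    seq = seed_seq
    for _ in range(steps):
        length = rng.randint(1, min(max_len, len(seq)))
        start = rng.randint(0, len(seq) - length)
        seq = rc_tandem_duplicate(seq, start, length)
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    seed = "AAAA"  # maximally asymmetric seed: no T at all
    grown = simulate(seed, steps=2000, max_len=3, rng=rng)
    print(len(grown), symmetry_disruption(seed), symmetry_disruption(grown))
```

Starting from a seed with no T at all, every duplication adds bases that are complementary to existing ones, so the measured disruption drops below its initial value of 1.0. How quickly it approaches zero, and for which duplication lengths, is what the report analyzes.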
Additional Information
This work was supported in part by the NSF Expeditions in Computing Program - The Molecular Programming Project. The work of Netanel Raviv was supported in part by the postdoctoral fellowship of the Center for the Mathematics of Information (CMI), Caltech, and in part by the Lester-Deutsch postdoctoral fellowship.
Attached Files
Submitted - etr138.pdf
Files
Name | Size |
---|---|
md5:d90f325b894df41a527eeb75e75ac8ca | 1.2 MB |
Additional details
- Eprint ID
- 84120
- Resolver ID
- CaltechAUTHORS:20180105-092230028
- NSF
- Center for the Mathematics of Information, Caltech
- Lester-Deutsch postdoctoral fellowship
- Created
- 2018-01-05 (created from EPrint's datestamp field)
- Updated
- 2021-08-18 (created from EPrint's last_modified field)
- Caltech groups
- Parallel and Distributed Systems Group
- Series Name
- Parallel and Distributed Systems Group Technical Reports
- Series Volume or Issue Number
- 138
- Other Numbering System Name
- PARADISE
- Other Numbering System Identifier
- ETR-38