Nucleic Acid Sequence Design via Efficient Ensemble Defect Optimization

Abstract

We describe an algorithm for designing the sequence of one or more interacting nucleic acid strands intended to adopt a target secondary structure at equilibrium. Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of unpseudoknotted secondary structures. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, candidate mutations are evaluated on the leaf nodes of a tree-decomposition of the target structure. During leaf optimization, defect-weighted mutation sampling is used to select each candidate mutation position with probability proportional to its contribution to the ensemble defect of the leaf. As subsequences are merged moving up the tree, emergent structural defects resulting from crosstalk between sibling sequences are eliminated via reoptimization within the defective subtree starting from new random subsequences. Using a Θ(N^3) dynamic program to evaluate the ensemble defect of a target structure with N nucleotides, this hierarchical approach implies an asymptotic optimality bound on design time: for sufficiently large N, the cost of sequence design is bounded below by 4/3 the cost of a single evaluation of the ensemble defect for the full sequence. Hence, the design algorithm has time complexity Ω(N^3). For target structures containing N ∈{100,200,400,800,1600,3200} nucleotides and duplex stems ranging from 1 to 30 base pairs, RNA sequence designs at 37°C typically succeed in satisfying a stop condition with ensemble defect less than N/100. Empirically, the sequence design algorithm exhibits asymptotic optimality and the exponent in the time complexity bound is sharp

Additional Information

© 2010 Wiley Periodicals, Inc. Received 4 April 2010; Revised 3 June 2010; Accepted 23 June 2010. Published online 17 August 2010. Contract/grant sponsor: The National Science Foundation; contract/grant numbers: NSF-CCF-0832824 (The Molecular Programming Project), NSFCCF-CAREER-0448835. Contract/grant sponsor: The Beckman Institute at Caltech. Contract/grant sponsor: The Ralph M. Parsons Foundation. The authors thank R.M. Dirks and L.B. Pierce for helpful discussions.

Attached Files

Supplemental Material - JCC_21633_sm_complexes_new.txt

Supplemental Material - JCC_21633_sm_engineered_new.txt

Supplemental Material - JCC_21633_sm_random_new.txt

Supplemental Material - JCC_21633_sm_suppinfo.pdf

Files

JCC_21633_sm_random_new.txt

Files (886.7 kB)

Name	Size	Download all
JCC_21633_sm_random_new.txt md5:2a0f823aeeeaaf53ef4804d96262f234	190.4 kB	Preview Download
JCC_21633_sm_engineered_new.txt md5:adbe258d6bf96858909290b41696fb4c	190.4 kB	Preview Download
JCC_21633_sm_suppinfo.pdf md5:27af471b49ad23f570acf02f9a29e55d	315.1 kB	Preview Download
JCC_21633_sm_complexes_new.txt md5:dbd0352c5bd5a7e8b3ba958d54db6034	190.8 kB	Preview Download

	All versions	This version
Views	0	0
Downloads	0	0
Data volume	0 Bytes	0 Bytes

Nucleic Acid Sequence Design via Efficient Ensemble Defect Optimization

Abstract

Additional Information

Attached Files

Files

Additional details