John D Kececioglu

Professor, Computer Science

Associate Professor, Applied Mathematics - GIDP

Associate Professor, Genetics - GIDP

Associate Professor, BIO5 Institute

Primary Department

Computer Science

Department Affiliations

Computer Science

BIO5 Institute

Contact

(520) 621-4526

kece@cs.arizona.edu

Work Summary

John Kececioglu's research is in applied algorithms, with an emphasis on bioinformatics and computational biology, including: multiple sequence alignment, inverse parametric alignment, sequence assembly, and genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment, Facet, a tool for alignment accuracy estimation, InverseOpt, a library for inverse parametric optimization, Ipa, a tool for inverse sequence alignment, and AlignAlign, a tool for optimally aligning alignments.

Research Interest

John Kececioglu is an Associate Professor in the Department of Computer Science, and the BIO5 Institute. His research is in applied algorithms, especially for areas of bioinformatics and computational biology such as: multiple alignment, inverse alignment, sequence assembly, and genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment; Facet, a tool for alignment accuracy estimation; Ipa, a tool for inverse sequence alignment; AlignAlign, a tool for optimally aligning alignments, and Ninja, a tool for evolutionary tree construction. John is a recipient of a National Science Foundation CAREER Award, serves as an Associate Editor for IEEE/ACM Transactions on Computational Biology and Bioinformatics, and is on the Editorial Board of Algorithms for Molecular Biology. He was Conference Chair for RECOMB 2009, and Program Committee Co-Chair in the area of Sequence Analysis for ISMB 2011 and BCB 2012. John served as Associate Head of the Department of Computer Science during 2012.

Publications

Inverse sequence alignment from partial examples

Kim, E., & Kececioglu, J. (2007). Inverse sequence alignment from partial examples. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4645 LNBI, 359-370.

Abstract:

When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for biological sequences is inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the example alignments score close to optimal. We extend prior work on inverse alignment to partial examples and to an improved model based on minimizing the average error of the examples. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the recovery rate for multiple sequence alignment by up to 25%. © Springer-Verlag Berlin Heidelberg 2007.

Learning parameter-advising sets for multiple sequence alignment

DeBlasio, D. F., & Kececioglu, J. D. (2017). Learning parameter-advising sets for multiple sequence alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(5), 1028-1041.

A tool for multiple sequence alignment

Lipman, D. J., Altschul, S. F., & Kececioglu, J. D. (1989). A tool for multiple sequence alignment. Proceedings of the National Academy of Sciences of the United States of America, 86(12), 4412-4415.

PMID: 2734293;PMCID: PMC287279;Abstract:

Multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence-structure relationships. Until recently, it has been impractical to apply dynamic programming, the most widely accepted method for producing pairwise alignments, to comparisons of more than three sequences. We describe the design and application of a tool for multiple alignment of amino acid sequences that implements a new algorithm that greatly reduces the computational demands of dynamic programming. This tool is able to align in reasonable time as many as eight sequences the length of an average protein.

Computing physical maps of chromosomes with nonoverlapping probes by branch-and-cut

Christof, T., & Kececioglu, J. (1999). Computing physical maps of chromosomes with nonoverlapping probes by branch-and-cut. Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB, 115-123.

Abstract:

We introduce a new combinatorial formulation of chromosome physical-mapping by the sampling-without-replacement protocol. In this protocol, which is simple, inexpensive, and has been used to successfully map several organisms, equal-length clones are hybridized against a subset of the clones called probes, which are designed to form a maximal nonoverlapping clone-subset. The output of the protocol is the clone-probe hybridization matrix H. The problem of finding a maximum-likelihood reconstruction of the order of the probes along the chromosome in the presence of false positive and negative hybridization error is equivalent to finding the minimum number of entries of H to change to zeros so that the resulting matrix has at most 2 ones per row, and the consecutive-ones property across rows. This combinatorial problem, which we call 2-Consecutive-Ones Mapping, has a concise integer linear-programming formulation, to which we apply techniques from polyhedral combinatorics. The formulation is unique in that it does not explicitly represent the probe permutation, and in contrast to prior linear-programming approaches, the number of variables is small: in practice, linear in the number of clones. We derive a large class of facet-defining inequalities for the 2-consecutive-ones polytope that we call the augmented k-degree inequalities, and we show that the basic k-degree class can be efficiently separated using bipartite matchings. Computational results with an implementation of the resulting branch-and-cut algorithm applied to both simulated and real data from the complete genome of Aspergillus nidulans show that we can solve many problems to provable optimality and find maps of higher quality than previously possible.

Automatic design of synthetic gene circuits through mixed integer non-linear programming

Huynh, L., Kececioglu, J., Köppe, M., & Tagkopoulos, I. (2012). Automatic design of synthetic gene circuits through mixed integer non-linear programming. PLoS ONE, 7(4).

PMID: 22536398;PMCID: PMC3334995;Abstract:

Automatic design of synthetic gene circuits poses a significant challenge to synthetic biology, primarily due to the complexity of biological systems, and the lack of rigorous optimization methods that can cope with the combinatorial explosion as the number of biological parts increases. Current optimization methods for synthetic gene design rely on heuristic algorithms that are usually not deterministic, deliver sub-optimal solutions, and provide no guaranties on convergence or error bounds. Here, we introduce an optimization framework for the problem of part selection in synthetic gene circuits that is based on mixed integer non-linear programming (MINLP), which is a deterministic method that finds the globally optimal solution and guarantees convergence in finite time. Given a synthetic gene circuit, a library of characterized parts, and user-defined constraints, our method can find the optimal selection of parts that satisfy the constraints and best approximates the objective function given by the user. We evaluated the proposed method in the design of three synthetic circuits (a toggle switch, a transcriptional cascade, and a band detector), with both experimentally constructed and synthetic promoter libraries. Scalability and robustness analysis shows that the proposed framework scales well with the library size and the solution space. The work described here is a step towards a unifying, realistic framework for the automated design of biological circuits. © 2012 Huynh et al.

John D Kececioglu