John D Kececioglu

John D Kececioglu

Professor, Computer Science
Associate Professor, Applied Mathematics - GIDP
Associate Professor, Genetics - GIDP
Associate Professor, BIO5 Institute
Primary Department
Department Affiliations
Contact
(520) 621-4526

Work Summary

John Kececioglu's research is in applied algorithms, with an emphasis on bioinformatics and computational biology, including: multiple sequence alignment, inverse parametric alignment, sequence assembly, and genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment, Facet, a tool for alignment accuracy estimation, InverseOpt, a library for inverse parametric optimization, Ipa, a tool for inverse sequence alignment, and AlignAlign, a tool for optimally aligning alignments.

Research Interest

John Kececioglu is an Associate Professor in the Department of Computer Science, and the BIO5 Institute. His research is in applied algorithms, especially for areas of bioinformatics and computational biology such as: multiple alignment, inverse alignment, sequence assembly, and genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment; Facet, a tool for alignment accuracy estimation; Ipa, a tool for inverse sequence alignment; AlignAlign, a tool for optimally aligning alignments, and Ninja, a tool for evolutionary tree construction. John is a recipient of a National Science Foundation CAREER Award, serves as an Associate Editor for IEEE/ACM Transactions on Computational Biology and Bioinformatics, and is on the Editorial Board of Algorithms for Molecular Biology. He was Conference Chair for RECOMB 2009, and Program Committee Co-Chair in the area of Sequence Analysis for ISMB 2011 and BCB 2012. John served as Associate Head of the Department of Computer Science during 2012.

Publications

Kececioglu, J., & Kim, E. (2006). Simple and fast inverse alignment. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3909 LNBI, 441-455.

Abstract:

For as long as biologists have been computing alignments of sequences, the question of what values to use for scoring substitutions and gaps has persisted. While some choices for substitution scores are now common, largely due to convention, there is no standard for choosing gap penalties. An objective way to resolve this question is to learn the appropriate values by solving the Inverse String Alignment Problem: given examples of correct alignments, find parameter values that make the examples be optimal-scoring alignments of their strings. We present a new polynomial-time algorithm for Inverse String Alignment that is simple to implement, fast in practice, and for the first time can learn hundreds of parameters simultaneously. The approach is also flexible: minor modifications allow us to solve inverse unique alignment (find parameter values that make the examples be the unique optimal alignments of their strings), and inverse near-optimal alignment (find parameter values that make the example alignments be as close to optimal as possible). Computational results with an implementation for global alignment show that, for the first time, we can find best-possible values for all 212 parameters of the standard protein-sequence scoring-model from hundreds of alignments in a few minutes of computation. © Springer-Verlag Berlin Heidelberg 2006.

Kececioglu, J., & Sankoff, D. (1995). Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica, 13(1-2), 180-210.

Abstract:

Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements, and reverses their order. For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal in O(n 2) time and 0(n) space for n-element permutations, and a branch- and-bound exact algorithm, that finds an optimal solution in 0(mL(n, n)) time and 0(n 2) space, where m is the size of the branch- and-bound search tree, and L(n, n) is the time to solve a linear program of n variables and n constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch- and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming. In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing by k random reversals, we find that the average upper bound on reversal distance estimates k to within one reversal for k1/2n and n100. For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals for n50. Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes. © 1995 Springer-Verlag New York Inc.

Deblasio, D. F., Wheeler, T. J., & Kececioglu, J. D. (2012). Estimating the accuracy of multiple alignments and its use in parameter advising. Proceedings of RECOMB 2012 (Springer-Verlag Lecture Notes in Bioinformatics), 7262 LNBI, 45-59.

Abstract:

We develop a novel and general approach to estimating the accuracy of protein multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new problem that we call parameter advising. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. We evaluate this approach by applying it to the task of parameter advising: the problem of choosing alignment scoring parameters from a collection of parameter values to maximize the accuracy of a computed alignment. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 20% improvement in accuracy over the best default parameter choice, and outperforms the best prior approaches to selecting good alignments for parameter advising. © 2012 Springer-Verlag Berlin Heidelberg.

Taylor, E. W., Bhat, A., Nadimpalli, R. G., Zhang, W., & Kececioglu, J. (1997). HIV-1 encodes a sequence overlapping env gp41 with highly significant similarity to selenium-dependent glutathione peroxidases.. Journal of acquired immune deficiency syndromes and human retrovirology : official publication of the International Retrovirology Association, 15(5), 393-394.
Reinert, K., Lenhof, H. -., Mutzel, P., Mehlhorn, K., & Kececioglu, J. D. (1997). Branch-and-cut algorithm for multiple sequence alignment. Proceedings of thr Annual International Conference on Computational Molecular Biology, RECOMB, 241-250.

Abstract:

Multiple sequence alignment is an important problem in computational biology. We study the Maximum Trace formulation introduced by Kececioglu [Kec91]. We first phrase the problem in terms of forbidden subgraphs, which enables us to express Maximum Trace as an integer linear-programming problem, and then solve the integer linear program using methods from polyhedral combinatorics. The trace polytope is the convex hull of all feasible solutions to the Maximum Trace problem; for the case of two sequences, we give a complete characterization of this polytope. This yields a polynomial-time algorithm for a general version of pairwise sequence alignment that, perhaps suprisingly, does not use dynamic programming; this yields, for instance, a non-dynamic-programming algorithm for sequence comparison under the 0-1 metric, which gives another answer to a long-open question in the area of string algorithms [PW93]. For the multiple-sequence case, we derive several classes of facet-defining inequalities and show that for all but one class, the corresponding separation problem can be solved in polynomial time. This leads to a branch-and-cut algorithm for multiple sequence alignment, and we report on our first computational experience. It appears that a polyhedral approach to multiple sequence alignment can solve instances that are beyond present dynamic-programming approaches.