RogueNaRok: identifying rogue taxa in a set of phylogenetic trees

Rogue taxa are taxa whose position in a phylogenetic tree is uncertain. For inference methods that yield a tree set (bootstrapping, Bayesian tree searches), a rogue taxon can assume a different position in each tree. Theoretically, the presence of a few rogue taxa in a tree set is sufficient to render the consensus tree of this tree set devoid of any phylogenetic information. Practically, in almost any tree set, removing rogue taxa improves the sum of branch support values in the consensus tree at least slightly.


If RogueNaRok has been useful for your work, please cite as follows:
  • A.J. Aberer, D. Krompass, A. Stamatakis: "Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice", Systematic Biology, in press.

Suggested workflow:


  • Upload a tree set (1 Newick tree per line). Also upload a maximum likelihood estimate (MLE) tree, if you want RogueNaRok to optimize the bipartition support drawn on your MLE tree.
  • Start rogue taxon identification (in many cases the default parameters do not need to be modified). Once the run has finished, the taxa are listed and annotated with values indicating how detrimental each taxon is to the consensus tree. The annotation can be extended with the results of further runs. This way, you can, for instance, compare which taxa are identified as rogues depending on parameter and algorithm choice (or which rogues also decrease the bipartition support drawn on an MLE tree).
  • Once you have decided on a set of rogue taxa to exclude, you can remove these taxa from your tree set and obtain the consensus tree of your choice (or the MLE tree with bipartition support). Alternatively, you may want to repeat the previous step, in case taxa that are crucial for your analysis were identified as rogues or in case you want to continue exploring the parameter space.
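
The input format named in the first step (one Newick tree per line) can be checked locally before uploading. The following Python sketch is an illustration only and not part of RogueNaRok: the function name check_tree_set is hypothetical, taxon labels are assumed to be unquoted, and it is assumed that every tree in the set must contain the same taxa.

```python
import re


def check_tree_set(lines):
    """Return the shared taxon set, or raise ValueError on the first problem.

    Hypothetical helper: assumes unquoted labels and identical taxa in all trees.
    """
    taxa = None
    for number, line in enumerate(lines, start=1):
        tree = line.strip()
        if not tree:
            continue  # ignore blank lines
        if not tree.endswith(";") or tree.count(";") != 1:
            raise ValueError(f"line {number}: expected exactly one tree ending in ';'")
        if tree.count("(") != tree.count(")"):
            raise ValueError(f"line {number}: unbalanced parentheses")
        # A leaf label directly follows '(' or ','; it ends before ':' (branch length).
        labels = re.findall(r"[(,]\s*([^(),:;\s]+)", tree)
        if len(labels) != len(set(labels)):
            raise ValueError(f"line {number}: duplicate taxon label")
        if taxa is None:
            taxa = set(labels)
        elif set(labels) != taxa:
            raise ValueError(f"line {number}: taxon set differs from the first tree")
    if taxa is None:
        raise ValueError("no trees found")
    return taxa
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example check_tree_set(open("bootstrap.trees")), where the file name is a placeholder.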

Explanation of result visualization:


Available algorithms and their output:


  • RogueNaRok algorithm: a fast algorithm that iteratively determines exactly how the support of consensus bipartitions changes if a set of n taxa is removed from the tree set. An annotation of 1.5 means that the equivalent of 1.5 fully supported bipartitions (e.g., 3 bipartitions with 50% support each) is added to the consensus tree if this taxon and all taxa determined in previous iterations (note the sorting of the listing) are removed from the tree set.
  • Leaf stability index: a statistic for measuring the node stability in a tree set based on quartet frequencies by Thorley and Wilkinson (1999). Values range between 0 (unstable) and 1 (stable).
  • Taxonomic instability index: a statistic for measuring the node stability in a tree set based on unweighted patristic distances by Maddison and Maddison (also implemented in Mesquite). The higher the annotation value, the more unstable the taxon.
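
To make the meaning of the RogueNaRok annotation concrete, the following Python sketch computes, by brute force, how the summed support of the majority-rule consensus bipartitions changes when a single taxon is pruned from every tree. It is a toy illustration under stated assumptions and not the RogueNaRok algorithm itself: all function names are hypothetical, labels are assumed to be unquoted, only one taxon is dropped at a time, and a bipartition counts towards the sum only if its support exceeds the threshold.

```python
import re
from collections import Counter


def clades(newick):
    """Return the leaf set below every pair of parentheses in a Newick string."""
    stack, found, prev = [], [], None
    for tok in re.findall(r"[(),;]|[^(),;]+", newick):
        if not tok.strip():
            continue  # skip whitespace between symbols
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            found.append(frozenset(done))
            if stack:
                stack[-1] |= done  # leaves also belong to the enclosing clade
        elif tok not in (",", ";") and prev in ("(", ","):
            stack[-1].add(tok.split(":")[0].strip())  # leaf label without branch length
        prev = tok
    return found


def splits(newick, drop=frozenset()):
    """Non-trivial bipartitions of one tree after pruning the taxa in `drop`."""
    found = clades(newick)
    taxa = max(found, key=len) - drop  # the outermost clade holds all taxa
    result = set()
    for clade in found:
        side = clade - drop
        other = taxa - side
        if len(side) >= 2 and len(other) >= 2:
            result.add(frozenset((side, other)))
    return result


def support_sum(trees, drop=frozenset(), threshold=0.5):
    """Sum of supports of all bipartitions whose support exceeds `threshold`."""
    counts = Counter(s for tree in trees for s in splits(tree, drop))
    return sum(c / len(trees) for c in counts.values() if c / len(trees) > threshold)


def rogue_score(trees, taxon):
    """Gain in summed consensus support if `taxon` is pruned from all trees."""
    return support_sum(trees, frozenset([taxon])) - support_sum(trees)


# Toy tree set: X jumps around a stable backbone ((A,B),C,(D,E)).
trees = [
    "(((A,X),B),C,(D,E));",
    "((A,B),(C,X),(D,E));",
    "((A,B),C,((D,X),E));",
    "((A,B),C,(D,(E,X)));",
]
print(rogue_score(trees, "X"))  # 1.25
print(rogue_score(trees, "C"))  # 0.0
```

In this toy set, pruning X raises the summed support from 0.75 to 2.0, that is, by the equivalent of 1.25 fully supported bipartitions, whereas pruning the stable taxon C gains nothing.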

Standalone version and further documentation:


  • For large datasets (e.g. 1000 trees with 1000 taxa) or expensive choices of parameters, it is advisable to download a copy of the RogueNaRok implementation and execute the programs on a local machine or a cluster.
    For the most current version, visit https://github.com/aberer/RogueNaRok.
  • The wiki on our GitHub site provides detailed information on program parameters and a hands-on tutorial:
    https://github.com/aberer/RogueNaRok/wiki
  • The RogueNaRok algorithm is explained in detail in this technical report:
  • The standalone version of the code also provides an implementation of the unrooted maximum agreement subtree (U-MAST) that is not part of the webserver version.