The Impact of Model Misspecification on Phylogenetic Network Inference

Authors

  • Zhen Cao, Department of Computer Science, Rice University
  • Meng Li, Department of Statistics, Rice University
  • Huw A Ogilvie, Department of Computer Science, Rice University
  • Luay Nakhleh, Department of Computer Science and Department of Biosciences, Rice University

DOI:

https://doi.org/10.18061/bssb.v3i1.9553

Keywords:

phylogenetic networks, multispecies coalescent, rate heterogeneity, model misspecification

Abstract

The development of statistical methods to infer species phylogenies with reticulations (species networks) has led to many discoveries of gene flow between distinct species. These methods typically model only incomplete lineage sorting and introgression. Because phylogenetic networks can be arbitrarily complex, these methods might compensate for model misspecification by increasing the number of dimensions beyond the true value. Herein, we explore the effect of potential model misspecification, including neglect of gene tree estimation error (GTEE) and the assumption of a single substitution rate for all genomic loci, on the accuracy of phylogenetic network inference using both simulated and biological data. In particular, we assess the accuracy of estimated phylogenetic networks as well as of test statistics for determining whether a network, as opposed to the simpler tree model, is the correct evolutionary history.

We found that while GTEE negatively impacts the performance of test statistics for determining the “treeness” of the evolutionary history of a data set, running those tests on triplets of taxa and correcting for multiple testing significantly ameliorates the problem. We also found that accounting for substitution rate heterogeneity improves the reliability of full Bayesian inference methods for phylogenetic networks, whereas summary statistic methods are robust to GTEE and rate heterogeneity, though they currently require manual inspection to determine the network complexity.
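
As a rough illustration of the triplet-based strategy mentioned above, the sketch below enumerates all triplets of taxa and applies a multiple-testing correction to one p-value per triplet. The abstract does not say which correction the authors used; the Holm-Bonferroni step-down procedure is assumed here purely for illustration. The taxon names, the placeholder p-values, and the function name `holm_bonferroni` are likewise hypothetical, and the treeness test that would produce the p-values is not shown.

```python
from itertools import combinations


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction (an assumed choice, not
    necessarily the correction used in the paper).

    Returns a list of booleans, True where the null hypothesis
    (here: the triplet's history is tree-like) is rejected.
    """
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Stop at the first non-rejection; the rest are retained.
    return reject


# Hypothetical taxa; every 3-subset is tested separately.
taxa = ["A", "B", "C", "D"]
triplets = list(combinations(taxa, 3))

# Placeholder p-values, one per triplet, standing in for the output of a
# treeness test run on each triplet (not real results).
p_values = [0.001, 0.04, 0.30, 0.012]

rejected = holm_bonferroni(p_values, alpha=0.05)
non_treelike = [t for t, r in zip(triplets, rejected) if r]
print(non_treelike)
```

With these placeholder values, only the triplets whose p-values survive the step-down thresholds are flagged as departing from treeness; the number of tests grows as the number of 3-subsets of the taxon set, which is why some form of correction is needed.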

Published

2024-09-30

How to Cite

Cao, Z., Li, M., Ogilvie, H., & Nakhleh, L. (2024). The Impact of Model Misspecification on Phylogenetic Network Inference. Bulletin of the Society of Systematic Biologists, 3(1). https://doi.org/10.18061/bssb.v3i1.9553

Section

Investigations