Distinguishing Between Histories of Speciation and Introgression Using Genomic Data

Mark Hibbins; Matthew Hahn

doi:10.18061/bssb.v3i1.9227

Investigations

Abstract

Introgression creates complex, non-bifurcating relationships among species. At individual loci and across the genome, introgression and incomplete lineage sorting interact to produce a wide range of gene tree topologies. These processes can obscure the history of speciation among lineages, and, as a result, distinguishing the history of speciation from that of introgression remains a challenge. Here, we use theory and simulation to investigate how introgression can mislead multiple approaches to species tree inference. We find that arbitrarily low amounts of introgression may mislead both gene tree and parsimony approaches to species tree inference if the level of incomplete lineage sorting is sufficiently high. We also show that an alternative approach based on minimum gene tree node heights is inconsistent and depends on the rate of introgression across the genome. To distinguish between speciation and introgression, we apply supervised machine learning models to a set of features that can easily be obtained from phylogenomic datasets. We find that several of these models are highly accurate in classifying the species history in simulated datasets. We also show that, if the histories of speciation and introgression can be identified, PhyloNet will return highly accurate estimates of the contribution of each history to the data (i.e., edge weights). Overall, our results highlight the promise of supervised machine learning as a potentially powerful complement to phylogenetic methods in the analysis of introgression from genomic data.

Keywords: introgression, speciation, supervised machine learning

How to Cite:

Hibbins, M. & Hahn, M. (2024) “Distinguishing Between Histories of Speciation and Introgression Using Genomic Data”, Bulletin of the Society of Systematic Biologists 3(1). doi: https://doi.org/10.18061/bssb.v3i1.9227

Rights: Mark S. Hibbins, Matthew W. Hahn

Authors

Mark Hibbins (University of Toronto)
Matthew Hahn (Indiana University)

Information

Published on 2024-07-11
Peer Reviewed
License Creative Commons Attribution 4.0
