Distinguishing Between Histories of Speciation and Introgression Using Genomic Data
DOI: https://doi.org/10.18061/bssb.v3i1.9227
Keywords: introgression, speciation, supervised machine learning
Abstract
Introgression creates complex, non-bifurcating relationships among species. At individual loci and across the genome, introgression and incomplete lineage sorting interact to produce a wide range of gene tree topologies. These processes can obscure the history of speciation among lineages, so distinguishing the history of speciation from the history of introgression remains a challenge. Here, we use theory and simulation to investigate how introgression can mislead multiple approaches to species tree inference. We find that arbitrarily low amounts of introgression may mislead both gene tree and parsimony approaches to species tree inference if the level of incomplete lineage sorting is sufficiently high. We also show that an alternative approach based on minimum gene tree node heights is inconsistent, and that its behavior depends on the rate of introgression across the genome. To distinguish between speciation and introgression, we apply supervised machine learning models to a set of features that can easily be obtained from phylogenomic datasets. We find that several of these models are highly accurate in classifying the species history in simulated datasets. We also show that, if the histories of speciation and introgression can be identified, PhyloNet returns highly accurate estimates of the contribution of each history to the data (i.e., edge weights). Overall, our results highlight the promise of supervised machine learning as a potentially powerful complement to phylogenetic methods in the analysis of introgression from genomic data.
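The claim that arbitrarily low introgression can mislead gene-tree-based inference when incomplete lineage sorting is high can be illustrated with a minimal sketch. It assumes a simplified three-taxon model, which is not necessarily the model used in the paper: the species tree is ((A,B),C) with internal branch length T1 in coalescent units, a fraction gamma of loci instead follow an introgression tree ((B,C),A) with internal branch length T2, and each tree contributes the standard multispecies-coalescent topology frequencies (1 - (2/3)e^(-T) for the matching topology, (1/3)e^(-T) for each discordant one). The function names `gene_tree_probs` and `most_common_topology` are hypothetical.

```python
import math


def gene_tree_probs(t_species, t_intro, gamma):
    """Topology frequencies under a hypothetical two-tree mixture model.

    A fraction (1 - gamma) of loci follow the species tree ((A,B),C) with
    internal branch t_species; a fraction gamma follow the introgression
    tree ((B,C),A) with internal branch t_intro. Branch lengths are in
    coalescent units; short branches mean more incomplete lineage sorting.
    """
    x = math.exp(-t_species)  # chance lineages fail to coalesce on the species-tree internal branch
    y = math.exp(-t_intro)    # same quantity for the introgression tree
    p_ab = (1 - gamma) * (1 - 2 * x / 3) + gamma * (y / 3)
    p_bc = (1 - gamma) * (x / 3) + gamma * (1 - 2 * y / 3)
    p_ac = (1 - gamma) * (x / 3) + gamma * (y / 3)
    return {"AB": p_ab, "BC": p_bc, "AC": p_ac}


def most_common_topology(probs):
    """Return the topology a 'most frequent gene tree' approach would pick."""
    return max(probs, key=probs.get)


# Only 5% of loci introgressed, but a very short species-tree internal branch.
print(most_common_topology(gene_tree_probs(0.01, 1.0, 0.05)))
# With a longer species-tree internal branch, the species topology wins.
print(most_common_topology(gene_tree_probs(1.0, 1.5, 0.05)))
```

Subtracting the first two expressions shows that, under these assumptions, the introgression topology is more frequent than the species topology exactly when gamma * (1 - e^(-T2)) > (1 - gamma) * (1 - e^(-T1)). As T1 approaches 0 (high incomplete lineage sorting), the right-hand side vanishes, so any gamma > 0 is enough to make the most common gene tree disagree with the species tree.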
License
Copyright (c) 2024 Mark S. Hibbins, Matthew W. Hahn
This work is licensed under a Creative Commons Attribution 4.0 International License.