Summary
This project trains supervised machine learning models to predict plant alpha diversity from environmental DNA (eDNA) data. Using a dataset of 325 locations worldwide, the models are trained on environmental variables together with eDNA sequence data. One model uses all available eDNA sequences, while the other uses only sequences that can be taxonomically identified. Comparing the performance of the two models is intended to show how much unidentifiable eDNA contributes to diversity estimates and to assess how strongly reference databases are biased towards indicator species. The approach is a novel attempt to analyze land-based ecosystems globally with machine learning, analogous to existing methods for aquatic environments, and aims to improve our understanding of biodiversity dynamics at broad scales.
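The two-model comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline. The summary does not specify the model type, the evaluation metric, or how eDNA is encoded. The sketch therefore assumes a closed-form ridge regression, 5-fold cross-validated R² as the score, and eDNA represented as a per-site matrix of sequence abundances with a boolean mask marking taxonomically identified sequences. The function names (`ridge_fit`, `cv_r2`, `compare_feature_sets`) are hypothetical. The only difference between the two models is which eDNA columns they see. Any gap in held-out R² therefore reflects the information carried by the unidentified sequences.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge regression (an assumed model choice, not the project's).
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)

def ridge_predict(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

def cv_r2(X, y, k=5, alpha=1.0, seed=0):
    # k-fold cross-validated R^2 computed on the pooled held-out predictions.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    preds = np.empty(len(y), dtype=float)
    for test in folds:
        train = np.setdiff1d(idx, test)
        w = ridge_fit(X[train], y[train], alpha)
        preds[test] = ridge_predict(w, X[test])
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_feature_sets(env, edna, identified_mask, y, **kw):
    # Model A: environmental variables + all eDNA sequence features.
    # Model B: environmental variables + only the taxonomically identified sequences.
    X_all = np.hstack([env, edna])
    X_identified = np.hstack([env, edna[:, identified_mask]])
    return {
        "all_edna": cv_r2(X_all, y, **kw),
        "identified_only": cv_r2(X_identified, y, **kw),
    }

# Usage on purely synthetic data (NOT project data): 325 sites, 3 environmental
# variables, 20 eDNA features of which the first 10 are "identified".
rng = np.random.default_rng(42)
env = rng.normal(size=(325, 3))
edna = rng.normal(size=(325, 20))
identified_mask = np.arange(20) < 10
# In this toy setup, diversity depends partly on an *unidentified* sequence (column 15).
y = env.sum(axis=1) + 3.0 * edna[:, 15] + rng.normal(scale=0.5, size=325)
result = compare_feature_sets(env, edna, identified_mask, y)
print(result)
```

In this toy setup the model restricted to identified sequences should score lower, because part of the signal sits in an unidentified column. With real data, the size of that gap is what the comparison would measure.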