On-line Journal of Genetics and Genealogy: 2022

Tuesday, October 04, 2022

MyHeritage has a new way to sort shared DNA matches

We’re happy to announce the addition of sorting abilities for Shared DNA Matches. It’s one of several new improvements we’re making to DNA Matches on MyHeritage in the coming weeks.

Shared DNA Matches are a valuable tool for users interested in figuring out how they’re related to a specific DNA match. The new sorting functionality enables you to sort your Shared DNA Matches based on the proximity of their relationship to you or to the DNA Match you’re reviewing, and gain new insights.

Sorting of Shared DNA Matches is unique to MyHeritage, and this new addition has already received praise from experts in the genealogy community. Diahan Southard of Your DNA Guide says, “SWEET!! This is one of my requested features and will make a big difference”, and Janna Helshtein from DNA at Eye Level says, “This is an amazing feature, I LOVE LOVE LOVE it!” See this link: New sorting for shared DNA matches

Wednesday, August 31, 2022

Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated

This article reviews Principal Component Analysis (PCA) as applied to population genetics and argues that PCA-based findings in population genetic studies are highly biased and must be reevaluated. It also suggests other methods that may provide more valid results.

Abstract

Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
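
The abstract's claim that PCA results can be artifacts of the data can be illustrated with a small sketch. It is not the authors' code: it assumes a toy color model in which each "population" is a pure RGB color, and the group sizes, the `pca_scores` helper, and the `pairwise_distances` helper are all hypothetical choices made for illustration. PCA is computed with a plain SVD in NumPy rather than with EIGENSOFT or PLINK.

```python
import numpy as np

# Toy "populations": pure colors in RGB space (illustrative values only).
COLORS = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1], "black": [0, 0, 0]}


def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pairwise_distances(counts):
    """Distances between groups in the 2-D PCA plot for the given group sizes."""
    names = sorted(counts)
    X = np.array([COLORS[n] for n in names for _ in range(counts[n])], dtype=float)
    labels = np.array([n for n in names for _ in range(counts[n])])
    scores = pca_scores(X)
    # All members of a group are identical, so the first one marks its position.
    pos = {n: scores[labels == n][0] for n in names}
    return {
        (a, b): float(np.linalg.norm(pos[a] - pos[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Same four groups, two sampling schemes (sizes are arbitrary assumptions).
balanced = pairwise_distances({"red": 10, "green": 10, "blue": 10, "black": 10})
skewed = pairwise_distances({"red": 100, "green": 10, "blue": 10, "black": 10})

for pair in [("black", "green"), ("black", "red")]:
    print(pair, round(balanced[pair], 3), round(skewed[pair], 3))
```

In this toy setup, black sits equally far from red and green when the groups are balanced. Once red is over-sampled, black appears closer to green than to red, although none of the underlying colors changed.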

Moving beyond PCA

As an alternative to PCA, we briefly note the advantages of a supervised machine-like model implemented in tools like the Geographic Population Structure (GPS) [85] and Pairwise Matcher (PaM) [57]. In this model, gene pools are simulated from a collection of geographically localized populations. The ancestry of the tested individuals is next estimated in relation to these gene pools. In this model, all individuals are represented as the proportion of gene pools. Their results do not change when samples are added or removed in the second part of the analysis. Population groups are bounded within the gene pools, and inclusion in these groups can be evaluated. This model was shown to be reliable, replicable, and accurate for many of the applications discussed here, including biogeography [85], population structure modeling [106], ancestry inference [107], paleogenomic modeling [108], forensics [86], and cohort matching [57]. An evaluation of other tools that may be useful to infer the population structure and their limitations can be found elsewhere [37, 109].
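
The idea of describing each individual as proportions of fixed gene pools can be sketched as follows. This is not the GPS or PaM algorithm: it is a generic constrained least-squares fit, solved by projected gradient descent, and the `GENE_POOLS` allele frequencies, the function names, and the test individuals are all made up for illustration. Because each individual is fitted only against the fixed pools, the estimate for one individual does not depend on who else is in the cohort.

```python
import numpy as np

# Hypothetical allele frequencies for three reference gene pools at six markers.
GENE_POOLS = np.array([
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],
    [0.1, 0.9, 0.2, 0.8, 0.5, 0.5],
    [0.5, 0.5, 0.1, 0.1, 0.9, 0.9],
])


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_proportions(individual, pools=GENE_POOLS, n_iter=5000):
    """Fit gene-pool proportions by minimizing ||pools.T @ w - individual||^2."""
    k = pools.shape[0]
    w = np.full(k, 1.0 / k)  # start from equal proportions
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(pools, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * pools @ (pools.T @ w - individual)
        w = project_to_simplex(w - step * grad)
    return w


def estimate_cohort(cohort):
    """Fit every individual independently against the same fixed pools."""
    return np.array([estimate_proportions(x) for x in cohort])


# Two synthetic individuals with known mixtures (assumed for the demo).
mixed = 0.7 * GENE_POOLS[0] + 0.3 * GENE_POOLS[1]
other = 0.5 * GENE_POOLS[1] + 0.5 * GENE_POOLS[2]

small = estimate_cohort([mixed])
large = estimate_cohort([mixed, other, GENE_POOLS[2]])
print(np.round(small[0], 3), np.round(large[0], 3))
```

Adding `other` and a pure pool-three sample to the cohort leaves the estimate for `mixed` unchanged, in contrast with PCA, where every sample contributes to the axes themselves.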