Understanding Relationship Probability Ranges in DNA
Ancestry says your match is a "4th cousin." 23andMe says "3rd to 5th cousin." MyHeritage says "3rd cousin." Three companies, three different predictions, for the same match. None of them is wrong. All of them are probability estimates: educated guesses based on the amount of shared DNA, measured in centimorgans (cM), and the same cM value can represent several different relationships. Understanding why these ranges exist, and what to do with them, is the key to using DNA evidence effectively.
Why Every Prediction Is a Range
Testing companies calculate the shared DNA between two people and then look up that amount in a reference table to predict the relationship. The problem is that DNA sharing is inherently variable. A given amount of shared DNA, say 200 cM, is consistent with multiple different relationships. It is near the average for second cousins, consistent with first cousin twice removed, possible for half first cousin once removed, and occasionally seen in great-great-aunt/uncle relationships.
No company can tell you definitively which relationship you have from the cM alone, because the same number appears in all of those relationship categories. The companies handle this differently: some give a single "best guess" relationship, some give a range, and some show probability percentages for each possible relationship. All are trying to communicate the same underlying reality — the cM value is consistent with several possibilities.
The Overlap Problem
The core issue is that relationship ranges overlap significantly. First cousin once removed shares an average of about 425 cM but ranges from about 175 to 760 cM. Half first cousin shares an average of about 425 cM but ranges from about 137 to 856 cM. Second cousin shares an average of about 225 cM but ranges from about 45 to 515 cM. These ranges overlap so substantially that a single cM value can sit squarely in the middle of three different relationship categories simultaneously.
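The overlap can be checked mechanically. The sketch below is a minimal illustration, not any company's method: it hard-codes only the three approximate ranges quoted above, and the names RANGES and consistent_relationships are made up for this example. It lists every relationship whose range contains a given cM value.

```python
# Approximate figures quoted in the text above, in cM: (low, average, high).
RANGES = {
    "first cousin once removed": (175, 425, 760),
    "half first cousin": (137, 425, 856),
    "second cousin": (45, 225, 515),
}


def consistent_relationships(shared_cm):
    """Return every relationship whose quoted range contains shared_cm."""
    return [
        name
        for name, (low, _average, high) in RANGES.items()
        if low <= shared_cm <= high
    ]


print(consistent_relationships(350))
print(consistent_relationships(600))
```

With these figures, 350 cM falls inside all three ranges, while 600 cM rules out second cousin but still leaves two candidates.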
This is not a flaw in the testing technology. It is a fundamental property of how DNA is inherited. The same amount of sharing can result from different relationships because each person received a different random selection from their ancestors. The ranges reflect real biological variation in real families, not measurement error.
How Companies Generate Their Predictions
Each testing company uses its own algorithm to translate cM values into relationship predictions, which is why the same match can receive different labels from different platforms. The algorithms typically assign the relationship that accounts for the largest proportion of people who share that specific cM value — the "most likely" interpretation. But "most likely" can mean anything from 60% probability to 35% probability depending on the cM value, which is why the predictions are often expressed as ranges rather than single answers.
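One way to picture how a single label falls out of a spread of possibilities is the sketch below. It assumes a probability is already available for each candidate relationship at a given cM value. The numbers here are invented purely for illustration and do not come from any company or from the Shared cM Project. The hypothetical best_guess function simply picks the largest one, which is roughly what "most likely" means.

```python
def best_guess(probabilities):
    """Return the relationship with the highest probability, and that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Invented probabilities for one cM value (illustrative only, not real data).
example = {
    "second cousin": 0.40,
    "first cousin once removed": 0.35,
    "half first cousin": 0.25,
}

label, probability = best_guess(example)
print(f"Label shown: {label} ({probability:.0%})")
print(f"Chance the true relationship is something else: {1 - probability:.0%}")
```

In this made-up case the label is the single most likely answer and is still more likely to be wrong than right, which is why a label should be read as the front-runner rather than the verdict.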
AncestryDNA labels matches with a single relationship term (e.g., "4th cousin") that represents their best estimate. 23andMe gives a range ("3rd to 5th cousin") and sometimes shows confidence levels. MyHeritage uses relationship categories. FamilyTreeDNA shows the cM directly and recommends consulting the Shared cM Project for interpretation. Understanding which approach each company takes helps you interpret their outputs correctly.
When the Range Matters Most
For close relationships (parents, siblings, grandparents, first cousins) the ranges are narrow enough that the answer is confined to one small group of relationships. A match showing 3,400 cM is almost certainly a parent or child; full siblings typically share closer to 2,600 cM. A match showing 1,750 cM is a grandparent, half-sibling, or avuncular relationship. The ranges for these close relationships are tight enough that ambiguity is minimal.
The ambiguity grows with distance. At the second cousin / first cousin once removed level, the overlap is substantial. At the third cousin / second cousin once removed level, it becomes severe. Beyond fourth cousin, the ranges for consecutive relationship categories overlap so extensively that a cM value alone is nearly useless for distinguishing between them. This is where paper research and shared match analysis become essential.
The Probability Approach
The most rigorous way to handle a DNA match is to think in probabilities rather than single answers. For a match showing 350 cM, instead of asking "is this person my second cousin?" ask "what is the probability this person is my second cousin versus first cousin once removed versus half first cousin?" Using the Shared cM Project data, you can estimate what proportion of real people at that cM level fall into each relationship category.
Now bring in what you already know about the match. If the match appears to be roughly your age and lives in the same geographical area as your second-cousin family line, the second cousin hypothesis gains probability. If the match appears to be ten years older and from a completely different state, that hypothesis loses ground and the alternatives gain. The DNA range gives you the starting probabilities. Everything else you know about the match adjusts those probabilities.
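The adjustment described above can be sketched as a simple reweighting. This is an assumption-laden illustration, not a published method: the starting probabilities and the evidence weights below are invented, and adjust is a hypothetical helper. Each starting probability is multiplied by a weight expressing how well the other evidence (age, location, known family lines) fits that relationship, and the results are rescaled so they sum to 1.

```python
def adjust(dna_probabilities, evidence_weights):
    """Multiply each DNA-based probability by its evidence weight, then renormalize."""
    weighted = {
        relationship: probability * evidence_weights.get(relationship, 1.0)
        for relationship, probability in dna_probabilities.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("evidence rules out every candidate relationship")
    return {relationship: value / total for relationship, value in weighted.items()}


# Invented starting probabilities for a 350 cM match (illustrative only).
from_dna = {
    "second cousin": 0.40,
    "first cousin once removed": 0.35,
    "half first cousin": 0.25,
}

# Invented weights: the match is about your age and lives where your
# second-cousin line lived, so that hypothesis is favored (weight above 1).
# Relationships not listed keep a neutral weight of 1.
weights = {"second cousin": 2.0, "first cousin once removed": 0.5}

for relationship, probability in adjust(from_dna, weights).items():
    print(f"{relationship}: {probability:.0%}")
```

With these made-up inputs the second cousin hypothesis rises from 40% to about 65%. The exact weights are a judgment call; the point is that the DNA figure and the contextual evidence are combined rather than one replacing the other.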
The Case Against Over-Precision
One of the most common mistakes in DNA research is treating a relationship label too literally. A company says "4th cousin" and the researcher immediately starts looking for a shared great-great-great-grandparent. But the label is a probability estimate, and the correct relationship might be 3rd cousin once removed, 4th cousin once removed, or half 3rd cousin — all of which are within the plausible range for the same cM value.
Starting with the company label and refusing to consider alternatives is how researchers spin their wheels on a match for months, convinced the connection must be at a specific generational distance when it might be one generation closer or further back. Enter every search with the full range of possibilities open, and let the accumulated evidence — shared matches, paper records, tree overlaps — narrow the range rather than letting the algorithm's label do it for you.
Building a Working Hypothesis
The goal of working with probability ranges is not to achieve certainty before you start looking. It is to build a working hypothesis that is specific enough to guide research. "This match shares 350 cM with me. The most likely relationships are first cousin once removed and second cousin. I have a first cousin once removed whose family I have not fully researched. I will search for overlap between that line and this match's tree." That is a workable hypothesis. You are not claiming the relationship is confirmed — you are saying it is the best candidate given the evidence, and you are going to test it.
When the paper research finds a connection that produces a relationship consistent with the cM value and confirmed by shared matches, the hypothesis becomes a conclusion. That is the standard for DNA evidence in genealogy. Not certainty from one data point, but convergence from multiple independent lines of evidence.
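The progression from hypothesis to conclusion can be written down as a small checklist. The sketch below is only one possible way to track it: the Hypothesis class, its fields, and the status labels are hypothetical, and the cM range used is the approximate second-cousin range quoted earlier in this article.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    relationship: str
    cm_low: float
    cm_high: float
    paper_trail_found: bool = False     # records connect the two trees
    shared_matches_agree: bool = False  # shared matches point to the same line

    def status(self, shared_cm):
        """Classify the hypothesis given the shared cM and the evidence so far."""
        if not (self.cm_low <= shared_cm <= self.cm_high):
            return "inconsistent with DNA"
        if self.paper_trail_found and self.shared_matches_agree:
            return "conclusion"
        return "working hypothesis"


candidate = Hypothesis("second cousin", cm_low=45, cm_high=515)
print(candidate.status(350))  # DNA fits, nothing else confirmed yet

candidate.paper_trail_found = True
candidate.shared_matches_agree = True
print(candidate.status(350))  # all three lines of evidence now agree
```

The design choice worth noting is that the cM check can only reject or permit a hypothesis. It never promotes one to a conclusion on its own.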
