More carefully define two-locus haplotypes in docs #3447
apragsdale wants to merge 3 commits into tskit-dev:main
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | main | #3447 |
|---|---|---|
| Coverage | 91.66% | 91.66% |
| Files | 38 | 38 |
| Lines | 32184 | 32184 |
| Branches | 5150 | 5150 |
| Hits | 29503 | 29503 |
| Misses | 2348 | 2348 |
| Partials | 333 | 333 |
{math}`n(a_i,\sim b_j)` is the number that carry the allele {math}`a_i`
and do not carry the allele {math}`b_j`, and
{math}`n(\sim a_i, b_j)` is the number that carry the allele {math}`b_j`
and do not carry the allele {math}`a_i`. We refer to these haplotypes
Perhaps say something more concrete here? "We refer to these haplotypes" is not very precise (what haplotypes? "ai ~bj" is not a haplotype). You've set up this notation, now use it?
And a concrete example would help?
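One possible concrete example, as a minimal sketch rather than anything from the tskit code base: the helper `two_locus_counts` and the toy sample list below are hypothetical, and {math}`n(a_i, b_j)` is assumed to mean the number of samples carrying both alleles, by analogy with the two counts quoted above.

```python
def two_locus_counts(haplotypes, a_i, b_j):
    """Count samples by which focal alleles they carry.

    haplotypes is a list of (allele at locus A, allele at locus B) pairs,
    one per sample. Returns (n(a_i, b_j), n(a_i, ~b_j), n(~a_i, b_j)).
    Hypothetical helper for illustration only.
    """
    n_ab = n_a_not_b = n_not_a_b = 0
    for a, b in haplotypes:
        if a == a_i and b == b_j:
            n_ab += 1          # carries both focal alleles
        elif a == a_i:
            n_a_not_b += 1     # carries a_i but not b_j
        elif b == b_j:
            n_not_a_b += 1     # carries b_j but not a_i
    return n_ab, n_a_not_b, n_not_a_b


# Five samples, each a two-locus haplotype (toy data).
samples = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T"), ("A", "C")]
counts = two_locus_counts(samples, "A", "C")
print(counts)  # (2, 1, 1)
```

With focal alleles `A` and `C`, two samples carry both, one carries `A` without `C`, and one carries `C` without `A`. The remaining `("G", "T")` sample carries neither focal allele and does not appear in any of the three counts.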
derived alleles, as we do not compute the summary function for the ancestral
alleles. For unpolarised statistics, we compute the summary function over all
You don't mean "for the ancestral alleles" here, since nowhere have you defined what it means to compute it "for" an allele. Don't you mean something like "the ancestral allele is not used as the focal allele", where "focal allele" is something that probably should have been defined earlier?
Thanks for this. Good start, needs more preciseness.
Gotta run to some things today, but tried to add some preciseness here. Thanks Peter.
As pointed out in #3445, there is room for improvement in defining the haplotype counts that are passed to the summary functions in the two-locus statistics methods. This more carefully defines those, following advice from @petrelharp, and clarifies the number of times summary functions are called for a given number of alleles at the left and right loci.
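To illustrate the point about how often the summary function is called, here is a rough sketch under stated assumptions. It assumes that polarised statistics never use the ancestral allele as a focal allele at either locus, that unpolarised statistics use every pair of alleles, and that the ancestral allele is listed first at each locus. The function `focal_allele_pairs` is hypothetical and is not part of the tskit API.

```python
from itertools import product


def focal_allele_pairs(alleles_a, alleles_b, polarised):
    """List the (a_i, b_j) focal-allele pairs a summary function would see.

    Assumption: index 0 of each allele list is the ancestral allele, and
    polarised statistics skip it at both loci. Hypothetical helper.
    """
    start = 1 if polarised else 0
    return list(product(alleles_a[start:], alleles_b[start:]))


left = ["A", "G", "T"]   # three alleles at the left locus, "A" ancestral
right = ["C", "T"]       # two alleles at the right locus, "C" ancestral

print(len(focal_allele_pairs(left, right, polarised=True)))   # 2
print(len(focal_allele_pairs(left, right, polarised=False)))  # 6
```

Under these assumptions, loci with 3 and 2 alleles give (3 - 1) × (2 - 1) = 2 calls when polarised and 3 × 2 = 6 calls when unpolarised.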