How To Compare Two DNA Files Privately

Ways to compare two raw DNA files without uploading them to a server, with practical tips for local-only comparisons.

Comparing two DNA files can be useful for learning or exploring shared variation. You can do this without uploads by keeping everything on your device.

Approaches

  • Browser comparison that parses both files locally and highlights overlap
  • Command-line filters to intersect lists of rsids
  • Small scripts to count matches and differences
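
As a rough illustration of the last approach, here is a minimal sketch of a script that counts matching and differing genotype calls at rsids found in both files. It assumes tab-separated rows of rsid, chromosome, position, and genotype, with "--" marking a no-call; other vendors lay out their files differently, so adjust the column number as needed. The function name count_genotype_matches is hypothetical.

```shell
# Sketch only: count identical vs. differing genotype calls at rsids in both files.
# Assumption: tab-separated rows of rsid, chromosome, position, genotype.
count_genotype_matches() {
  awk -F'\t' '
    # Put a two-letter genotype in alphabetical order so AG and GA compare equal.
    function norm(g) {
      if (length(g) == 2 && substr(g, 1, 1) > substr(g, 2, 1)) {
        return substr(g, 2, 1) substr(g, 1, 1)
      }
      return g
    }
    /^#/ || $1 == "rsid" || NF < 4 { next }   # skip comments, header rows, short lines
    { g = $4; sub(/\r$/, "", g) }             # tolerate Windows line endings
    g == "--" { next }                        # skip no-calls (assumed marker)
    FNR == NR { a[$1] = norm(g); next }       # first file: remember genotype per rsid
    ($1 in a) { if (a[$1] == norm(g)) same++; else diff++ }
    END { printf "same=%d different=%d\n", same, diff }
  ' "$1" "$2"
}
```

Call it as count_genotype_matches person_a.txt person_b.txt. Everything runs in your local shell, and the output is a two-number summary that can be shared without exposing either raw file.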

Interpreting results

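Remember that shared SNPs do not directly indicate relatedness. Consumer raw files cover mostly common variation, and the overlap between two rsid lists largely reflects which positions each genotyping chip includes, so two unrelated people tested on the same chip will share nearly every rsid. Use the results as educational context rather than a definitive answer.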

To keep everything private, see the on-device DNA analysis overview and the browser-based DNA analysis guide. If you need to collaborate, share summaries instead of full raw files.

Step by step - local set comparison

Create an rsid list from each raw file and compare the two lists locally:

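# Keep data rows only: skip comment lines, blank lines, and any "rsid" header row
awk '!/^#/ && NF && $1 != "rsid" {print $1}' person_a.txt | sort -u > a_rsid.txt
awk '!/^#/ && NF && $1 != "rsid" {print $1}' person_b.txt | sort -u > b_rsid.txt
# comm -12 prints only the lines common to both sorted lists
comm -12 a_rsid.txt b_rsid.txt > overlap.txt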

Count sizes:

wc -l a_rsid.txt b_rsid.txt overlap.txt

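Compute a simple Jaccard index in a spreadsheet: the overlap count divided by the number of unique rsids across both files (the count in a_rsid.txt plus the count in b_rsid.txt minus the overlap). This is a coarse measure of shared positions in the files, not relatedness.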
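
If you would rather skip the spreadsheet, the same arithmetic can be sketched in the shell. This assumes both inputs are sorted, de-duplicated rsid lists like the ones created above; the function name jaccard_index is hypothetical.

```shell
# Sketch only: Jaccard index of two sorted, de-duplicated rsid lists.
# Jaccard = intersection / union, where union = |A| + |B| - intersection.
jaccard_index() {
  a=$(wc -l < "$1")                       # rsids in the first list
  b=$(wc -l < "$2")                       # rsids in the second list
  both=$(comm -12 "$1" "$2" | wc -l)      # rsids present in both lists
  awk -v a="$a" -v b="$b" -v both="$both" 'BEGIN {
    union = a + b - both
    if (union == 0) { print "undefined (both lists are empty)"; exit }
    printf "%.4f\n", both / union
  }'
}
```

Call it as jaccard_index a_rsid.txt b_rsid.txt. A value near 1 usually just means both files came from similar chip versions.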

Compare specific variants

If you have a small list of positions of interest, filter both files by those rsids and review the genotypes side by side in a spreadsheet:

# panel.tsv holds the rsids of interest in its first column
awk 'NR==FNR{keep[$1]; next} !/^#/ && ($1 in keep)' panel.tsv person_a.txt > a_subset.txt
awk 'NR==FNR{keep[$1]; next} !/^#/ && ($1 in keep)' panel.tsv person_b.txt > b_subset.txt

This keeps the workflow transparent and avoids sending either file to third-party servers. Educational use only.
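
To line the two subsets up before opening a spreadsheet, one option is the standard join utility. The sketch below assumes tab-separated rows with the rsid in column 1 and the genotype in column 4; the function name side_by_side is hypothetical, and files with separate allele columns would need a different -o list.

```shell
# Sketch only: print rsid, genotype from file 1, genotype from file 2, tab-separated.
# Assumption: tab-separated rows of rsid, chromosome, position, genotype.
side_by_side() {
  tab=$(printf '\t')
  a_sorted=$(mktemp "${TMPDIR:-/tmp}/a_sorted.XXXXXX")
  b_sorted=$(mktemp "${TMPDIR:-/tmp}/b_sorted.XXXXXX")
  # join needs both inputs sorted on the join field; LC_ALL=C keeps the
  # sort order and the join order consistent. tr strips Windows line endings.
  tr -d '\r' < "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$a_sorted"
  tr -d '\r' < "$2" | LC_ALL=C sort -t "$tab" -k1,1 > "$b_sorted"
  # Only rsids present in both files are printed.
  LC_ALL=C join -t "$tab" -o 1.1,1.4,2.4 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}
```

Call it as side_by_side a_subset.txt b_subset.txt > side_by_side.tsv and open the result locally. Rows where the last two columns differ are the positions worth a closer look.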

Further reading