Analyze DNA Locally With Open Source Tools

Lightweight, local-first options for exploring your raw DNA without uploading it. A practical starter kit of open source tools and workflows.

Curious about your genetics but unwilling to upload raw DNA files to servers you do not control? Below is a concrete, local-first workflow that runs entirely on your device using open source tools.

A practical toolkit

  • ripgrep or grep - fast text search for large files
  • awk or csvkit - quick column filtering and format conversion
  • sqlite-utils - turn a text file into a local database for easy queries
  • Python with pandas - simple scoring and small visualizations
  • Any notes app - track questions and findings over time

Pair this toolkit with a browser-based analysis that runs on your device. It reads the text file, normalizes SNPs, and presents results without the data leaving your machine. See on-device DNA analysis for the idea and browser-based DNA analysis for the flow.

Understand the raw file

Most consumer raw DNA exports are text files that look like this:

# rsid  chromosome  position  genotype
rs3094315	1	752566	AA
rs12562034	1	768448	GG

Helpful checks:

head -n 5 my_raw_dna.txt
wc -l my_raw_dna.txt
rg "^rs123" my_raw_dna.txt   # or: grep '^rs123' my_raw_dna.txt
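
The same sanity checks can be sketched in Python. This is a minimal illustration, assuming the four-column, whitespace-separated layout shown above. The function name summarize_raw is hypothetical, and treating "--" as the no-call marker is an assumption, since export formats differ between vendors.

```python
from collections import Counter


def summarize_raw(lines):
    """Count SNP rows per chromosome and tally no-calls.

    `lines` is any iterable of text lines, for example an open file handle.
    Assumes columns: rsid, chromosome, position, genotype.
    """
    per_chrom = Counter()
    no_calls = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed rows rather than guessing
        _rsid, chrom, _pos, genotype = fields[:4]
        total += 1
        per_chrom[chrom] += 1
        if genotype == "--":  # assumed no-call marker; varies by vendor
            no_calls += 1
    return {
        "total": total,
        "no_calls": no_calls,
        "per_chromosome": dict(per_chrom),
    }
```

To try it, open the export and pass the handle in: with open("my_raw_dna.txt") as f: print(summarize_raw(f)). An unexpectedly low total or a high no-call count is a hint that the file uses a different layout.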

Build a small local trait panel

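Create a tab-separated file panel.tsv that lists rsid, effect_allele, an optional weight, and a trait label for a few traits you care about. Keep it small and focused so it stays transparent. The first line is a plain header row with no leading "#", because score.py below reads the column names from it.

rsid	effect_allele	weight	trait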
rs9939609	A	1	appetite
rs1800497	A	1	dopamine
rs662799	G	1	lipids

Now filter your raw file to only the rsids in your panel:

awk 'NR==FNR{a[$1]=1; next} !/^#/ && a[$1]' panel.tsv my_raw_dna.txt > subset.txt

Turn the subset into a simple CSV with headers:

awk 'BEGIN{OFS=","; print "rsid,chromosome,position,genotype"} !/^#/{print $1,$2,$3,$4}' subset.txt > subset.csv
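
If awk is unfamiliar, the two filtering steps above can be approximated in Python. This is a sketch under the same assumptions about the file layouts. The helper names load_panel_rsids and write_subset are hypothetical, and the header check tolerates a panel header with or without a leading "#".

```python
import csv


def load_panel_rsids(panel_path):
    """Return the set of rsids found in the first column of the panel file."""
    rsids = set()
    with open(panel_path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row or row[0].startswith("#") or row[0] == "rsid":
                continue  # skip blank lines and the header row
            rsids.add(row[0].strip())
    return rsids


def write_subset(raw_path, panel_rsids, out_path):
    """Write raw rows whose rsid is in the panel to a CSV; return the row count."""
    kept = 0
    with open(raw_path) as raw, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["rsid", "chromosome", "position", "genotype"])
        for line in raw:
            if line.startswith("#"):
                continue  # skip comment lines in the raw export
            fields = line.split()
            if len(fields) >= 4 and fields[0] in panel_rsids:
                writer.writerow(fields[:4])
                kept += 1
    return kept
```

Calling write_subset("my_raw_dna.txt", load_panel_rsids("panel.tsv"), "subset.csv") produces the same kind of subset.csv as the awk commands. The returned count shows how many panel SNPs were actually present in your export.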

Score locally with Python

Create score.py to compute a very simple match count per trait. This is educational only - not a diagnosis.

import csv
from collections import defaultdict

# Load panel
panel = {}
weights = {}
traits = {}
with open('panel.tsv') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        rsid = row['rsid']
        panel[rsid] = row['effect_allele']
        weights[rsid] = float(row.get('weight') or 1)
        traits[rsid] = row.get('trait') or 'misc'

# Load user subset
genos = {}
with open('subset.csv') as f:
    r = csv.DictReader(f)
    for row in r:
        genos[row['rsid']] = row['genotype']

# Score matches where genotype contains the effect allele at least once
trait_scores = defaultdict(float)
for rsid, allele in panel.items():
    g = genos.get(rsid)
    if not g:
        continue
    if allele in g:
        trait_scores[traits[rsid]] += weights[rsid]

for trait, score in sorted(trait_scores.items(), key=lambda x: -x[1]):
    print(f"{trait},{score}")

Run it:

python3 score.py

The script prints one trait,score line per trait to stdout, highest score first. You can paste this into a spreadsheet or plot it in a quick notebook.
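
The script above counts a SNP once if the effect allele appears at all. One possible variation, sketched here as an assumption and not as part of the original script, counts the copies of the effect allele (0, 1, or 2) so that carrying two copies scores higher than carrying one. The name score_by_dosage is hypothetical, and like the script above it is educational only.

```python
from collections import defaultdict


def score_by_dosage(panel_rows, genotypes):
    """Sum weight * copies-of-effect-allele per trait.

    panel_rows: iterable of dicts with keys rsid, effect_allele, weight, trait
                (for example, rows from csv.DictReader over panel.tsv).
    genotypes:  dict mapping rsid -> genotype string such as "AG".
    """
    scores = defaultdict(float)
    for row in panel_rows:
        genotype = genotypes.get(row["rsid"])
        if not genotype:
            continue  # SNP not present in the raw file
        # Number of times the effect allele occurs in the genotype string
        copies = genotype.count(row["effect_allele"])
        weight = float(row.get("weight") or 1)
        scores[row.get("trait") or "misc"] += weight * copies
    return dict(scores)
```

With a weight of 1, a genotype of AA against effect allele A contributes 2, AG contributes 1, and GG contributes 0.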

Query locally with SQLite

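If you prefer SQL, load the entire raw file and your panel into a local database for fast queries:

awk 'BEGIN{OFS=","; print "rsid,chromosome,position,genotype"} !/^#/{print $1,$2,$3,$4}' my_raw_dna.txt > raw.csv
sqlite-utils insert raw_dna.db raw raw.csv --csv
# strip any leading "# " from the panel header so the column is named rsid
sed '1s/^# *//' panel.tsv | sqlite-utils insert raw_dna.db panel - --tsv
sqlite3 -header -tabs raw_dna.db "SELECT * FROM raw WHERE rsid IN (SELECT rsid FROM panel)" > subset_from_sqlite.tsv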

This approach scales to larger lists and lets you save reusable queries without uploading anything.
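
The same kind of query can also be run from Python with the standard library sqlite3 module, which avoids installing extra command line tools. The sketch below is illustrative: the table layout mirrors the columns used in this article, and build_db and panel_genotypes are hypothetical helper names.

```python
import sqlite3


def build_db(raw_rows, panel_rows, db_path=":memory:"):
    """Create raw and panel tables and load the given row tuples.

    raw_rows:   iterable of (rsid, chromosome, position, genotype)
    panel_rows: iterable of (rsid, effect_allele, weight, trait)
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw "
        "(rsid TEXT, chromosome TEXT, position INTEGER, genotype TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS panel "
        "(rsid TEXT, effect_allele TEXT, weight REAL, trait TEXT)"
    )
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?, ?)", raw_rows)
    conn.executemany("INSERT INTO panel VALUES (?, ?, ?, ?)", panel_rows)
    # An index on rsid keeps lookups fast once the whole raw file is loaded
    conn.execute("CREATE INDEX IF NOT EXISTS idx_raw_rsid ON raw (rsid)")
    conn.commit()
    return conn


def panel_genotypes(conn):
    """Join the panel against the raw table and return the matching rows."""
    query = """
        SELECT p.trait, r.rsid, r.genotype, p.effect_allele
        FROM panel AS p
        JOIN raw AS r ON r.rsid = p.rsid
        ORDER BY p.trait, r.rsid
    """
    return conn.execute(query).fetchall()
```

Passing a file path instead of ":memory:" keeps the database on disk between sessions. The join returns the trait and effect allele alongside each genotype, which the IN subquery above does not.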

What “local” really means

Local means the computation happens on hardware you control: no background uploads, no remote processing, and no long-term storage outside your device. That reduces the risk of exposure and keeps you in charge of the file.

Limitations to expect

  • Your device performance sets the speed limit
  • Research moves quickly and trait associations evolve
  • Results are educational rather than diagnostic

Local-first does not have to be complex. Start small, keep the file private, and use tools that explain what they do. For a guided option that runs on your device, see our on-device DNA analysis guide and the browser-based DNA analysis walkthrough.
