If you have ever run a hereditary-cancer panel, filtered a whole exome down to a candidate gene list, or pulled the top hits out of a differential-expression study, the next question is always the same: what does this list mean?
Usually the answer lives in five or six papers, two consensus-guideline documents, and an OMIM record. You can pull it together in twenty minutes if you know where to look. You can also quietly miss something if you do not, and a general chat model such as the one behind ChatGPT or Claude, answering from training memory, will sound right without showing you which paper its lifetime-risk numbers came from.
This post walks through one version of that workflow: a five-gene hereditary cancer panel, fed into SPARKIT, with the actual cited report it returned.
The list
MLH1
MSH2
MSH6
PMS2
EPCAM
If you have seen this list before, you already know where this is going. The point of this post is what the report looks like — what an analyst would actually paste into a tumor-board prep doc, or into a sequencing-pipeline output.
The prompt
I have the following gene list from a hereditary cancer panel sequencing run: MLH1, MSH2, MSH6, PMS2, EPCAM. What disease(s) are most strongly associated with germline variants in these genes? Identify the unifying syndrome, summarize the clinical phenotype (cancer types and lifetime risks), the inheritance pattern, the molecular mechanism, and the current consensus surveillance guidelines.
One paragraph, five named genes, six things to identify. Pasted straight into the /research playground.
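If you run this kind of query often, the prompt above is easy to template. A minimal sketch, assuming you just want clean text to paste: `normalize_genes`, `build_prompt`, and `PROMPT_TEMPLATE` are hypothetical names, the template wording is copied from the prompt above, and nothing here talks to SPARKIT or any other API.

```python
# Hypothetical helper (not part of SPARKIT): turn a pasted gene list into a
# prompt shaped like the one above. It only builds text; nothing is sent anywhere.
PROMPT_TEMPLATE = (
    "I have the following gene list from {source}: {genes}. "
    "What disease(s) are most strongly associated with germline variants in these genes? "
    "{question}"
)


def normalize_genes(raw: str) -> list[str]:
    """Split on newlines, commas, or spaces and drop duplicates, keeping first-seen order.

    Case is left alone on purpose: uppercasing would mangle symbols like C9orf72.
    """
    return list(dict.fromkeys(raw.replace(",", " ").split()))


def build_prompt(raw: str, source: str, question: str) -> str:
    genes = normalize_genes(raw)
    if not genes:
        raise ValueError("gene list is empty")
    return PROMPT_TEMPLATE.format(source=source, genes=", ".join(genes), question=question)


pasted = """MLH1
MSH2
MSH6
PMS2
EPCAM
MSH2"""
print(build_prompt(pasted, "a hereditary cancer panel sequencing run",
                   "Identify the unifying syndrome and the consensus surveillance guidelines."))
```

Splitting on commas and whitespace lets the same helper take one-symbol-per-line pastes or a comma-separated line; duplicate symbols (common when merging panel exports) are dropped in first-seen order.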
What came back
Wall clock: 122 seconds. Eight unique sources, cited 31 times across the report: GeneReviews / NCBI Bookshelf, the NCCN colorectal high-risk-assessment PDF, the PLSD analysis of mortality by age, gene and gender on ScienceDirect, the Modern Pathology paper (via Nature) on the EPCAM mechanism, two PMC studies of EPCAM-deletion frequency, a JNCCN summary, and a Facing Our Risk patient-facing risk synthesis.
Summary
The panel MLH1, MSH2, MSH6, PMS2, and EPCAM defines Lynch syndrome (historically hereditary non-polyposis colorectal cancer, HNPCC), the most common inherited cancer-predisposition syndrome. It is an autosomal dominant disorder caused by germline loss-of-function variants in DNA mismatch repair (MMR) genes (or by 3′-end EPCAM deletions that epigenetically silence the adjacent MSH2), producing tumors with microsatellite instability (MSI-H) and a markedly increased lifetime risk of colorectal, endometrial, ovarian, urothelial, gastric, small-bowel, hepatobiliary, pancreatic, brain, prostate, and sebaceous-skin cancers. (Inline citation in the report: GeneReviews / NCBI Bookshelf NBK1211.)
That paragraph is doing real work — it names the syndrome, the inheritance pattern, the molecular mechanism, the special case (EPCAM), the tumor biomarker (MSI-H), and a near-complete list of associated cancers, with the receipt attached.
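The summary names the inheritance pattern as autosomal dominant, and that is the whole arithmetic behind cascade testing: each child of a heterozygous carrier has a 1/2 chance of inheriting the variant. A small sketch with hypothetical function names, assuming independent 1/2 priors and ignoring de novo variants and mosaicism:

```python
from fractions import Fraction

# Simplifying assumption: each child of a heterozygous carrier independently
# has a 1/2 chance of inheriting the familial variant. Penetrance does not
# enter: this is about carrier status, not about who develops cancer.
CARRIER_PRIOR = Fraction(1, 2)


def expected_carriers(n_at_risk: int) -> Fraction:
    """Expected number of carriers among n at-risk relatives."""
    return n_at_risk * CARRIER_PRIOR


def p_at_least_one_carrier(n_at_risk: int) -> Fraction:
    """Probability that at least one of n at-risk relatives is a carrier."""
    return 1 - (1 - CARRIER_PRIOR) ** n_at_risk


# Example: a carrier with three children.
print(expected_carriers(3), p_at_least_one_carrier(3))  # 3/2 7/8
```

For three children, 1.5 carriers are expected on average, and the chance that testing finds at least one is 7/8.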
Lifetime cancer risks (excerpt)
The report breaks the risk profile out per gene: MLH1 and MSH2 highest, MSH6 intermediate and later-onset, PMS2 and EPCAM lower-penetrance. Cumulative risks to age 75 from the cited PLSD/GeneReviews summary:
| Cancer | MLH1 | MSH2 | MSH6 | PMS2 |
|---|---|---|---|---|
| Any Lynch cancer | 71–81% | 75–84% | 42–62% | ~34% |
| Colorectal | 44–53% | 42–46% | 12–20% | 3–13% |
| Endometrial | 35% | 41–46% | 41% | 12–13% |
| Ovarian | 11% | 17% | 11% | 3% |
| Urothelial | 3–7% | 9–16% | 1–6% | low |
This is the part a model answering from training memory cannot do credibly. The numbers vary by source and by cohort, and the Prospective Lynch Syndrome Database (PLSD) has actively updated them in the last few years. Getting them right means pulling the current GeneReviews record and the PLSD paper at query time, not relying on what the model saw during training.
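If you want the excerpted risk table in a pipeline output rather than a doc, parse each cell into a range and keep the source next to it, since these figures move between cohorts and versions. A minimal sketch: the `RiskEstimate` record, `parse_cell`, and the regular expression are hypothetical, and the source label is a placeholder string, not a citation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Accepts "44–53%", "41%", "~34%" (en dash or hyphen). Qualitative cells such as "low" do not match.
RISK_CELL = re.compile(r"^~?(\d+(?:\.\d+)?)(?:[–-](\d+(?:\.\d+)?))?%$")


@dataclass(frozen=True)
class RiskEstimate:
    gene: str
    cancer: str
    low: float   # percent
    high: float  # percent; equals low for point estimates
    source: str  # keep the citation with the number, since figures differ by cohort and version


def parse_cell(gene: str, cancer: str, cell: str, source: str) -> Optional[RiskEstimate]:
    m = RISK_CELL.match(cell.strip())
    if m is None:
        return None  # e.g. "low": nothing numeric to carry forward
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    return RiskEstimate(gene, cancer, low, high, source)


# "Any Lynch cancer" row from the excerpt; the source label is a placeholder string.
row = {"MLH1": "71–81%", "MSH2": "75–84%", "MSH6": "42–62%", "PMS2": "~34%"}
estimates = []
for gene, cell in row.items():
    est = parse_cell(gene, "Any Lynch cancer", cell, "PLSD/GeneReviews (cited in report)")
    if est is not None:
        estimates.append(est)

ranked = sorted(estimates, key=lambda e: (e.low + e.high) / 2, reverse=True)
print([e.gene for e in ranked])  # ['MSH2', 'MLH1', 'MSH6', 'PMS2']
```

Ranking by range midpoint is a crude choice: it puts MSH2 just above MLH1, but their ranges overlap, so the ordering says little beyond "MLH1 and MSH2 highest".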
Surveillance guideline (excerpt)
Colon. Colonoscopy with polypectomy (full colonoscopy, not sigmoidoscopy, because of right-sided predominance). Start at age 20–25 (or 2–5 yr before the earliest CRC in the family), repeat every 1–2 yr; some guidelines allow deferring the start to age 30–35 for MSH6/PMS2 carriers.
Endometrium. Patient education about abnormal bleeding; consider annual transvaginal ultrasound + endometrial biopsy; risk-reducing hysterectomy offered once childbearing is complete (typically by age 40–45). Surveillance from age 30–35; surgery is individualized, with the strongest indication for MLH1/MSH2/MSH6 carriers.
Chemoprevention. Aspirin 600 mg daily reduced CRC incidence by ~50% in the CAPP2 randomized trial among participants who took it for ≥2 yr; lower doses (75–300 mg) are commonly used. Recommended by NICE and the Manchester consensus; NCCN says "consider".
Cascade testing. Offer germline testing to all first-degree relatives once a familial variant is identified; predictive testing usually from age 18 (earlier if early-onset cancers in family).
Every item above came back with a citation attached (omitted here for readability; the actual report keeps them inline).
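The colonoscopy start-age rule in the Colon item is simple enough to write down, which is a useful check that you have read it the way you think. The sketch below is a toy reading of that one sentence, not NCCN's algorithm. The function name, grouping EPCAM with MLH1/MSH2, reading the "or" as "use the family-based window when it starts earlier", and making the MSH6/PMS2 deferral a switch are all assumptions.

```python
from typing import Optional, Tuple

# Toy encoding of the "Colon" item above; not an official NCCN algorithm.
# Assumption: EPCAM is grouped with MLH1/MSH2 (EPCAM deletions silence MSH2).
EARLY_START_GENES = {"MLH1", "MSH2", "EPCAM"}
DEFERRABLE_GENES = {"MSH6", "PMS2"}
REPEAT_INTERVAL_YEARS = (1, 2)  # colonoscopy every 1-2 years once started


def colonoscopy_start_window(gene: str,
                             earliest_family_crc_age: Optional[int] = None,
                             defer_msh6_pms2: bool = True) -> Tuple[int, int]:
    """Return a (youngest, oldest) start-age window in years."""
    gene = gene.upper()
    if gene not in EARLY_START_GENES | DEFERRABLE_GENES:
        raise ValueError(f"not one of the five panel genes: {gene}")

    # Default window: 20-25, or 30-35 where the MSH6/PMS2 deferral is applied.
    if gene in DEFERRABLE_GENES and defer_msh6_pms2:
        window = (30, 35)
    else:
        window = (20, 25)

    # Family-history window: 2-5 years before the earliest CRC diagnosis.
    # Assumption: it replaces the default only when it starts earlier.
    if earliest_family_crc_age is not None:
        family = (earliest_family_crc_age - 5, earliest_family_crc_age - 2)
        if family[0] < window[0]:
            window = family
    return window


print(colonoscopy_start_window("MLH1"))                              # (20, 25)
print(colonoscopy_start_window("MSH2", earliest_family_crc_age=24))  # (19, 22)
print(colonoscopy_start_window("PMS2"))                              # (30, 35)
```

Writing the rule this way makes the ambiguous parts visible: how "or" combines the two windows and whether the MSH6/PMS2 deferral applies are exactly the choices a real guideline lookup should settle.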
The full report (12,500 characters, 31 citations across the body) also covered:
- the EPCAM-deletion mechanism in detail: 3′ read-through into MSH2 with tissue-specific promoter hypermethylation, accounting for ~1–3% of Lynch cases;
- the Knudson two-hit tumorigenesis model;
- the distinction from Constitutional Mismatch Repair Deficiency (biallelic, recessive, pediatric);
- the diagnostic workflow: MMR-IHC + MSI testing, with reflex MLH1-methylation/BRAF V600E;
- the therapeutic implication: MMR-deficient tumors are checkpoint-inhibitor responsive (pembrolizumab tumor-agnostic, dostarlimab neoadjuvant in rectal cancer).
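The diagnostic workflow in that list (MMR-IHC plus MSI testing, with reflex MLH1-methylation/BRAF V600E) is a short decision procedure. A simplified sketch for a colorectal tumor, assuming the commonly described pattern: MLH1 loss goes to the reflex tests first because it is often sporadic promoter hypermethylation; MSH2, MSH6, or isolated PMS2 loss goes to germline testing; retained staining with MSI-H still gets a germline look. `triage` and `NextStep` are hypothetical names, and real protocols also weigh tumor type, age, and family history.

```python
from enum import Enum
from typing import Iterable, Optional


class NextStep(Enum):
    REFLEX_TESTING = "order MLH1 promoter methylation and/or BRAF V600E"
    LIKELY_SPORADIC = "likely sporadic MLH1 silencing"
    GERMLINE_TESTING = "refer for germline MMR/EPCAM testing"
    LYNCH_LESS_LIKELY = "MMR-proficient and MSS: Lynch less likely on tumor testing alone"


def triage(lost_on_ihc: Iterable[str], msi_high: bool,
           mlh1_methylated: Optional[bool] = None,
           braf_v600e: Optional[bool] = None) -> NextStep:
    """Simplified tumor-screening triage for a colorectal tumor.

    lost_on_ihc: MMR proteins with lost staining (MLH1, MSH2, MSH6, PMS2).
    None for a reflex test means it has not been run yet.
    """
    lost = {p.upper() for p in lost_on_ihc}

    if "MLH1" in lost:
        # MLH1 loss (usually with PMS2) is often sporadic promoter hypermethylation,
        # so the reflex tests come first.
        if mlh1_methylated is None and braf_v600e is None:
            return NextStep.REFLEX_TESTING
        if mlh1_methylated or braf_v600e:
            return NextStep.LIKELY_SPORADIC
        # Simplification: one negative reflex result is treated as enough.
        return NextStep.GERMLINE_TESTING

    if lost & {"MSH2", "MSH6", "PMS2"}:
        # MSH2/MSH6 loss (MSH2, MSH6, or EPCAM) or isolated PMS2 loss.
        return NextStep.GERMLINE_TESTING

    # All four proteins retained: MSI-H still warrants a germline look.
    return NextStep.GERMLINE_TESTING if msi_high else NextStep.LYNCH_LESS_LIKELY


print(triage({"MLH1", "PMS2"}, msi_high=True).name)                   # REFLEX_TESTING
print(triage({"MLH1", "PMS2"}, msi_high=True, braf_v600e=True).name)  # LIKELY_SPORADIC
print(triage({"MSH2", "MSH6"}, msi_high=True).name)                   # GERMLINE_TESTING
```

Treating None as "not yet run" keeps the function explicit about what is actually known at each step, rather than silently assuming a negative result.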
Why this is the workflow
The geneticist on your team can write this answer in an hour. The point is not that SPARKIT replaces the geneticist — it is that you do not always have a geneticist, and even when you do, you want the surveillance-interval numbers backed by the actual NCCN guideline rather than recalled from memory. SPARKIT gives you the synthesis and the receipts.
The same workflow scales:
- Top GWAS hits from a new locus discovery — what trait/pathway do these converge on?
- Cluster-marker genes from a scRNA-seq atlas — what cell type, what tissue, what known disease association?
- Differentially expressed genes from your latest bulk RNA-seq — what pathway is enriched, and what does the recent literature say about its role in your condition?
- Recurrently mutated genes from a COSMIC or cBioPortal pull — what cancer types, what co-mutation patterns, what druggable nodes?
Paste the list, get the syndrome / pathway / mechanism / actionable next step, with citations.
Run yours
Open the playground, paste your own gene list (one per line is fine; SPARKIT will infer the intent), and add one sentence about what you want to know. Two minutes later you have a cited Markdown report you can save into the protocol doc, the grant draft, or the tumor-board prep. No subscription needed to try — the Try-it bundle is $10 for 5 queries. If you have an idea for a workflow that should be a one-liner, tell us.