From CRISPR screen to triage: twenty candidate dependencies, one SPARKIT call

It's 4pm Wednesday. A computational biologist at a Series B biotech just got the read-out from a CRISPR screen: twenty candidate hits cleared the threshold and were validated against the wider DepMap dataset. By Friday morning's R&D meeting, leadership wants a triage. Which of these are worth the next twelve months of follow-up? Which map cleanly to a known mechanism? Which still represent an open commercial opportunity, and which are already in someone else's pipeline?

Doing this well means, per gene: pulling the primary literature on the dependency biology, scanning prior chemistry for named compounds, and checking which programs are in the clinic, which are preclinical, and which targets are still untouched. At normal human cadence, that's about 25 hours per gene. Twenty genes is 500 person-hours, roughly 12 weeks of one researcher's calendar, for the triage step alone, before any analysis of the dataset begins. The clock says forty-eight hours.

Below is what a comp-bio scientist's actual workflow looks like with one variable changed: instead of skimming abstracts and triaging by gut feel, they send all twenty genes to SPARKIT in a single query.

The twenty hits

The screen flagged twenty real, well-validated DepMap selective dependencies, spanning the maturity spectrum from approved drugs to fully preclinical:

| Gene | Selective dependency context |
|---|---|
| WRN | Microsatellite-instability-high cancers |
| MDM2 | TP53-wildtype, especially MDM2-amplified |
| EZH2 | SWI/SNF-mutant cancers, EZH2-mutant lymphomas |
| PRMT5 | MTAP-deleted cancers (~10–15% of solid tumors) |
| POLQ | Homologous-recombination-deficient cancers |
| USP1 | BRCA-deficient, PARPi-resistant |
| WEE1 | TP53-mutant, replication stress |
| PKMYT1 | CCNE1-amplified |
| ATR | DDR / replication stress |
| MAT2A | MTAP-deleted (partner of PRMT5) |
| BRD9 | SMARCB1-deficient synovial sarcoma |
| SMARCA2 | SMARCA4-mutant |
| KAT6A | ER-positive breast cancer |
| DNMT1 | AML and hematological cancers |
| CHK1 | DDR / replication stress |
| CDK7 | Transcription, super-enhancer addiction |
| CDK12 | DDR, mCRPC with biallelic CDK12 loss |
| PARG | DDR |
| WRNIP1 | DDR (emerging) |
| SLC7A11 | KEAP1-mutant, ferroptosis vulnerability |

The query

A single 1,200-character prompt asked for four things per gene (disease context, named candidate compounds, clinical-trial stage, and a druggability assessment), plus a closing comparison that groups the targets by maturity tier and separates open opportunities from crowded fields.
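The post doesn't reproduce the prompt itself, so here is only a rough sketch of how a query of that shape could be assembled from the gene list. The wording, the `build_triage_prompt` helper, and the `ASKS` tuple are all hypothetical; nothing here is SPARKIT's API, and the resulting string is not the original 1,200-character prompt.

```python
# Hypothetical sketch: assemble one triage prompt covering all twenty genes.
# The phrasing is illustrative only; the post does not show the real prompt.
GENES = [
    "WRN", "MDM2", "EZH2", "PRMT5", "POLQ", "USP1", "WEE1", "PKMYT1",
    "ATR", "MAT2A", "BRD9", "SMARCA2", "KAT6A", "DNMT1", "CHK1", "CDK7",
    "CDK12", "PARG", "WRNIP1", "SLC7A11",
]

# The four per-gene asks named in the post.
ASKS = (
    "disease context",
    "named candidate compounds",
    "clinical-trial stage",
    "druggability assessment",
)


def build_triage_prompt(genes, asks=ASKS):
    """Return a single prompt string asking the same questions of every gene."""
    gene_list = ", ".join(genes)
    per_gene = "; ".join(asks)
    return (
        f"Targets ({len(genes)} DepMap selective dependencies): {gene_list}. "
        f"For each gene, report: {per_gene}. "
        "Close with a comparison that groups the targets by maturity tier "
        "and separates open opportunities from crowded fields."
    )


prompt = build_triage_prompt(GENES)
print(f"{len(prompt)} characters")
```

Keeping the gene list and the asks as data rather than hand-typed prose makes it easy to rerun the same question against the next screen's hits.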

What came back

Wall clock: 1,782 seconds (29.7 minutes). Sources cited: 105. Below is a substantial excerpt; the full report runs to roughly 4,000 words and is the kind of thing you'd save into a research-ops doc and bring to the meeting.

Summary

The 20 targets span the maturity spectrum from FDA-approved (EZH2 with tazemetostat; DNMT1 indirectly via azacitidine/decitabine) down to fully preclinical (WRNIP1; SLC7A11/xCT). The most crowded, "late-and-late" fields are ATR, EZH2, MDM2, PRMT5 (MTA-cooperative) and WEE1, where multiple Phase 2/3 programs from large pharma are converging. The clearest commercial white space is around WRNIP1, SLC7A11, selective non-nucleoside DNMT1, BRD9 (post-failure reset), CDK12 selective drugs and PARG — each has either zero, one, or only failed clinical-stage assets, often in genetically defined niches with strong human-genetics validation.

Maturity-tier breakdown

| Tier | Targets | Notes |
|---|---|---|
| Approved | EZH2 (tazemetostat, valemetostat), DNMT1 (legacy aza/dec only) | Only EZH2 has selective approvals; DNMT1 lacks any selective non-nucleoside drug. |
| Phase 3 / pivotal | MDM2 (brigimadlin, navtemadlin), EZH2 (mevrometostat), KAT6A (PF-07248144), ATR (ceralasertib LATIFY), WEE1 (azenosertib) | Leaders 1–3 years from filing; new entrants need clear differentiation. |
| Late Ph1 / Ph2 | PRMT5 (MRTX1719, AMG193, TNG462, AZD3470), MAT2A (IDE397), POLQ (ART4215), PKMYT1 (lunresertib), USP1 (RO7623066, ISM3091), CHK1 (prexasertib/ACR-368), CDK7 (samuraciclib), PARG (IDE161) | Active proof-of-concept zone. |
| Early Ph1 | WRN (HRO761, RO7589831), SMARCA2 (PRT3789, FHD-909), CDK12/13 (CT7439), WEE1 (Debio 0123) | First wave still de-risking. |
| Preclinical / failed | BRD9 (CFT8634 disc.; FHD-609 paused), WRNIP1 (no asset), SLC7A11 (no clinical asset), selective DNMT1 (GSK programs paused) | True white space. |
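If you want the tier table in a form you can sort or join against screen data, a minimal sketch might look like the following. The target-to-tier assignments are copied from the table above; the `TIERS` structure and both helper functions are illustrative names, not part of the report. CDK12/13 is recorded as CDK12 to match the hit list.

```python
# Tiers ordered from most to least advanced, as in the report's table.
TIERS = [
    ("Approved", ["EZH2", "DNMT1"]),
    ("Phase 3 / pivotal", ["MDM2", "EZH2", "KAT6A", "ATR", "WEE1"]),
    ("Late Ph1 / Ph2", ["PRMT5", "MAT2A", "POLQ", "PKMYT1", "USP1",
                        "CHK1", "CDK7", "PARG"]),
    ("Early Ph1", ["WRN", "SMARCA2", "CDK12", "WEE1"]),
    ("Preclinical / failed", ["BRD9", "WRNIP1", "SLC7A11", "DNMT1"]),
]


def most_advanced_tier(tiers):
    """Map each target to the first (most advanced) tier it appears in."""
    best = {}
    for tier, targets in tiers:
        for target in targets:
            best.setdefault(target, tier)
    return best


def multi_tier_targets(tiers):
    """Return targets listed in more than one tier, with all their tiers."""
    seen = {}
    for tier, targets in tiers:
        for target in targets:
            seen.setdefault(target, []).append(tier)
    return {t: ts for t, ts in seen.items() if len(ts) > 1}


best = most_advanced_tier(TIERS)
overlaps = multi_tier_targets(TIERS)
for target, tiers_for_target in overlaps.items():
    print(target, "->", ", ".join(tiers_for_target))
```

Collapsing to the most advanced tier is lossy. DNMT1, for example, lands in "Approved" only through legacy nucleoside drugs, while the selective non-nucleoside programs sit in "Preclinical / failed", so the overlap list is worth reading before trusting the single-tier view.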

One gene, in depth — WRN

Disease context. Synthetic-lethal dependency driven by TA-dinucleotide repeat expansions that form non-B DNA in MSI-H tumors, requiring WRN's helicase/ATPase activity for resolution.

Clinical pipeline.

  • Novartis HRO761 — allosteric inhibitor locking D1/D2 helicase domains, Phase 1/1b in MSI-H solid tumors.
  • Vividion/Roche RO7589831 (VVD-133214) — covalent allosteric inhibitor, also in Phase 1, both having shown early clinical activity.
  • Multiple fast-followers (e.g. ZMS-4084) entering at the bioisostere/IND stage.

Druggability: now validated — chameleonic structure-based drug design overcame the historic "undruggable helicase" label.

The full report covers MDM2, EZH2, PRMT5, POLQ, USP1, WEE1, PKMYT1, ATR, MAT2A, BRD9, SMARCA2, KAT6A, DNMT1, CHK1, CDK7, CDK12, PARG, WRNIP1, and SLC7A11 in the same shape — named compounds, clinical-stage programs, druggability assessments, with every claim cited.

The closing that drives the triage

Most crowded competitive fields: ATR (5 clinical assets, one in Ph3), EZH2 (≥5 clinical, two approved, one Ph3), MDM2 (≥4 mid-to-late), MTA-cooperative PRMT5 (≥4 with rapidly converging Ph2 reads), and WEE1 (≥4 with one in pivotal pursuit). New programs in these targets need a sharp differentiator — paralog/isoform selectivity, brain penetration (e.g., TNG908, Debio 0123), cleaner hematology safety, oral/QD dosing, or biomarker-defined niches not yet claimed.

Most open commercial opportunities for new programs:

  1. WRNIP1 — strong preclinical synthetic-lethality rationale and zero competition; risk is unproven druggability.
  2. SLC7A11/xCT — KEAP1-mutant NSCLC is a large genetically defined population with no targeted therapy.
  3. Selective non-nucleoside DNMT1 — large hematology market, validated biology, GSK's tools have stalled.
  4. BRD9 — after CFT8634/FHD-609 setbacks, a cleaner degrader with no QTc liability is contestable.
  5. PARG — only IDE161 is credible; second entrants with brain penetrance or distinct combos have a clear lane.
  6. PKMYT1 — Repare/Debiopharm is unchallenged in CCNE1-amp; a second-generation differentiated chemotype is a single-asset opportunity.
  7. CDK12 (selective) — high technical risk but uncontested clinically; success addresses an mCRPC/ovarian biomarker population with no targeted option.

That's the answer to the triage question, before the meeting starts.

What this would have cost by hand

Twenty genes × 25 hours per gene = 500 person-hours — three months of one comp-bio scientist's calendar, just to triage. By Friday's meeting, structurally impossible. Any honest manual workflow at this scale truncates to "skim five genes I already know, list fifteen as 'needs follow-up later,' guess." The cost of that compression is what shows up three months later as "we should have caught that earlier."

With SPARKIT: a single 1,200-character prompt, thirty minutes of wall clock, one query against the API. The triage that would have taken three months lands in time for the meeting, with citations intact so leadership can audit any part of the synthesis they want to push back on.
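The arithmetic behind that comparison is simple enough to check directly. The sketch below uses the figures quoted in this post; the 40-hour work week is an assumption the post does not state, and the ratio compares person-hours of labor against wall-clock time, so treat it as an order-of-magnitude figure that excludes the time a human spends reviewing the report.

```python
HOURS_PER_GENE = 25        # manual estimate quoted in the post
N_GENES = 20               # hits from the screen
SPARKIT_SECONDS = 1782     # reported wall clock for the single query
HOURS_PER_WORK_WEEK = 40   # assumption: not stated in the post

manual_hours = HOURS_PER_GENE * N_GENES
manual_weeks = manual_hours / HOURS_PER_WORK_WEEK
sparkit_minutes = SPARKIT_SECONDS / 60

# Person-hours converted to seconds, divided by wall-clock seconds.
ratio = manual_hours * 3600 / SPARKIT_SECONDS

print(f"manual: {manual_hours} person-hours, about {manual_weeks:.1f} work weeks")
print(f"SPARKIT: {sparkit_minutes:.1f} minutes wall clock")
print(f"ratio: roughly {ratio:.0f}x")
```

Under these assumptions the manual path comes to 12.5 work weeks, which is where the "three months" figure comes from, against just under thirty minutes for the query.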

What SPARKIT didn't do

Honesty about a real workflow:

  1. The report does not include patent-landscape synthesis — only published literature and press releases. For competitive IP review you still need Espacenet or a paid patent search tool.
  2. Strategic recommendations are the model's reading of the literature, not market analysis. The "WRNIP1 is the most open commercial opportunity" framing is a credible reading of what's published, but a comp-bio scientist's gut feel about regulatory/payor risk, target-class precedent, or specific company posture is not in those sources. The Markdown is the substrate; the human still does the strategy.
  3. A few targets came back thinner than others. WRNIP1 has almost no chemistry literature — SPARKIT correctly flagged that, but the report on that gene is shorter and less actionable as a result. The agent reflects the field's coverage; it does not invent literature that isn't there.

The point isn't to take the human out of the triage loop. It's to move them from skimming abstracts to evaluating sourced summaries, and to deliver in minutes what used to take months.

The pattern

Triage isn't research. It's the gate before research — the question of which of fifty hits is worth anyone's time. SPARKIT compresses the gate from months to minutes, with the citations intact so the next person can audit the call.

The science is hard. Reading the science shouldn't be the bottleneck.