follow-up · 12 May 2026

Same ternary recipe, replicated on PrimeKG (2023). Quantization beats the float teacher on a graph seven years younger than OGBL-biokg.

A reviewer noted that OGBL-biokg is built from 2016 sources. We ran the identical pipeline on PrimeKG (Chandak, Huang & Zitnik, Nature Sci Data 2023): 4.83 million edges, 30 relation types, 129,375 entities, built for precision-medicine drug discovery. The ternary student again outperforms its own float teacher.

Ternary B=128 test MRR: 0.2972 (+0.0069 above the float teacher)
Compression of entity tables: 5.3× (506 MB → 95 MB)
Indication MRR on held-out test: 0.802 (gold drug typically ranks 1–2 of 7,957)
Total external compute: $5.20 (RunPod A40 + Vast 4090)

PrimeKG 2023 · ComplEx-N3 · Ternary {−1, 0, +1} · SMILES warm-start · Not medical advice

Why we ran this

The 10 May write-up reported a strong ternary result on OGBL-biokg, the biomedical link-prediction dataset in the Stanford Open Graph Benchmark. A reviewer pointed out that OGBL-biokg is built from 2016 sources and asked whether the result holds on a newer graph.

PrimeKG is the de facto modern replacement. It was assembled by Marinka Zitnik's group at Harvard in 2022–2023, with relations explicitly designed for precision-medicine work: indication, contraindication, off-label use, drug_drug (synergistic interactions), drug_protein, disease_phenotype_positive/negative, pathway_protein, anatomy_protein_present, exposure_disease, and 21 others.

The headline table

Filtered MRR on the canonical 10,000-triple held-out test split, with 500 type-constrained negatives per query (same protocol as OGBL).
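As a rough illustration of this protocol, the sketch below ranks one gold triple against its sampled negatives and aggregates the ranks into MRR and Hits@k. The function names, the pessimistic tie rule, and the toy scores are assumptions for illustration, not the evaluation code behind the numbers reported here.

```python
from typing import Sequence, Tuple


def rank_of_gold(gold_score: float, negative_scores: Sequence[float]) -> int:
    """1-based rank of the gold triple among its negatives.

    Ties are broken pessimistically (a tied negative ranks ahead of the gold);
    this tie rule is an assumption of the sketch.
    """
    return 1 + sum(1 for s in negative_scores if s >= gold_score)


def mrr_and_hits(ranks: Sequence[int], k: int = 10) -> Tuple[float, float]:
    """Mean reciprocal rank and Hits@k over a list of 1-based ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = sum(1 for r in ranks if r <= k) / n
    return mrr, hits


# Toy data: 3 queries with 4 negatives each. The protocol described above
# uses 500 type-constrained negatives per query, with known true triples
# already filtered out of the negative set.
queries = [
    (0.9, [0.1, 0.2, 0.3, 0.4]),  # gold ranks 1st
    (0.5, [0.6, 0.2, 0.3, 0.4]),  # gold ranks 2nd
    (0.1, [0.6, 0.2, 0.3, 0.4]),  # gold ranks 5th
]
ranks = [rank_of_gold(gold, negs) for gold, negs in queries]
mrr, hits_at_2 = mrr_and_hits(ranks, k=2)
```

"Filtered" here means that any negative which is itself a known true triple is removed before ranking, so the gold triple is not penalised for scoring below another correct answer.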

| Model | Test MRR | H@10 | Storage | Compression |
|---|---|---|---|---|
| Random init, float teacher | 0.2884 | 0.502 | 506 MB | |
| Random init, ternary B=256 | 0.2939 | 0.506 | 158 MB | 3.2× |
| Random init, ternary B=128 | 0.2933 | 0.502 | 95 MB | 5.3× |
| Random init, ternary B=1 | 0.2935 | 0.500 | 32 MB | 15.8× |
| SMILES warm, float teacher | 0.2903 | 0.503 | 506 MB | |
| SMILES warm, ternary B=128 | 0.2972 | 0.507 | 95 MB | 5.3× |
| SMILES warm, ternary B=1 | 0.2946 | 0.501 | 32 MB | 15.8× |
| SMILES warm (seed 43), float | 0.2911 | 0.507 | 506 MB | |
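The compression column can be reproduced with a simple per-row byte count. The accounting below is an assumption that happens to match the reported ratios, not a description of the actual storage format: 1,024 float32 values per entity row (for example a rank-512 complex embedding), 2 bits per ternary weight, and B float32 scales per row.

```python
FLOATS_PER_ROW = 1024    # assumed: e.g. rank-512 complex embedding (real + imaginary)
BYTES_PER_FLOAT = 4      # float32
BITS_PER_TERNARY = 2     # assumed packing: 2 bits per {-1, 0, +1} weight


def float_row_bytes() -> int:
    """Bytes needed to store one entity row in float32."""
    return FLOATS_PER_ROW * BYTES_PER_FLOAT


def ternary_row_bytes(num_scales: int) -> int:
    """Bytes for one ternary row: packed codes plus num_scales float32 scales."""
    packed_codes = FLOATS_PER_ROW * BITS_PER_TERNARY // 8
    scales = num_scales * BYTES_PER_FLOAT
    return packed_codes + scales


def compression(num_scales: int) -> float:
    """Float-to-ternary size ratio for a given number of scales per row (B)."""
    return float_row_bytes() / ternary_row_bytes(num_scales)


ratios = {b: round(compression(b), 1) for b in (256, 128, 1)}
```

Under these assumptions the ratios come out to 3.2×, 5.3× and 15.8× for B=256, 128 and 1. Multiplying the per-row sizes by 129,375 entities gives totals within about 1 MB of the reported storage figures when MB is read as 2^20 bytes.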

The two real findings

★ Finding 1 — replicated

Ternary quantization improves the teacher.

At every compression level (3.2× through 15.8×), the ternary student outperforms its float teacher on test MRR. The best result is the SMILES-warm B=128 student (5.3× compression) at 0.2972 MRR, against 0.2903 for its float teacher.

This is the same effect we observed on OGBL-biokg (10 May). The mechanism is the same: the float teacher mildly overfits (best validation MRR is at epoch 5; epoch 25 is lower), and the ternary quantization step acts as a regularizer that recovers generalization. The result now holds across two benchmarks separated by seven years of data construction.
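For readers unfamiliar with the quantization step, here is a minimal sketch of ternarizing one embedding row with per-group scales. The thresholding rule (0.7 × mean absolute value, borrowed from Ternary Weight Networks) and the reading of B as the number of scale groups per row are both assumptions; neither is specified above.

```python
import numpy as np


def ternarize_row(row: np.ndarray, num_groups: int, delta_factor: float = 0.7):
    """Quantize a 1-D float row to codes in {-1, 0, +1} plus one scale per group.

    Assumed scheme (Ternary Weight Networks style), not necessarily the one
    used for the reported results: within each group, weights with
    |w| <= delta are zeroed, the rest keep their sign, and the scale is the
    mean |w| of the surviving weights.
    """
    groups = row.reshape(num_groups, -1)
    delta = delta_factor * np.abs(groups).mean(axis=1, keepdims=True)
    mask = np.abs(groups) > delta
    codes = (np.sign(groups) * mask).astype(np.int8)
    kept = mask.sum(axis=1, keepdims=True)
    scales = (np.abs(groups) * mask).sum(axis=1, keepdims=True) / np.maximum(kept, 1)
    return codes, scales.astype(np.float32)


def dequantize(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float row from codes and per-group scales."""
    return (codes * scales).reshape(-1)


rng = np.random.default_rng(0)
row = rng.normal(size=1024).astype(np.float32)
codes, scales = ternarize_row(row, num_groups=128)
approx = dequantize(codes, scales)
```

The zeroing of small-magnitude weights is what makes the regularization reading plausible: low-amplitude components that the float teacher may have fitted to noise are discarded, while sign and coarse magnitude are retained.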

○ Finding 2 — modest, replicated

The chemistry warm-start gives a small lift.

Initialising the drug rows of the entity table from Morgan fingerprints (radius 2, 2048 bits → PCA-512) improves aggregate MRR by +0.0023 (mean of two seeds: 0.2884 random → 0.2907 warm). 60.4% of PrimeKG drugs were warm-started; the rest stayed random-init.
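The warm-start can be sketched as follows. Fingerprint generation is left out (it would typically come from a cheminformatics toolkit such as RDKit); the sketch starts from a binary fingerprint matrix, reduces it with PCA via SVD, and overwrites only the rows of drugs that have a fingerprint. All names, the toy dimensions, the choice to fill the leading columns of each row, and the rescaling to the random-init spread are assumptions for illustration.

```python
import numpy as np


def pca_project(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows of x onto their top principal components (PCA via SVD)."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def warm_start(entity_table: np.ndarray, drug_rows: np.ndarray,
               fingerprints: np.ndarray, n_components: int) -> np.ndarray:
    """Return a copy of entity_table with the given drug rows re-initialised.

    drug_rows[i] is the entity index of the drug whose fingerprint is
    fingerprints[i]. Only the first n_components columns of those rows are
    overwritten; every other row and column keeps its random init.
    """
    table = entity_table.copy()
    reduced = pca_project(fingerprints.astype(np.float64), n_components)
    # Assumed: rescale so warm rows have the same spread as the random init.
    reduced *= entity_table.std() / (reduced.std() + 1e-12)
    table[drug_rows, :n_components] = reduced
    return table


# Toy sizes; the setup described above uses 2048-bit fingerprints reduced to 512 dims.
rng = np.random.default_rng(0)
table = rng.normal(scale=0.1, size=(50, 16))
fps = rng.integers(0, 2, size=(12, 64))   # stand-in for Morgan fingerprint bits
rows = np.arange(12)                      # hypothetical: entities 0..11 are drugs with SMILES
warmed = warm_start(table, rows, fps, n_components=8)
```

Drugs without a usable fingerprint simply never appear in `drug_rows`, which mirrors the 60.4% coverage noted above: the remaining rows stay at their random initialisation.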

The lift is concentrated in contraindication (H@10 +3pp) and is neutral on indication (already near-ceiling at MRR=0.80). Stated honestly: the chemistry prior helps the model say "this drug should NOT be used for this disease" slightly more than "this drug should treat this disease."

Drug-repurposing eval (the actual usefulness signal)

For each held-out drug-disease edge in PrimeKG's test split, we rank every drug as a candidate completion. Then we report the rank of the gold drug. With 7,957 drugs in the candidate pool:

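| Relation | n test | Random / Float MRR | Random / Float H@10 | Warm / Float MRR | Warm / Float H@10 |
|---|---|---|---|---|---|
| indication | 26 | 0.802 | 0.93 | 0.796 | 0.93 |
| off-label use | 6 | 1.000 | 1.00 | 0.667 | 1.00 |
| contraindication | 72 | 0.337 | 0.74 | 0.338 | 0.77 |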

Read: for the 26 held-out indications, the gold drug typically lands at rank 1 or 2 out of 7,957 candidates (MRR=0.80, H@10=0.93). The model is finding the right drug pattern even though the (drug, indication, disease) edge was filtered out of training.

Caveat: test sample sizes for the rare relations (indication n=26, off-label n=6) are small because PrimeKG itself has few of these edges. We report the corresponding numbers for completeness and draw no strong conclusions from them. For off-label use especially, a difference between conditions measured on 6 samples is not statistically meaningful.
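To make the caveat concrete: with n test queries, a single gold drug slipping from rank 1 to rank 2 shifts MRR by 0.5/n. The helper below is a hypothetical illustration that uses only the sample sizes from the table.

```python
def mrr_shift_one_query(n_queries: int, rank_before: int = 1, rank_after: int = 2) -> float:
    """Change in MRR when one query's gold rank moves and all others stay fixed."""
    return (1.0 / rank_after - 1.0 / rank_before) / n_queries


for relation, n in [("indication", 26), ("off-label use", 6), ("contraindication", 72)]:
    shift = mrr_shift_one_query(n)
    print(f"{relation:17s} n={n:2d}  one rank-1 -> rank-2 slip changes MRR by {shift:+.3f}")
```

One such slip moves MRR by about 0.083 for off-label use (n=6), compared with about 0.007 for contraindication (n=72), so differences of that size on the rare relations can come from a single query.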

Reproducibility

Research preview. Not medical advice. Predictions are intended for hypothesis generation by qualified researchers, not for direct clinical use.