
technical methods

How the model is built.

A short, honest technical description. What we did, how we evaluated it, and what we didn't do.

Base model

We use ComplEx (Trouillon et al., 2016): a complex-valued tensor factorization scoring function over (head, relation, tail) triples. For a triple (h, r, t), the score is the real part of <h, r, conj(t)> over complex embeddings.
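The scoring function can be sketched in a few lines. A minimal NumPy version, assuming complex-valued embedding vectors (names and dimensions are illustrative):

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx score for a triple: Re(<h, r, conj(t)>),
    the real part of the trilinear product of complex embeddings."""
    return float(np.real(np.sum(h * r * np.conj(t))))

# Toy example with random complex embeddings of dimension 4.
rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d) + 1j * rng.normal(size=d)
r = rng.normal(size=d) + 1j * rng.normal(size=d)
t = rng.normal(size=d) + 1j * rng.normal(size=d)
print(complex_score(h, r, t))
```

Keeping the imaginary part of r lets the model score asymmetric relations, which a purely real dot product cannot.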

We add the N3 regularizer (Lacroix et al., 2018), a tensor-power L3 regularization that consistently dominates L2/Frobenius for ComplEx-family models, and reciprocal relations: each training triple (h, r, t) is augmented with (t, r⁻¹, h). The combined recipe ("ComplEx-N3") is the published high-water mark on standard knowledge-graph benchmarks.
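Both ingredients are small. A sketch of the batch-level N3 penalty and the reciprocal-relation augmentation, assuming triples are stored as integer ID arrays and inverse relations get IDs offset by the relation count (the weight `lam` is illustrative):

```python
import numpy as np

def n3_penalty(h, r, t, lam=1e-2):
    """N3 regularizer (Lacroix et al., 2018): sum of cubed complex
    moduli of the factors appearing in the batch, scaled by lam."""
    return lam * float((np.abs(h) ** 3 + np.abs(r) ** 3 + np.abs(t) ** 3).sum())

def add_reciprocals(triples, num_relations):
    """Augment each (h, r, t) with (t, r^-1, h), where r^-1 is
    encoded as relation ID r + num_relations."""
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    recip = np.stack([t, r + num_relations, h], axis=1)
    return np.concatenate([triples, recip], axis=0)
```

The reciprocal trick doubles the relation vocabulary but lets both head and tail prediction be trained as tail prediction.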

Training

Compression to ternary

Each entity's real and imaginary embedding tables are quantized independently to {−1, 0, +1} with a small float scale per block. The number of blocks per row, B, is the knob: B=1 means a single scale per row (highest compression); B=512 means a scale per dimension (no compression beyond the ternary codes themselves). At B=128 we get 5.3× compression of the entity tables and the model still beats the TransE baseline on the OGBL-biokg leaderboard.
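A minimal sketch of the quantizer for one embedding row. The scale rule (mean absolute value per block) and the zero threshold (half the scale) are assumptions for illustration, not necessarily the exact recipe used:

```python
import numpy as np

def quantize_ternary(row, num_blocks):
    """Quantize one row to {-1, 0, +1} codes with one float scale per
    block.  num_blocks=1 -> a single scale for the whole row."""
    blocks = row.reshape(num_blocks, -1)
    # One scale per block: mean absolute value (a simple, common choice).
    scales = np.abs(blocks).mean(axis=1, keepdims=True)
    # Threshold at half the scale: small values snap to 0, the rest to +/-1.
    codes = np.where(np.abs(blocks) > 0.5 * scales, np.sign(blocks), 0.0)
    return codes.astype(np.int8), scales.squeeze(1)

def dequantize(codes, scales):
    """Reconstruct an approximate float row from codes and block scales."""
    return (codes * scales[:, None]).reshape(-1)
```

Fewer blocks means fewer stored scales, hence more compression but a coarser approximation of the row.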

Evaluation

We use the official OGB filtered protocol: each test triple ships with 500 type-constrained negatives, already filtered against the train + valid + test splits. We rank the gold against (gold + 500 negatives) on both head and tail sides, and report mean reciprocal rank.
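Given per-query scores for the gold triple and its supplied negatives, the metric reduces to a rank computation. A sketch (ties are handled pessimistically here; OGB's Evaluator averages optimistic and pessimistic ranks):

```python
import numpy as np

def mrr_against_negatives(pos_scores, neg_scores):
    """Mean reciprocal rank of each gold triple against its own
    pre-filtered negatives.

    pos_scores: (n,) score of the gold triple per query.
    neg_scores: (n, k) scores of the k supplied negatives (k=500 in OGB).
    """
    # Rank = 1 + number of negatives scoring at least as high as the gold.
    ranks = 1 + (neg_scores >= pos_scores[:, None]).sum(axis=1)
    return float((1.0 / ranks).mean())
```

Head-side and tail-side queries are scored separately and the two MRRs averaged.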

Multi-seed reproducibility

We trained 3 independent ComplEx-N3 models with random seeds 1, 2, 3:

Seed        Validation MRR    Hits@1          Hits@10
1           0.8378            0.774           0.946
2           0.8427            0.786           0.945
3           0.8436            0.785           0.949
Mean ± SD   0.8414 ± 0.003    0.782 ± 0.005   0.947 ± 0.002

An MRR standard deviation of 0.003 across seeds indicates the result is stable with respect to initialization.

Compression sweep (test set)

B (blocks per row)   MRR     Hits@1   Hits@10   Compression
Float teacher        0.847   0.790    0.949
B=256                0.794   0.717    0.939     3.2×
B=128                0.752   0.667    0.923     5.3×
B=64                 0.730   0.637    0.914     8.0×
B=1 (per-row)        0.696   0.592    0.901     15.8×
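The compression column follows from simple bit accounting. The parameters below (512-dim rows, 2-bit ternary codes, fp16 scales, fp32 baseline) are an assumption, chosen because they reproduce the table exactly:

```python
def compression_ratio(dim=512, num_blocks=128,
                      code_bits=2, scale_bits=16, float_bits=32):
    """Bits per embedding row, ternary vs float32 baseline."""
    ternary_bits = dim * code_bits + num_blocks * scale_bits
    return dim * float_bits / ternary_bits

for b in (256, 128, 64, 1):
    print(b, round(compression_ratio(num_blocks=b), 1))
```

Under these assumptions the scale overhead, not the codes, dominates at large B, which is why halving the block count roughly doubles compression until the 2-bit floor takes over.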

Reference OGBL-biokg leaderboard entries (test MRR): TransE 0.745, RotatE 0.799, ComplEx 0.810. Our B=128 ternary model edges out TransE; B=256 sits between TransE and RotatE.

Agreement head (per-query confidence)

We trained a small MLP that takes the float (h, r) embeddings and predicts whether the ternary student will agree with the float teacher on the top-1 tail for that query. This gives a per-query confidence signal — useful clinically, where "is the cheap model reliable for this case?" is the question that matters.
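The head itself is tiny. A forward-pass sketch, assuming a single ReLU hidden layer on the concatenated (h, r) embeddings with a sigmoid output (the architecture details are illustrative):

```python
import numpy as np

def agreement_head(h, r, W1, b1, W2, b2):
    """Tiny MLP on the concatenated float (h, r) query embeddings,
    predicting P(ternary top-1 tail == float top-1 tail)."""
    x = np.concatenate([h, r])
    z = np.maximum(W1 @ x + b1, 0.0)      # ReLU hidden layer
    logit = float(W2 @ z + b2)
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid -> confidence in [0, 1]
```

Because it sees only the query, not the answer, the confidence is available before any ternary inference runs.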

This is the medical analogue of the NEO-style metacognition we measure across frontier chat models: exposed not as a black-box signal but as a small, auditable predictor whose AUC and Brier score are reported with every release.

Candidate prediction pipeline

For a target disease D (UMLS CUI):

  1. Locate D's entity ID in the OGBL-biokg disease space.
  2. For every drug entity, compute the score for the triple (drug, drug-disease, D) using each seed's teacher; average across seeds.
  3. Filter out drugs that already appear in any (drug, drug-disease, D) triple in the OGBL-biokg train, valid, or test splits.
  4. Return the top 20 by mean score, with seed standard deviation as a stability column.
  5. For each top candidate, look up indications via the ChEMBL drug-indication API (using UniChem to map PubChem CID → ChEMBL ID) and the Open Targets GraphQL API. Flag candidates with at least one indication for D.
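Steps 2–4 above can be sketched as a single vectorized pass, assuming per-seed complex embeddings are at hand (function and parameter names are illustrative; step 5's API lookups are omitted):

```python
import numpy as np

def rank_candidates(seed_params, known_drug_ids, top_k=20):
    """Score every drug against disease D with each seed's teacher,
    average across seeds, filter known pairs, return the top-k.

    seed_params: list of (drug_embs, rel_emb, disease_emb) tuples,
                 one per seed; all arrays are complex-valued.
    known_drug_ids: drugs already linked to D in any split (step 3)."""
    per_seed = np.stack([
        np.real((E * r * np.conj(d)).sum(axis=1))   # ComplEx score per drug
        for E, r, d in seed_params
    ])                                              # shape (n_seeds, n_drugs)
    mean, sd = per_seed.mean(axis=0), per_seed.std(axis=0)
    mean[list(known_drug_ids)] = -np.inf            # drop known indications
    order = np.argsort(-mean)[:top_k]
    return order, mean[order], sd[order]            # IDs, mean score, seed SD
```

The seed SD column falls out for free here, since the per-seed score matrix is kept before averaging.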

What we didn't do

References