INDEX
    Explanations
    Negative Logits
     Holm          -0.08
     Ara           -0.07
     туш           -0.07
     tung          -0.07
     mart          -0.07
     flaw          -0.07
     rook          -0.07
     множ          -0.07
     solcher       -0.07
     though        -0.07
    Positive Logits
    ulare           0.08
    raised          0.08
    édio            0.08
     Danach         0.08
     prescribing    0.08
     taboo          0.08
     prescriptions  0.08
     بالت           0.07
     boosters       0.07
     Prescription   0.07
    Activation Density: 0.007%

    No Known Activations