    Negative Logits
    Token       Logit
    "erer"      -0.17
    "£"         -0.16
    "ivery"     -0.16
    "erken"     -0.14
    " anc"      -0.14
    "nze"       -0.14
    "мест"      -0.14
    "οκ"        -0.14
    "iverz"     -0.14
    "erif"      -0.14
    Positive Logits
    Token       Logit
    "illes"     0.28
    "oral"      0.25
    "ries"      0.24
    "eur"       0.24
    "afari"     0.24
    "ille"      0.23
    "ebin"      0.22
    "ır"        0.21
    " tense"    0.21
    "el"        0.20
    Activation Density: 0.008%

    No Known Activations