    Negative Logits
     shattered    0.63
     Mandal       0.62
     Guildford    0.60
     ﺍﻟ           0.59
     написать     0.57
     gdyż         0.57
     Tenemos      0.57
     Pada         0.55
     banged       0.55
     Ours         0.55
    Positive Logits
    (blank)  0.74
    f        0.66
    le       0.64
    (blank)  0.64
    je       0.63
    h        0.62
    يه       0.62
    a        0.61
    an       0.60
    heter    0.58
    Act Density 0.014%

    No Known Activations