    NEGATIVE LOGITS
    Token          Logit
    " thi"         -0.08
    " جام"         -0.07
    " agitation"   -0.07
    " disip"       -0.07
    " wh"          -0.07
    " tabela"      -0.07
    " lado"        -0.07
    " GMT"         -0.07
    " мож"         -0.07
    " kroon"       -0.07

    POSITIVE LOGITS
    Token          Logit
    "truth"        0.09
    "rit"          0.08
    "lau"          0.08
    " Buchanan"    0.08
    " metab"       0.08
    "rufen"        0.07
    "flutter"      0.07
    "etés"         0.07
    "Truth"        0.07
    "updated"      0.07

    Activation Density: 0.175%

    No Known Activations