INDEX
    Explanations

    software configurations/code

    Negative Logits
    " centro"       -0.07
    "۴۰"            -0.07
    "aghan"         -0.07
    "рами"          -0.06
    " pressures"    -0.06
    " keras"        -0.06
    "자가"          -0.06
    "bersome"       -0.06
    "inning"        -0.06
    "rawler"        -0.06
    Positive Logits
    "\ticon"        0.06
    "_elim"         0.06
    " кли"          0.06
    " meme"         0.06
    "\tpoint"       0.06
    " calc"         0.06
    "lava"          0.06
    " lemma"        0.06
    " Aleppo"       0.06
    " bites"        0.06
    Activation Density: 0.100%

    No Known Activations