INDEX
    NEGATIVE LOGITS
    .sun        -0.07
    LOSE        -0.07
     Enlight    -0.06
    ünden       -0.06
     سنوات      -0.06
    beat        -0.06
     Numbers    -0.06
    +'_         -0.06
    رفته        -0.06
    Release     -0.06
    POSITIVE LOGITS
     microbi          0.12
     acknowledgment   0.09
     Tri              0.07
     Micro            0.07
    �u                0.07
     сопров           0.07
    biology           0.07
    Wy                0.07
    olutely           0.07
    andez             0.07
    Activation Density: 0.002%

    No Known Activations