INDEX
    NEGATIVE LOGITS
    Token       Logit
     Demon      -0.07
     helping    -0.06
     firsthand  -0.06
    _STA        -0.06
     ros        -0.06
    patches     -0.06
     Soc        -0.06
     baff       -0.06
     Bro        -0.06
     Мик        -0.06
    POSITIVE LOGITS
    Token       Logit
    вен         0.07
                0.07
     inevitable 0.06
    /*/         0.06
     [{'        0.06
     [{"        0.06
     δύο        0.06
     Lisp       0.06
    شود         0.06
     nelze      0.06
    Activation Density: 0.000%

    No Known Activations