INDEX
    Explanations
    Negative Logits
     virt  -0.07
     Lim  -0.07
     centroid  -0.06
     buluş  -0.06
     лим  -0.06
     sinks  -0.06
     Cont  -0.06
     Phase  -0.06
     Ops  -0.06
     Demonstr  -0.06
    Positive Logits
    aft  0.07
    .mysql  0.06
     العربي  0.06
      0.06
    เพลง  0.06
     εν  0.06
    .vars  0.06
    arsimp  0.06
     happiness  0.06
    思い  0.06
    Act Density 0.001%

    No Known Activations