    NEGATIVE LOGITS
     fark           -0.07
    -ves            -0.07
     lex            -0.06
     ;-)            -0.06
    Representation  -0.06
     кап            -0.06
     Ips            -0.06
     adaptor        -0.06
     retVal         -0.06
     לא             -0.06
    POSITIVE LOGITS
    ний             0.07
     stagger        0.07
                    0.07
    strike          0.07
    _every          0.07
    ActionButton    0.07
     uniqu          0.07
    important       0.07
    ystone          0.07
     boosted        0.07
    Activation Density: 0.005%

    No Known Activations