    NEGATIVE LOGITS
    "cccc"        -0.07
    "(trans"      -0.07
    " Πολ"        -0.07
    "258"         -0.07
    " lễ"         -0.06
    "_gender"     -0.06
    "_del"        -0.06
    "ByName"      -0.06
    " spelled"    -0.06
    "_sleep"      -0.06
    POSITIVE LOGITS
    ".Sin"        0.07
    "sad"         0.06
    "/File"       0.06
    ":text"       0.06
    ".Abs"        0.06
    "èo"          0.06
    " Так"        0.06
    "icket"       0.06
    ")↵"          0.06
    "!'↵"         0.05
    Activation Density: 0.025%

    No Known Activations