INDEX
    Explanations
    Negative Logits
    Token          Logit
     SMALL         -0.07
    _Color         -0.06
     Funny         -0.06
     Clayton       -0.06
    Allocate       -0.06
                   -0.06
     incorrect     -0.06
    errat          -0.06
     Linked        -0.06
    uba            -0.06
    Positive Logits
    Token          Logit
     agitation     0.07
    eşil           0.07
     cheats        0.06
    ISTR           0.06
    -groups        0.06
     '=',          0.06
    ebilecek       0.06
     ');↵          0.06
    ιας            0.06
    labilir        0.06
    Activation Density: 0.003%

    No Known Activations