    Negative Logits
    states             -0.08
     کرد               -0.07
    kins               -0.06
    [token missing]    -0.06
     Cone              -0.06
    /Library           -0.06
    .Foundation        -0.06
    _processing        -0.06
    ιλο                -0.06
     pracy             -0.06
    Positive Logits
    [token missing]    0.07
    (origin            0.06
    [whitespace]       0.06
     za                0.06
    classname          0.06
     ди                0.06
    617                0.06
    lexical            0.06
    }{↵                0.06
     mommy             0.06
    Activation Density: 0.001%

    No Known Activations