    Negative Logits
    "отор"           -0.07
    "_attack"        -0.07
    "_Input"         -0.07
    " altru"         -0.07
    "Sparse"         -0.07
    "_SHADOW"        -0.07
    " Kıs"           -0.07
    "-black"         -0.07
    "<Block"         -0.06
    "sian"           -0.06
    Positive Logits
    "pm"              0.11
    " ±"              0.09
    "±"               0.07
    " Courts"         0.07
    " fulfillment"    0.06
    "้องพ"             0.06
    (tabs and spaces) 0.06
    (token not shown) 0.06
    " والم"           0.06
    (token not shown) 0.06
    Act Density 0.001%

    No Known Activations