    Negative Logits
    Token          Logit
    "\tIL"         -0.07
    "_INST"        -0.07
    (blank)        -0.07
    " robbed"      -0.06
    " Fury"        -0.06
    "owy"          -0.06
    "iptables"     -0.06
    " puppy"       -0.06
    "рост"         -0.06
    " kittens"     -0.06
    Positive Logits
    Token          Logit
    " change"      0.24
    " Change"      0.20
    "-change"      0.14
    "Change"       0.14
    "change"       0.13
    " CHANGE"      0.12
    "CHANGE"       0.10
    " changes"     0.10
    "_change"      0.09
    "chang"        0.09
    Activation Density: 0.044%

    No Known Activations