INDEX
    Explanations
    Negative Logits
    Token            Logit
    " invasion"      -0.08
    " sna"           -0.08
    "a"              -0.07
    "ma"             -0.07
    "la"             -0.07
    "форма"          -0.07
                     -0.07
    "πλ"             -0.06
    ".Aggressive"    -0.06
    " سلام"          -0.06
    Positive Logits
    Token            Logit
    " operating"     0.17
    " Operating"     0.14
    "Operating"      0.10
    " Irving"        0.07
    "istinguished"   0.07
                     0.07
    " notify"        0.07
    " Cry"           0.07
    "eting"          0.07
    " pollut"        0.07
    Activation Density: 0.007%

    No Known Activations