    Negative Logits
     корист
    -0.08
     cał
    -0.07
     Husband
    -0.06
    ัปดาห
    -0.06
     СП
    -0.06
     Mil
    -0.06
     valid
    -0.06
    -[
    -0.06
     голод
    -0.06
    	MessageBox
    -0.06
    Positive Logits
     Soft
    0.11
    Soft
    0.10
     soft
    0.09
    soft
    0.08
    .soft
    0.07
    _soft
    0.07
     BCM
    0.07
     liberalism
    0.07
    0.07
    struments
    0.07
    Activation Density: 0.005%

    No Known Activations