INDEX
    Negative Logits
    " ank"         -0.07
    " rhet"        -0.07
    " trump"       -0.07
    "anger"        -0.06
    "collapsed"    -0.06
    " Abel"        -0.06
    "Пр"           -0.06
    "Detroit"      -0.06
    "eres"         -0.06
    "open"         -0.06
    Positive Logits
    " yönetimi"           0.07
    ".;"                  0.07
    "_SEGMENT"            0.07
    [unrendered token]    0.06
    [unrendered token]    0.06
    " prick"              0.06
    " 해당"               0.06
    [unrendered token]    0.06
    " paciente"           0.06
    " نیروی"              0.06
    Activation Density 0.039%

    No Known Activations