INDEX
    Explanations
    Negative Logits
     Severity    0.73
    oloj         0.68
    think        0.65
                 0.64
    the          0.61
    Aspect       0.61
    Speaker      0.61
     Agustus     0.61
    th           0.60
    atrice       0.60
    POSITIVE LOGITS
    ве           0.73
    פ            0.66
     фами        0.66
    ב            0.66
                 0.66
     المط        0.65
    РИ           0.64
     כן          0.63
     apă         0.63
     מי          0.62
    Act Density 0.000%

    No Known Activations