INDEX
    Explanations

    stereotype, tolerance, description

    Negative Logits
    ان    0.70
    ת    0.70
    م    0.64
    ین    0.60
    مام    0.59
    т    0.59
    ين    0.59
    ників    0.58
    mim    0.57
    (no visible token)    0.56

    Positive Logits
    '    0.85
    ?    0.66
    (no visible token)    0.63
    .    0.63
    ;    0.62
     life    0.58
    \    0.57
    :    0.57
     in    0.57
    !    0.56
    Act Density 0.000%

    No Known Activations