    Negative Logits
        ن                       0.79
        ش                       0.77
        (token not rendered)    0.77
        ivasena                 0.76
        онов                    0.75
        (token not rendered)    0.75
        li                      0.73
        лить                    0.73
        (token not rendered)    0.73
        larını                  0.72
    Positive Logits
        '                       1.20
        ו                       1.05
        (token not rendered)    0.86
        CH                      0.81
        c                       0.81
        D                       0.81
        ב                       0.79
        К                       0.77
        P                       0.76
         revealed               0.76
    Activation density: 0.028%

    No Known Activations