INDEX
    Negative Logits
    Value  Token
    0.90   "ك"
    0.79   "ла"
    0.79   "ل"
    0.77   "د"
    0.74   "ap"
    0.71   "ler"
    0.70   "i"
    0.69   "كيد"
    0.68   "as"
    0.68   "os"
    Positive Logits
    Value  Token
    0.79   (token not shown)
    0.75   (token not shown)
    0.74   "t"
    0.69   ","
    0.63   " människor"
    0.61   " человеку"
    0.60   " mennesker"
    0.59   "时候"
    0.59   "People"
    0.59   "("
    Activation Density: 0.058%

    No Known Activations