INDEX
    Explanations

    describes things contained

    Negative Logits
     в    0.72
     في    0.66
    ка    0.64
    ”،    0.52
    и    0.52
    ية    0.50
     В    0.47
     در    0.46
     σε    0.46
     ګرځنده    0.46
    Positive Logits
    b    0.77
    ar    0.76
    f    0.73
    ol    0.72
    s    0.68
    l    0.66
    t    0.65
    m    0.65
    et    0.64
    ad    0.64
    Act Density 0.341%

    No Known Activations