INDEX
    Explanations

    with confidence or examples

    Negative Logits
    Token               Value
    "ка"                1.07
    "in"                0.96
    "ara"               0.96
    " reservados"       0.91
    (no token shown)    0.86
    "ని"                0.85
    "ik"                0.82
    "ien"               0.80
    (no token shown)    0.80
    " تاريخ"            0.79
    Positive Logits
    Token               Value
    " with"             1.59
    "y"                 1.55
    (no token shown)    1.24
    "t"                 1.23
    "ב"                 1.20
    "ת"                 1.19
    "ی"                 1.18
    "l"                 1.17
    "ED"                1.14
    "ם"                 1.13
    Activation Density: 0.436%

    No Known Activations