    Negative Logits
    Token     Logit
    " and"    1.62
    "،"       1.47
    "'"       1.41
    "í"       1.38
    " be"     1.36
    "ي"       1.32
    "ات"      1.27
    "é"       1.27
    " are"    1.18
    " for"    1.14
    Positive Logits
    Token        Logit
    "9"          1.68
    "."          1.59
    (not shown)  1.43
    "دی"         1.41
    (not shown)  1.39
    (not shown)  1.38
    "لی"         1.35
    (not shown)  1.34
    "ない"       1.27
    "نی"         1.27
    Activation Density: 0.131%

    No Known Activations