INDEX
    Explanations

    abstract descriptive words

    Negative Logits
    Logit  Token
    1.18   " I"
    1.17   "h"
    1.13   "ing"
    1.09   "ia"
    1.04   " for"
    0.96   "et"
    0.95   "ä"
    0.95   "em"
    0.94   " It"
    0.94   "í"
    Positive Logits
    Logit  Token
    0.97
    0.93
    0.83
    0.79   "'"
    0.78   ")।"
    0.75   " έχουν"
    0.74   " وأ"
    0.73   " médicas"
    0.73   " σε"
    0.73   "يا"
    Activation Density: 1.760%

    No Known Activations