    Negative Logits
    " is"          0.33
    " in"          0.32
    "s"            0.28
    "1"            0.24
    " be"          0.24
    " of"          0.23
    " tradition"   0.23
    "riad"         0.22
    " ("           0.22
    " are"         0.22
    Positive Logits
    "و"       0.39
    "ל"       0.36
    "ou"      0.34
    "ul"      0.34
    "ו"       0.34
    "ل"       0.32
    "ور"      0.30
    "ar"      0.29
    "is"      0.29
    "मधील"    0.29
    Activation Density: 0.803%

    No Known Activations