    Negative Logits
    al        1.30
    ol        1.27
    at        1.26
    lige      1.20
    lardan    1.19
    larni     1.16
    llo       1.16
    ljen      1.13
    ning      1.12
    y         1.09
    Positive Logits
    ש         1.40
     an       1.34
    ס         1.24
     a        1.12
              1.09
    )$.       1.00
    ח         1.00
     alder    1.00
     טוב      0.99
    IM        0.98
    Act Density 0.102%

    No Known Activations