INDEX
    NEGATIVE LOGITS
    Token       Logit
    "ו"         1.43
    "in"        1.39
    "the"       1.35
    "و"         1.27
    "at"        1.24
    "f"         1.16
    "re"        1.15
    "it"        1.06
    "al"        1.06
    (blank)     1.03
    POSITIVE LOGITS
    Token       Logit
    "ä"         1.73
    " with"     1.56
    " by"       1.49
    " on"       1.38
    " and"      1.33
    "1"         1.14
    "("         1.13
    " at"       1.05
    (blank)     1.02
    " může"     1.01
    Activation Density: 0.006%

    No Known Activations