INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "ن"               1.45
    (token not shown) 1.42
    "ש"               1.24
    "IS"              1.23
    "н"               1.16
    "Σ"               1.15
    "我们"            1.13
    (token not shown) 1.13
    (token not shown) 1.08
    " powierzchn"     1.07
    Positive Logits
    "t"     1.87
    "y"     1.47
    "is"    1.39
    "o"     1.36
    "g"     1.31
    "are"   1.25
    "il"    1.23
    "ak"    1.20
    "на"    1.20
    " or"   1.18
    Activation Density: 0.000%

    No Known Activations