INDEX
    Explanations
    Negative Logits
    (blank)    1.04
    ن          1.02
    ли         0.96
     I         0.94
    že         0.90
    (blank)    0.89
     is        0.88
    (blank)    0.87
     was       0.86
    的な       0.86
    Positive Logits
    are        1.20
    u          1.19
    T          1.19
    K          1.16
    ing        1.09
    W          1.09
    ue         1.08
    ang        1.07
    A          1.05
    H          1.05
    Act Density 0.568%

    No Known Activations