INDEX
    Negative Logits
    Token         Logit
    " are"        0.92
    "،"           0.80
    "4"           0.77
    "that"        0.74
    " لیکن"       0.72
    "5"           0.71
    "igh"         0.71
    "atch"        0.67
    "ی"           0.67
    "いますが"     0.65
    Positive Logits
    Token         Logit
    "h"           1.23
    "ر"           1.16
    "o"           1.09
    "is"          1.09
    "n"           1.08
    "r"           1.03
    "y"           1.03
    "et"          1.02
    "на"          0.94
    "en"          0.93
    Activation Density: 0.066%

    No Known Activations