INDEX
    Explanations
    NEGATIVE LOGITS
    " as"      1.26
    " ="       1.22
    "的時間"   1.05
    " is"      1.03
    " if"      1.00
    " for"     0.98
    " when"    0.98
    "的状态"   0.97
    " to"      0.96
    "↵↵"       0.93
    POSITIVE LOGITS
    "ת"        1.76
    "ت"        1.59
    "in"       1.42
    "n"        1.41
    "t"        1.37
    "т"        1.34
    "ing"      1.32
    "at"       1.26
    "ре"       1.25
    "and"      1.23
    Act Density 0.008%

    No Known Activations