    NEGATIVE LOGITS
    Token              Logit
    "えっ"             -1.80
    (token missing)    -1.75
    (token missing)    -1.71
    "𓋼"               -1.70
    (token missing)    -1.66
    "もうすぐ"         -1.64
    (token missing)    -1.63
    "おわりに"         -1.63
    "あぁ"             -1.62
    " 复古"            -1.61
    POSITIVE LOGITS
    Token       Logit
    " also"     2.09
    " is"       1.55
    ","         1.51
    " --"       1.46
    "<bos>"     1.34
    "w"         1.29
    "d"         1.28
    "但却"      1.27
    " other"    1.27
    "l"         1.27
    Act Density 0.120%

    No Known Activations