INDEX
    Explanations
    Negative Logits
    Token        Logit
    "л"          1.90
    "s"          1.88
    "la"         1.75
    "d"          1.73
    "t"          1.60
    "st"         1.59
    "land"       1.48
    "li"         1.48
    " and"       1.30
    "lt"         1.30
    POSITIVE LOGITS
    Token        Logit
    "P"          1.45
    "R"          1.23
    "M"          1.22
    "O"          1.19
    " a"         1.16
    "</b>"       1.14
    "に取り組"   1.12
    "W"          1.09
    "</h5>"      1.07
    "onnés"      1.06
    Activation Density: 3.678%

    No Known Activations