    Negative Logits
    "a"     3.38
    "i"     3.30
    "u"     3.03
    "e"     2.91
    "o"     2.70
    "il"    2.56
    "و"     2.56
    "m"     2.45
    "ar"    2.39
    "in"    2.36
    Positive Logits
    " vanes"          1.50
    "可知"            1.40
    " passers"        1.37
    " clues"          1.32
    " INS"            1.31
    " wonderland"     1.31
    "<unused2157>"    1.30
    "ING"             1.29
    " pores"          1.29
    " bribes"         1.27
    Activation Density: 0.011%

    No Known Activations