    Negative Logits
        Token          Logit
        " begr"        -0.78
        " sakin"       -0.77
        "加減"         -0.76
        "ologio"       -0.74
        "ǜ"            -0.73
        "Symb"         -0.72
        " partisans"   -0.71
        (not rendered) -0.71
        " lest"        -0.70
        " aspect"      -0.70
    Positive Logits
        Token          Logit
        " sil"          1.75
        " SIL"          1.63
        " Sil"          1.55
        "Sil"           1.53
        "SIL"           1.41
        " silen"        1.35
        "sil"           1.28
        " sile"         1.23
        "sile"          1.19
        " シル"         1.09
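Top and bottom logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token effects. The sketch below assumes that approach. The names `w_dec`, `W_U`, and `top_logit_effects` are hypothetical and used only for illustration. The source does not state how this page computed its values, including whether any layer-norm folding was applied.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumed shapes (hypothetical, not from the source page):
      w_dec: decoder direction of one feature, shape (d_model,)
      W_U:   unembedding matrix, shape (d_model, d_vocab)
      vocab: list of d_vocab token strings
    """
    effects = w_dec @ W_U          # (d_vocab,): feature's direct effect on each token's logit
    order = np.argsort(effects)    # token indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

With a toy 2-dimensional model and three tokens, `top_logit_effects(np.array([1.0, 0.0]), np.array([[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]]), ["a", "b", "c"], k=1)` returns `([("b", -1.0)], [("c", 2.0)])`.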
    Activation Density: 0.017%
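Activation density is usually reported as the share of sampled tokens on which the feature fires. Under that reading, 0.017% works out to about 17 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. Both that threshold and the function name `activation_density` are illustrative assumptions, not details taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as active when its activation
    is strictly greater than `threshold` (default 0.0).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))
```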

    No Known Activations