    NEGATIVE LOGITS
    Token      Logit
    " O"       1.13
    " L"       1.05
    " M"       0.97
    " B"       0.96
    " N"       0.86
    " God"     0.84
    " P"       0.84
    " C"       0.83
    " Bou"     0.82
    " G"       0.82
    POSITIVE LOGITS
    Token          Logit
    "inhas"        1.55
    "ipas"         1.53
    " doraemon"    1.47
    ".).."         1.47
    "erit"         1.46
    "jetas"        1.44
    "ynit"         1.43
    "irts"         1.43
    "underland"    1.42
    " puris"       1.41
    Activation Density: 1.138%

    No Known Activations