    Explanations

    phrases indicating emphasis or affirmation

    Negative Logits
    utherford    -0.20
    eyer         -0.17
    	throws      -0.17
    izzo         -0.16
    ahl          -0.15
    敷           -0.15
    ultz         -0.14
    ango         -0.14
    egers        -0.14
    ksen         -0.14
    Positive Logits
     δη          0.33
     tức         0.29
    就是          0.29
     тобто       0.28
     heiß        0.27
    即           0.25
     decir       0.25
     즉          0.24
    つ           0.22
     есть        0.22
    Activation Density: 0.053%

    No Known Activations