INDEX
    Explanations

    expressions of opinions or feelings about individuals or events

    Negative Logits
    erno
    -0.16
    lic
    -0.16
    enberg
    -0.15
    wer
    -0.15
     allegedly
    -0.15
    чем
    -0.14
    enci
    -0.14
    bout
    -0.14
    ency
    -0.13
    imu
    -0.13
    Positive Logits
    是一个
    0.38
    是个
    0.36
    是一
    0.30
     sebuah
    0.26
     an
    0.23
    一个
    0.21
    —a
    0.21
     eine
    0.20
    一种
    0.20
     een
    0.20
    Act Density 0.218%

    No Known Activations