INDEX
    Explanations

    expressions of strong emotions or evaluations about experiences

    Negative Logits
    olit       -0.16
    addock     -0.15
    ocrates    -0.15
    229        -0.15
    511        -0.15
    fusc       -0.15
    ukt        -0.14
    swer       -0.14
    lopedia    -0.13
     ÙĦب       -0.13

    Positive Logits
    ify        0.16
    chalk      0.14
    ãģ¡ãĤĥãĤĵ   0.14
    Same       0.14
    ÙĮ         0.13
    ARING      0.13
    dj         0.13
    ázÃŃ       0.13
    Sys        0.13
     меÑģÑĤо    0.13

    Act Density 0.133%

    No Known Activations