INDEX
    Explanations

    phrases that express skepticism or challenge common beliefs

    Negative Logits
     however
    -0.21
     However
    -0.17
    却
    -0.16
     jedoch
    -0.16
     HOWEVER
    -0.15
    则
    -0.15
    اÙĥÙĨ
    -0.15
     nevertheless
    -0.15
    smarty
    -0.14
    éal
    -0.14
    Positive Logits
    ?
    0.16
     importantly
    0.16
    进一步
    0.15
     nữa
    0.15
    ,
    0.15
     equally
    0.14
    !
    0.14
    odash
    0.14
    -Za
    0.14
    otto
    0.14
    Act Density 0.325%

    No Known Activations