    Explanations

    exclamatory expressions that convey strong emotions or reactions

    Negative Logits
    Token       Logit
    ica         -0.18
    anc         -0.15
    ICA         -0.15
    roje        -0.15
    ixa         -0.14
    bot         -0.14
    cki         -0.14
    ownt        -0.14
    thed        -0.14
    ä¸įå¾Ĺ      -0.14
    Positive Logits
    Token       Logit
    !!.         0.26
    !↵          0.23
    !!!!↵↵      0.23
    111         0.21
    !!!         0.21
    !↵↵         0.20
    [](         0.20
    !"          0.18
    !!!↵↵       0.18
    !!↵         0.18
    Act Density 0.015%

    No Known Activations