INDEX
    Explanations

    expressions of strong emotions or reactions

    Negative Logits
    Token       Logit
    (blank)     -0.23
    /or         -0.17
    ）は        -0.16
    .]↵↵        -0.16
    _B          -0.16
    _S          -0.16
    -S          -0.15
    |x          -0.15
    _Syntax     -0.15
    -B          -0.15
    Positive Logits
    Token       Logit
    11          0.33
    111         0.31
    1           0.31
    !(          0.25
    (           0.22
    (blank)     0.22
    <           0.21
    10          0.21
    !--         0.21
    12          0.20
    Activation Density: 0.013%

    No Known Activations