INDEX
    Explanations

    special characters or symbols in the text

    Negative Logits
                -0.34
     (“         -0.28
     “[         -0.26
     (          -0.26
     “â̦        -0.25
                -0.20
     ``         -0.20
     "          -0.19
     "(         -0.18
     "`         -0.18
    Positive Logits
    -'          0.23
     fucking    0.23
     ourselves  0.21
     fuck       0.20
    -↵↵         0.20
    –↵↵         0.20
    -"          0.20
    -.          0.19
     -↵↵        0.19
     our        0.19
    Act Density 0.003%

    No Known Activations