INDEX
    Explanations

    phrases indicating personal reflection or self-identity

    Tokens preceding em-dashes

    Negative Logits
     â     -1.73
    â      -1.63
    Ã      -1.09
     Â     -1.08
    Â      -1.06
     Ã     -1.02
     ¦     -0.90
     `     -0.89
     „     -0.88
     ð     -0.84
    Positive Logits
           1.64
    '      1.39
    。"     1.38
    ...'   1.36
    :"     1.30
     ​      1.27
           1.26
     ‐     1.26
    ...”   1.23
    '...   1.23
    Activation Density: 0.604%

    No Known Activations