Explanations
phrases indicating personal reflection or self-identity
Tokens preceding em-dashes
anterior posterior
Negative Logits
â  -1.73
â  -1.63
Ã  -1.09
Â  -1.08
Â  -1.06
Ã  -1.02
¦  -0.90
`  -0.89
„  -0.88
ð  -0.84
Positive Logits
‐  1.64
'  1.39
。"  1.38
...'  1.36
:"  1.30
(unrendered token)  1.27
(unrendered token)  1.26
‐  1.26
...”  1.23
'...  1.23
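The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits up or down. This page does not state how those scores are computed. Below is a minimal sketch of one common approach, assuming each score is the dot product of the feature's decoder direction with that token's unembedding vector. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical.

```python
# Sketch: rank vocabulary tokens by how much a feature's decoder direction
# raises or lowers their logits. Assumes effect = decoder_vec . unembed[token].
# All names and toy data here are hypothetical.

def logit_effects(decoder_vec, unembed):
    """Dot each token's unembedding vector with the feature's decoder vector.

    decoder_vec: list[float] of length d_model (hypothetical feature direction).
    unembed: dict mapping token -> list[float] of length d_model.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    decoder = [1.0, -0.5]
    vocab = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
    pos, neg = top_and_bottom(logit_effects(decoder, vocab), k=2)
    print("positive:", pos)  # positive: [('a', 1.0), ('c', 0.5)]
    print("negative:", neg)  # negative: [('d', -1.0), ('b', -0.5)]
```

Under this assumption, a token appears near the top of the positive list when its unembedding vector points in roughly the same direction as the feature's decoder vector. It appears near the top of the negative list when the two vectors point in roughly opposite directions.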
Activation Density: 0.604%
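Activation density is read here as the percentage of token positions on which the feature fires. Below is a minimal sketch under that assumption, using a firing threshold of zero. The function name, the threshold, and the sample data are hypothetical and are not taken from this page.

```python
# Sketch: activation density as the percentage of token positions where a
# feature's activation is strictly above a threshold (assumed 0.0 here).
# Names and data are hypothetical.

def activation_density(activations, threshold=0.0):
    """Percentage of positions where the activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy data: 604 firing positions out of 100,000 gives roughly 0.604%.
    acts = [0.7] * 604 + [0.0] * (100_000 - 604)
    print(f"{activation_density(acts):.3f}%")  # 0.604%
```

By this definition, a density of 0.604% means the feature fires on about 6 of every 1,000 tokens.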