INDEX
Explanations
characters that appear to be separators or special characters within a text
instances of uncertainty or skepticism
Negative Logits
eleph            -0.89
undermin         -0.75
Compass          -0.68
constit          -0.66
adversaries      -0.66
proport          -0.64
warships         -0.63
tremend          -0.62
representation   -0.60
advisors         -0.60
Positive Logits
posted           1.08
Posted           1.02
Anonymous        1.01
Anyway           0.96
↵                0.93
lol              0.91
Í                0.90
Honestly         0.89
EDIT             0.82
Seriously        0.82
Activations Density: 0.278%