Explanations
instances of the pronoun "I" indicating personal statements or opinions
Negative Logits
Token        Logit
eno          -0.17
abouts       -0.17
UGHT         -0.17
ToInt        -0.16
ervations    -0.15
ught         -0.15
tti          -0.15
μή           -0.14
clide        -0.14
lected       -0.14
Positive Logits
Token        Logit
ogui         0.18
mage         0.17
sth          0.17
elts         0.16
spor         0.15
ronic        0.15
ront         0.15
stance       0.15
Spy          0.15
cion         0.15
Activations Density: 0.200%