Explanations
occurrences of the pronoun "I" and related phrases indicating personal opinion or reflection
Negative Logits
Token       Logit
.contacts   -0.15
erne        -0.15
desn        -0.15
fak         -0.15
Stateless   -0.14
kü          -0.14
íĶĪ         -0.14
theid       -0.14
Mention     -0.14
Recomm      -0.14
Positive Logits
Token       Logit
thought     0.20
hast        0.19
kid         0.18
mean        0.17
tell        0.17
suppose     0.16
reasoning   0.16
tells       0.15
batis       0.15
reflection  0.15
Activations Density: 0.036%