Explanations
references to the second-person pronoun "you."
Negative Logits
Token       Logit
fucked      -0.15
itos        -0.15
.XR         -0.14
cấp         -0.14
lop         -0.14
éd          -0.14
inus        -0.14
alfa        -0.14
pissed      -0.14
munition    -0.14
Positive Logits
Token       Logit
sound       0.21
said        0.20
obviously   0.20
ok          0.18
two         0.18
mentioned   0.18
sir         0.18
mean        0.17
and         0.16
OK          0.16
Activations Density: 0.110%