INDEX
Explanations
second person pronouns followed by a verb
instances of the word "you."
Negative Logits
¿½              -0.95
tains           -0.84
enges           -0.72
advertisement   -0.70
Materials       -0.69
Adds            -0.68
£ı              -0.68
assemb          -0.67
ipal            -0.65
ãĤ´ãĥ³          -0.65
Positive Logits
're             1.36
guys            1.27
've             1.09
gotta           1.06
know            1.04
wanna           1.03
bastard         1.00
idiots          0.99
deserve         0.96
idiot           0.96
Activations Density 0.131%
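
As a rough illustration of how a density figure like the one above could be read, the sketch below treats activation density as the fraction of token positions where the feature fires. This definition is an assumption, not something the page states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce the displayed percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 131 of 100,000 token positions.
sample = [1.0] * 131 + [0.0] * (100_000 - 131)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.131% would mean the feature is active on roughly 1 token in 760, which fits a feature tied to a narrow pattern such as "you" followed by a verb.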