Explanations
mentions of the word "boy" or related variations and contexts
Negative Logits
Token    Logit
wich     -0.17
ermen    -0.17
.psi     -0.16
emen     -0.16
aker     -0.16
estre    -0.15
ITTER    -0.15
poons    -0.15
nop      -0.15
eda      -0.15
Positive Logits
Token    Logit
friend   0.24
Scout    0.24
scout    0.23
Scouts   0.22
friends  0.21
Friend   0.20
scouts   0.20
hood     0.20
toy      0.19
riend    0.19
Activations Density: 0.016%