INDEX
Explanations
references to boys and their relationships or interactions
Negative Logits
onica      -0.20
lej        -0.17
viso       -0.16
598        -0.16
$MESS      -0.15
etal       -0.15
uzzer      -0.15
raphics    -0.15
мов        -0.15
paged      -0.15
Positive Logits
friends    0.31
/man       0.27
friend     0.26
/g         0.25
hood       0.25
nton       0.24
ish        0.23
-girl      0.23
cout       0.22
ishly      0.21
Activations Density: 0.042%