Explanations
references to the word "girl" in various forms and contexts
Negative Logits
-uri       -0.15
ÑıÑħ       -0.15
ined       -0.15
Ïīμα       -0.15
mark       -0.15
fog        -0.15
lom        -0.14
Wayback    -0.14
cx         -0.14
racat      -0.14
Positive Logits
affe       0.36
aff        0.28
AFF        0.23
affer      0.22
Gir        0.21
ardin      0.21
oux        0.20
friend     0.20
aud        0.20
gir        0.20
Activations Density 0.005%