INDEX
Explanations
references to female characters and their relationships within stories
Negative Logits
Token      Logit
fter       -0.17
hi         -0.16
ele        -0.14
elper      -0.14
nge        -0.14
ing        -0.14
hips       -0.14
odem       -0.14
quiv       -0.14
arer       -0.14
Positive Logits
Token      Logit
afort      0.16
å¡ļ        0.14
//*[@      0.14
Ñįй        0.14
/Branch    0.13
annes      0.13
"urls      0.13
urdu       0.13
617        0.13
AFX        0.13
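Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below shows that selection step under stated assumptions: the helper names `logit_effects` and `top_and_bottom` are hypothetical, the unembedding is assumed to be indexed `[d_model][vocab_size]`, and the toy matrix is illustrative rather than taken from this page.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: project a decoder direction through the unembedding.

    Assumes unembed is indexed [d_model][vocab_size], so the effect on
    token v is sum_i decoder_dir[i] * unembed[i][v].
    """
    return {
        tok: sum(d * row[v] for d, row in zip(decoder_dir, unembed))
        for v, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 tokens.
    effects = logit_effects([1.0, -0.5],
                            [[0.2, -0.1, 0.0], [0.4, 0.2, -0.3]],
                            ["a", "b", "c"])
    neg, pos = top_and_bottom(effects, 1)
    print(neg, pos)
```

With a real model the same ranking would run over the full vocabulary with `k = 10`, matching the ten entries shown in each list.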
Activation Density: 0.389%
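Activation density is usually read as the percentage of token positions on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold default are hypothetical, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy example: 3 active positions out of 1000.
    acts = [0.0] * 997 + [0.5, 1.2, 3.1]
    print(activation_density(acts))
```

Under this reading, the reported 0.389% means the feature fires on roughly 389 of every 100,000 tokens.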