Explanations
References to female characters or entities in a narrative context.
Negative Logits
Token          Logit
-              -0.18
thon           -0.16
Tage           -0.16
/              -0.14
cult           -0.14
la             -0.14
giorno         -0.14
b              -0.13
coefficient    -0.13
dolore         -0.13
Positive Logits
Token          Logit
zos            0.18
misma          0.17
academia       0.16
mgr            0.15
Glover         0.15
λεκ            0.15
quelle         0.15
shal           0.15
enos           0.15
undry          0.15
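The two logit lists give the tokens whose output logits this feature most suppresses and most promotes. Dashboards of this kind commonly produce such lists by multiplying the feature's decoder direction by the model's unembedding matrix and sorting the result. That is an assumption here, since this page does not say how its numbers were computed. The following is a minimal pure-Python sketch under that assumption. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `W_U`, and all the numbers, are hypothetical toy values.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab_size columns).
    Assumption: the dashboard's logit lists are computed this way."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted (largest) and k most-suppressed (smallest)
    tokens, each list ordered from strongest effect to weakest."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    bottom = [(vocab[j], effects[j]) for j in order[:k]]
    top = [(vocab[j], effects[j]) for j in order[::-1][:k]]
    return top, bottom


# Toy example with hypothetical numbers: d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.05],
       [0.0, 0.2, -0.1]]
vocab = ["a", "b", "c"]
effects = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(effects, vocab, k=1)
print(top, bottom)
```

With these toy values, token "a" is the most promoted and "b" the most suppressed. This ranking mirrors how the Positive Logits and Negative Logits lists are each ordered from strongest to weakest.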
Activation Density: 0.054%
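Activation density is usually reported as the share of tokens in an evaluation set on which the feature is active. This page does not spell out the definition, so that reading is an assumption. Under it, 0.054% means the feature fires on about 54 of every 100,000 tokens, or roughly one token in 1,850. The following is a minimal sketch. The function `activation_density`, its `threshold` parameter, and the data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.
    Assumption: density is measured over all scored tokens."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 54 nonzero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(54):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3f}%")
```

A threshold above zero is a design choice some tools make to ignore tiny activations. Raising it can only lower the reported density.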