INDEX
Explanations
instances of specific names, particularly those of female characters or notable women
Negative Logits
itas        -0.16
duino       -0.15
ãĥŃãĥ³      -0.15
Bernstein   -0.15
tach        -0.15
Sınıf       -0.15
built       -0.14
abort       -0.14
æ²»         -0.14
kup         -0.14
Positive Logits
duct        0.16
-medium     0.16
Pent        0.15
ductor      0.14
ần          0.14
ENTE        0.14
lược        0.14
ente        0.14
Toe         0.14
iska        0.14
Activations Density 0.013%