Explanations
Male names followed by surnames or titles.
The presence of person names (named-entity tokens identifying people).
Negative Logits
Token        Value
petals       0.36
bruises      0.35
Elektrokh    0.34
Justiça      0.32
actresses    0.32
hazelnuts    0.32
breasts      0.31
Yatha        0.31
ERROR        0.31
SAMP         0.31
Positive Logits
Token        Value
son          0.37
í            0.36
ina          0.35
ian          0.35
ů            0.34
िया          0.34
å            0.34
-            0.33
о            0.32
sson         0.32
Activations Density: 0.078%