INDEX
Explanations
female names and roles
This neuron detects personal names—capitalized named entities referring to people.
New Auto-Interp
Negative Logits
handsome
0.35
সুদর্শন
0.33
david
0.32
boyhood
0.31
t
0.31
Himself
0.30
lar
0.30
👔
0.29
நண்ப
0.29
бав
0.28
POSITIVE LOGITS
कुमारी
0.55
Kumari
0.50
actresses
0.49
policewomen
0.48
Louise
0.45
heroine
0.44
Elizabeth
0.44
خاتون
0.44
actress
0.43
María
0.43
Activations Density 0.028%