INDEX
Explanations
names of people
references to individuals named Emily
NEGATIVE LOGITS
ername        -0.77
tenance       -0.75
emonium       -0.73
lasses        -0.70
nown          -0.70
nings         -0.70
ebin          -0.69
nesses        -0.69
animous       -0.68
specificity   -0.68
POSITIVE LOGITS
Dickinson     1.10
Lak           0.98
gdala         0.90
otte          0.87
pton          0.83
endi          0.78
issance       0.75
sburg         0.75
Ago           0.74
ãĤ£           0.73
Activations Density: 0.015%