INDEX
Explanations
names of specific individuals
proper nouns, particularly names and titles
Negative Logits
IUM      -0.85
SHARE    -0.81
SHARES   -0.76
INST     -0.71
OFFIC    -0.68
FUN      -0.68
NEXT     -0.68
SIZE     -0.66
FACE     -0.63
FACE     -0.63
Positive Logits
illard     1.06
arre       0.96
atson      0.90
quart      0.88
ennett     0.86
oldemort   0.85
opez       0.84
axter      0.83
iggs       0.82
arrett     0.81
Activations Density 0.089%