INDEX
Explanations
names of public figures or individuals
proper nouns, specifically names of people and places
Negative Logits
nesday       -0.81
cule         -0.78
ationally    -0.78
ATIONAL      -0.77
Pats         -0.75
onial        -0.74
GI           -0.69
oso          -0.68
atile        -0.67
alde         -0.66
Positive Logits
Kejriwal     1.05
jriwal       0.94
Keefe        0.76
Clarkson     0.73
eters        0.73
ly           0.71
agher        0.69
lers         0.69
loo          0.68
zeb          0.67
Activations Density: 0.009%