INDEX
Explanations
names related to people
proper nouns, specifically names of individuals
Negative Logits
PDATE       -0.85
vernment    -0.83
ptives      -0.79
culosis     -0.73
GROUND      -0.72
lder        -0.69
lda         -0.67
FUL         -0.66
firearms    -0.64
pmwiki      -0.63
Positive Logits
arine       1.02
ean         0.89
aniel       0.86
ie          0.85
opher       0.83
leen        0.82
ees         0.81
wyn         0.81
imer        0.81
len         0.79
Activations Density: 0.014%