INDEX
Explanations
names of politicians or public figures
proper nouns, specifically names of individuals
Negative Logits
ational     -0.84
utation     -0.79
Magikarp    -0.77
lain        -0.77
utations    -0.74
arians      -0.72
icist       -0.72
ername      -0.70
hement      -0.70
ancies      -0.69
Positive Logits
church      0.90
Bates       0.89
bats        0.78
Bee         0.74
ples        0.70
Lyons       0.68
wolves      0.66
BLE         0.65
Hoo         0.64
Strikes     0.64
Activations Density: 0.024%