Explanations
proper nouns related to military, politics, and international events
Negative Logits
hire         -1.02
lez          -0.86
etr          -0.85
ki           -0.76
icular       -0.74
engers       -0.74
ements       -0.72
hement       -0.71
wikipedia    -0.71
eways        -0.69
Positive Logits
ufact        0.86
otive        0.86
Bates        0.74
essage       0.73
Coleman      0.71
Reed         0.68
oute         0.66
FG           0.66
ONEY         0.64
parts        0.64
Activations Density: 8.540%