INDEX
Explanations
names related to military or political figures
references to specific individuals, particularly in the context of military or government-related discussions
Negative Logits
ting        -0.80
ty          -0.80
ted         -0.77
vice        -0.75
theless     -0.69
ties        -0.68
Ķ           -0.66
Spielberg   -0.64
istic       -0.64
cision      -0.63
Positive Logits
aurus       1.09
pread       0.98
hift        0.97
CRIP        0.94
hips        0.92
aeda        0.90
chwitz      0.90
creen       0.89
andra       0.88
cale        0.88
Activations Density 0.052%