Explanations
phrases related to military or police operations
references to various operations or missions
Negative Logits
aez         -0.79
arton       -0.77
hus         -0.76
hner        -0.71
bitious     -0.71
cknowled    -0.66
eree        -0.65
yi          -0.65
ingers      -0.65
haw         -0.64
Positive Logits
ally        0.94
ional       0.93
eering      0.92
ality       0.83
undertaken  0.81
ions        0.74
conducted   0.74
ãĥ¢         0.73
Werewolf    0.72
Twist       0.72
Activations Density 0.023%