INDEX
Explanations
phrases or words related to secrecy or classified information
references to secrecy and covert operations
Negative Logits
Token           Logit
gaard           -0.85
annis           -0.85
avers           -0.77
gars            -0.73
©¶æ             -0.73
adjusted        -0.72
merce           -0.70
thood           -0.69
chairs          -0.68
older           -0.67
Positive Logits
Token           Logit
secret          0.96
ariat           0.95
arial           0.94
rets            0.87
Secret          0.86
ingredient      0.83
secret          0.81
ballot          0.79
secrets         0.78
unfocusedRange  0.77
Activation Density: 0.014%