INDEX
Explanations
references to explosive devices or bombs
Negative Logits
oldem          -0.17
yte            -0.16
ernals         -0.16
ful            -0.15
yt             -0.14
ofday          -0.14
ymous          -0.14
.Inf           -0.14
.GroupLayout   -0.14
INF            -0.14
Positive Logits
shell           0.30
arded           0.27
arding          0.26
ard             0.22
astic           0.21
astically       0.20
bomb            0.19
ination         0.18
(shell          0.18
remen           0.18
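The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Dashboards of this kind typically compute these values by projecting the feature's decoder direction onto each token's column of the model's unembedding matrix. The sketch below assumes that method; the helper names `logit_effects` and `top_k` and the toy vectors are hypothetical, and a real dashboard may apply extra steps (such as folding in a final layer norm) that are not shown here.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder vector with each token's unembedding column.

    decoder_dir: the feature's decoder direction (length d_model).
    unembed: hypothetical stand-in for W_U, mapping token -> column (length d_model).
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; real models use d_model in the hundreds or more.
    toy_unembed = {"bomb": [0.2, 0.1], "shell": [0.25, 0.1], "ful": [-0.1, -0.1]}
    effects = logit_effects([1.0, 0.5], toy_unembed)
    pos, neg = top_k(effects, 2)
    print("positive:", pos)
    print("negative:", neg)
```

Because the projection is linear in the decoder direction, the ranking depends only on which unembedding columns align with that direction, not on any particular input text.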
Activation density: 0.010%
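Activation density is usually reported as the share of sampled token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the `threshold` parameter is an assumption), a density of 0.010% corresponds to about one firing per 10,000 tokens. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires once in 10,000 tokens.
    sample = [0.0] * 9999 + [2.3]
    print(f"{activation_density(sample):.3f}%")
```

A density this low is consistent with a narrowly scoped feature like the one described above, which should stay silent on most text.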