Explanations
references to bombs or explosive devices in various contexts
Negative Logits
ceae     -0.17
edb      -0.15
vitae    -0.14
hire     -0.14
lify     -0.14
yte      -0.14
è¯ī      -0.14
lies     -0.14
OF       -0.14
à¥Ģस     -0.14
Positive Logits
shell        0.43
arded        0.35
astic        0.32
arding       0.32
ard          0.30
astically    0.26
adier        0.26
shell        0.24
disposal     0.24
ast          0.24
Activations Density 0.010%