INDEX
Explanations
terms related to destructive events, especially explosions
instances of the word "explosion."
Negative Logits
lege      -0.83
nda       -0.77
leg       -0.75
jud       -0.75
iff       -0.67
tern      -0.66
vag       -0.64
stra      -0.64
aunder    -0.63
abor      -0.61
Positive Logits
explosion     0.97
furnace       0.87
explosions    0.87
ignition      0.84
detonated     0.83
explodes      0.82
Explosion     0.82
hower         0.82
furn          0.78
fireball      0.76
Activations Density: 0.026%