Explanations
instances of explosions
references to explosions and their impacts
Negative Logits
nda       -0.83
tern      -0.70
lege      -0.70
stra      -0.69
dearly    -0.66
ndra      -0.66
ippi      -0.65
leg       -0.64
iff       -0.64
dden      -0.64
Positive Logits
explosion     0.94
furnace       0.84
explosions    0.82
bursting      0.82
furn          0.82
radius        0.80
opl           0.80
hower         0.79
burst         0.77
Explosion     0.76
Activation Density: 0.019%