INDEX
Explanations
references to accidents, explosions, and incidents involving disasters
Neuron Alignment
Index   Value   % of L₁
196     +0.15   0.6%
1416    +0.14   0.5%
1506    +0.13   0.5%
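
The page does not define the "% of L₁" column. One plausible reading, offered here only as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm (the sum of absolute values) of the feature's weight vector. The sketch below illustrates that reading on made-up numbers; the function names `percent_of_l1` and `top_aligned` are hypothetical, and the toy weights are not the dashboard's data.

```python
def percent_of_l1(weights):
    """Return each weight's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude weights."""
    shares = percent_of_l1(weights)
    order = sorted(range(len(weights)),
                   key=lambda i: abs(weights[i]),
                   reverse=True)[:k]
    return [(i, weights[i], shares[i]) for i in order]


# Toy weight vector (illustrative only).
toy_weights = [0.5, -0.25, 0.125, 0.125]
for index, value, share in top_aligned(toy_weights, k=2):
    print(f"{index:<6} {value:+.2f}  {share:.1f}%")
```

Under this reading, shares as small as the 0.5% to 0.6% shown in the table would mean that the feature's weight is spread thinly across many neurons rather than concentrated in a few.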
Correlated Neurons
Index   P. Corr.   Cos Sim.
196     +0.15      0.04
1416    +0.14      0.03
1506    +0.13      0.03
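
The two columns above are assumed here to be a Pearson correlation ("P. Corr.") and a cosine similarity ("Cos Sim."); the page does not say which vectors each is computed over. As a minimal sketch of how the two measures differ, the code below computes both in plain Python. Pearson correlation is the cosine similarity of the mean-centered vectors, so a constant offset changes the cosine similarity but not the correlation. The inputs are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])


# Toy data: y is x shifted by a constant offset of 10.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
print(f"pearson = {pearson_correlation(x, y):.3f}")
print(f"cosine  = {cosine_similarity(x, y):.3f}")
```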
Negative Logits
Anhalt      -0.49
الحره       -0.49
Cot         -0.46
widual      -0.45
הח          -0.45
Campe       -0.44
Trin        -0.44
Benin       -0.44
bernate     -0.44
фициаль     -0.43
Positive Logits
explosion    1.08
explosions   1.04
explosion    0.97
explodes     0.96
explode      0.96
ftre         0.95
exploding    0.95
Explo        0.94
fays         0.94
Explosion    0.91
Activations Density: 0.092%
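
Activation density is not defined on the page. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below shows that calculation on made-up activations; `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: a token counts as "active" when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations over four tokens (illustrative only): one token fires.
toy_activations = [0.0, 0.0, 1.5, 0.0]
density = activation_density(toy_activations)
print(f"{100 * density:.1f}%")
```

On this reading, a density of 0.092% would correspond to the feature firing on roughly 9 of every 10,000 tokens.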