INDEX
Explanations
references to collisions, particularly in the context of accidents or incidents
Neuron Alignment
Index   Value   % of L₁
148     +0.15   0.9%
111     +0.14   0.8%
502     +0.13   0.8%
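
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. The sketch below works under that assumption, which the page itself does not confirm. The function names and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_shares(weights):
    """Return |w_i| / sum_j |w_j| for every entry of the weight vector."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, share of L1) for the k largest-magnitude entries."""
    shares = l1_shares(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Hypothetical toy vector, not the feature's real weights.
toy = [0.5, -0.25, 0.125, 0.125]
for idx, value, share in top_aligned(toy, k=2):
    print(f"{idx}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a value of +0.15 at 0.9% would put the full L₁ norm at roughly 0.15 / 0.009, or about 17, though that figure depends on rounding in the displayed numbers.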
Correlated Neurons
Index   P. Corr.   Cos Sim.
111     +0.15      0.01
112     +0.14      0.01
502     +0.13      0.01
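
The two columns above report different quantities: "P. Corr." is presumably a Pearson correlation and "Cos Sim." a cosine similarity. A minimal sketch of both follows. It assumes the correlation is taken over paired activation values and the cosine over two weight vectors; that pairing is an assumption, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts the means before comparing, while cosine similarity does not, so the two can differ substantially on the same pair of vectors.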
Negative Logits
slightest     -2.33
sting         -1.63
"?"           -1.63
wake          -1.59
itrile        -1.59
latter        -1.57
illed         -1.52
deadly        -1.48
ease          -1.47
amethasone    -1.47
Positive Logits
ienn      1.83
ateral    1.72
ariate    1.71
ist       1.70
enson     1.69
ained     1.61
witz      1.61
hart      1.59
agen      1.59
iano      1.57
Activations Density 0.167%
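
A density figure like this is usually the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold, the function name, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 600.
toy = [0.0] * 599 + [1.2]
print(f"{activation_density(toy):.3%}")
```

A density of 0.167% corresponds to the feature firing on roughly one token in 600.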