INDEX
Explanations
phrases related to accidents, incidents, or safety precautions, particularly in the context of car collisions or workplace accidents
Neuron Alignment
Index   Value   % of L₁
1438    +0.11   0.3%
1967    +0.10   0.3%
1978    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
862     +0.11      0.04
1060    +0.10      0.06
1455    +0.10      0.05
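The two columns above report different similarity measures, which is why their values need not agree. As a minimal sketch, assuming "P. Corr." means the Pearson correlation and "Cos Sim." the cosine similarity (the dashboard does not say which vectors each is computed over, so the inputs below are made-up illustration data, and the function names are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance of mean-centered values, normalized."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def cosine(xs, ys):
    """Cosine similarity: dot product of the raw (uncentered) vectors, normalized."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)

# Illustration data only (not taken from the dashboard).
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]

print(pearson(a, b))  # perfectly linearly related, so this is 1 up to rounding
print(cosine(a, b))   # below 1, because the constant offset is not removed
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so a pair can score high on one and low on the other.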
Negative Logits
Token       Logit
intersper   -1.91
reluct      -1.81
unspeak     -1.81
increa      -1.72
snoopy      -1.68
hentai      -1.67
milf        -1.65
shenan      -1.65
affor       -1.64
indescri    -1.62
Positive Logits
Token       Logit
<bos>       1.16
happening   0.84
happen      0.81
occur       0.76
happened    0.74
OCCURRED    0.71
occurred    0.70
occurs      0.69
ocur        0.67
happens     0.67
Activations Density 0.537%