Explanations
descriptions of incidents related to suicide and its aftermath
Neuron Alignment
Index   Value   % of L₁
674     +0.10   0.3%
575     +0.08   0.2%
1063    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
256     +0.10      0.03
575     +0.08      0.03
1571    +0.07      0.02
Negative Logits
👆      -0.60
Oboe    -0.58
?«      -0.57
!«      -0.57
dci     -0.57
chev    -0.55
„,      -0.55
blos    -0.54
*^      -0.54
Græ     -0.53
Positive Logits
suicide     1.08
suicidal    0.90
suicides    0.89
suicide     0.87
Suicide     0.85
sarili      0.84
Suicide     0.82
suic        0.80
自殺        0.70
Bekasi      0.69
Activations Density 0.234%