Explanations
mentions of violent incidents such as shootings and attacks
Neuron Alignment
Index   Value   % of L₁
674     +0.10   0.3%
348     +0.07   0.2%
1399    +0.06   0.2%
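The page does not say how the "% of L₁" column in the Neuron Alignment table is derived. One plausible reading, which is an assumption rather than something stated in the source, is that each Value is one neuron's entry in the feature's weight vector and % of L₁ is that entry's absolute value divided by the L₁ norm of the whole vector. A minimal Python sketch under that assumption (the function name `top_alignment` is hypothetical):

```python
def top_alignment(weights, k=3):
    """Return the k most positive entries of a weight vector.

    Assumed definition: each entry is reported as (neuron index, value,
    share of L1), where share of L1 = |value| / sum(|w| for w in weights).
    """
    l1 = sum(abs(w) for w in weights)
    # Rank neuron indices by their weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked]
```

For example, `top_alignment([0.5, -0.25, 0.25], k=2)` returns `[(0, 0.5, 0.5), (2, 0.25, 0.25)]`. Under this reading, small shares such as 0.2% to 0.3% mean the feature's weight is spread across many neurons rather than concentrated in a few.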
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
297     +0.10           0.05
1249    +0.07           0.02
415     +0.06           0.03
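The Correlated Neurons table reports two similarity measures. Assuming, since the page does not define them, that the correlation column is the Pearson correlation between the feature's and a neuron's activations over the same tokens, and that the cosine column compares the corresponding weight vectors, both can be sketched in plain Python:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two weight vectors (assumed comparison)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

The two measures need not agree: a neuron can co-fire with the feature (nonzero correlation) while pointing in a nearly orthogonal weight direction (cosine near 0), which matches the pattern of small values in the table.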
Negative Logits
Token     Logit
effe      -1.03
wien      -1.00
?...      -0.96
strick    -0.95
!...      -0.94
reluct    -0.93
affor     -0.90
suscep    -0.90
erad      -0.89
vhs       -0.88
Positive Logits
Token          Logit
month          0.62
week           0.61
consecu        0.57
consecutive    0.57
within         0.55
month          0.55
Kelebihan      0.54
película       0.53
Imágenes       0.52
doña           0.51
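The page lists the tokens whose logits the feature pushes down most and up most, without stating how the scores are computed. A common approach, assumed here and not confirmed by the source, is to take the dot product of the feature's output direction with each token's column of the unembedding matrix (ignoring any final layer norm) and report the lowest and highest k scores. A minimal sketch; `logit_effects` and its argument layout are hypothetical:

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Rank tokens by their direct logit effect from a feature direction.

    direction: list of d floats (assumed feature output direction).
    unembed:   d rows, each a list of len(vocab) floats (unembedding matrix).
    vocab:     list of token strings.
    Returns (positive, negative): the k highest and k lowest (token, score) pairs.
    """
    d = len(direction)
    scores = [sum(direction[r] * unembed[r][t] for r in range(d)) for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative
```

Because scores are computed per vocabulary entry, two tokens that render the same, such as the two "month" rows, can appear separately if the tokenizer holds distinct variants (for instance with and without a leading space).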
Activation Density: 0.299%
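Activation density is the fraction of tokens on which the feature is active. A minimal sketch, assuming a token counts as active when the feature's activation exceeds zero (the threshold is an assumption, and `activation_density` is a hypothetical name):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the (assumed) threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)
```

For example, `activation_density([0, 0.5, 0, 0, 1.2, 0, 0, 0])` returns `0.25`. A density of 0.299% corresponds to roughly 3 active tokens in every 1,000, so this feature fires sparsely.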