INDEX
Explanations
terms related to riots and violence
Neuron Alignment
Index   Value   % of L₁
964     +0.14   0.4%
1861    +0.09   0.3%
1942    +0.09   0.3%
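
The page does not define the "% of L₁" column. One plausible reading, taken here as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the whole feature vector. The listed values are roughly consistent with that reading: 0.14 at 0.4% implies a norm near 35, and 0.09 / 35 is about 0.26%, which rounds to 0.3%. A minimal sketch, with `l1_share` and the toy vector both hypothetical:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: this mirrors what the "% of L1" column reports; the
    dashboard itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("weight vector has zero L1 norm")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, not the feature's real weights.
toy_weights = [0.5, -0.25, 0.25, 0.0]
share_of_first = l1_share(toy_weights, 0)  # 0.5 out of a norm of 1.0
```

Small percentages such as those listed would then indicate that the feature is spread across many neurons rather than concentrated in a few.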
Correlated Neurons
Index   P. Corr.   Cos Sim.
964     +0.14      0.04
1782    +0.09      0.04
1556    +0.09      0.03
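
The "P. Corr." and "Cos Sim." columns presumably hold a Pearson correlation and a cosine similarity. Which vectors the dashboard compares is not stated, so the inputs below are hypothetical. The sketch shows only how the two measures differ: Pearson correlation is cosine similarity applied after subtracting each vector's mean.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [x - mean_a for x in a]
    centred_b = [y - mean_b for y in b]
    return cosine_similarity(centred_a, centred_b)


# Hypothetical toy data: perfectly anti-correlated, yet positive cosine.
first = [1.0, 2.0, 3.0]
second = [3.0, 2.0, 1.0]
corr = pearson_correlation(first, second)
cos = cosine_similarity(first, second)
```

Because of the mean-centring step, the two columns can disagree in magnitude or even in sign for the same pair of vectors.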
Negative Logits
<bos>      -0.83
boop       -0.81
affez      -0.80
sento      -0.80
luigi      -0.80
trovo      -0.79
logitech   -0.76
imgur      -0.76
hasbro     -0.76
wikihow    -0.75
Positive Logits
riots            0.80
riot             0.69
unrest           0.61
violence         0.59
uprising         0.58
erupted          0.55
riot             0.53
disturbances     0.52
demonstrations   0.51
Riot             0.51
Activations Density 0.307%
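
Activation density is commonly read as the fraction of tokens on which a feature fires at all; that reading is an assumption here, since the page gives only the number. Under it, 0.307% corresponds to roughly 3 active tokens per 1,000. A minimal sketch, with `activation_density` and the sample values hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than zero; the dashboard may use another cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for value in activations if value > threshold)
    return 100.0 * active / len(activations)


# Hypothetical per-token activations, not data from this feature.
sample = [0.0, 0.0, 1.2, 0.0]
density = activation_density(sample)  # one active token out of four
```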