INDEX
Explanations
phrases related to political ideologies and extremist groups
Neuron Alignment
Index   Value   % of L₁
1870    +0.12   0.4%
478     +0.12   0.4%
678     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1624    +0.12      0.02
678     +0.12      0.03
757     +0.12      0.02
Negative Logits
And         -0.60
less        -0.58
Of          -0.57
<bos>       -0.57
No          -0.57
But         -0.57
At          -0.56
As          -0.56
BeforeAll   -0.56
My          -0.56
Positive Logits
effe      1.62
wien      1.59
increa    1.53
deleter   1.51
suspic    1.50
aen       1.49
sovere    1.49
nece      1.49
fatis     1.49
pessi     1.48
Activations Density: 0.068%