INDEX
Explanations
questions or statements related to government policies and freedom of expression
Neuron Alignment
Index   Value   % of L₁
226     +0.10   0.3%
1253    +0.08   0.2%
872     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1181    +0.10      0.04
226     +0.08      0.03
1060    +0.08      0.04
Negative Logits
<bos>     -0.56
Itz       -0.50
forbear   -0.46
poc       -0.45
dora      -0.45
gero      -0.45
plagio    -0.44
ete       -0.44
Galer     -0.43
usus      -0.43
Positive Logits
otheby    0.65
">...     0.64
hastly    0.63
USTAIN    0.63
oarece    0.61
anymore   0.60
dyž       0.60
venuto    0.60
sentito   0.58
statunit  0.57
Activations Density 0.422%