Explanations
phrases related to revolutionaries and social change
Neuron Alignment
Index    Value    % of L₁
198      +0.10    0.3%
1539     +0.07    0.2%
1510     +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
610      +0.10       0.06
197      +0.07       0.04
1200     +0.07       0.04
Negative Logits
secon     -1.05
Intere    -1.04
intere    -1.04
doman     -1.02
dispen    -1.01
oner      -1.00
contex    -1.00
maroc     -1.00
squa      -0.98
effe      -0.97
Positive Logits
those     0.79
those     0.79
Those     0.77
Those     0.72
whom      0.72
who       0.68
whose     0.68
ones      0.65
الذين     0.64
These     0.63
Activations Density 0.658%