Explanations
phrases related to political commentary and societal issues
Neuron Alignment
Index   Value   % of L₁
872     +0.10   0.3%
674     +0.08   0.2%
599     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
357     +0.10      0.04
1159    +0.08      0.02
734     +0.07      0.03
Negative Logits
Juf         -1.08
secon       -1.06
dises       -1.05
aen         -1.04
compen      -1.03
squa        -1.00
guarante    -1.00
aton        -1.00
mef         -0.99
stockholm   -0.99
Positive Logits
determines   1.02
depends      0.93
matters      0.87
determine    0.83
affects      0.79
decides      0.78
mattered     0.75
varies       0.75
matter       0.74
influences   0.71
Activations Density: 0.370%
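
The activation density figure can be read as the share of sampled tokens on which the feature fires. The Python sketch below is a minimal illustration under that assumption. The page does not state how its number is computed, and the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature is active; the dashboard's exact definition is not given.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 4 of which activate the feature.
sample = [0.0] * 996 + [0.5, 1.2, 0.3, 2.0]
density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

A density well below 1%, like the one reported here, would mean under this reading that the feature is sparse and fires on only a small fraction of tokens.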