INDEX
Explanations
pronouns and verbs related to political figures and actions
Neuron Alignment
Index   Value   % of L₁
1919    +0.19   0.6%
674     +0.12   0.4%
381     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.19      0.11
862     +0.12      0.04
1415    +0.10      0.04
Negative Logits
dégust       -1.19
Campionato   -1.05
répon        -1.05
carrefour    -1.03
autunno      -1.03
cioc         -1.02
pietre       -1.01
bicic        -1.00
parlamento   -0.99
soigne       -0.98
Positive Logits
want        0.69
wanted      0.69
would       0.68
believe     0.67
had         0.65
regretted   0.64
knew        0.64
felt        0.63
intend      0.63
couldn      0.62
Activations Density 0.240%