INDEX
Explanations
mentions of social or political commentary
Neuron Alignment
Index    Value    % of L₁
50       +0.23    0.8%
1741     +0.18    0.6%
2019     +0.15    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.23       0.09
50       +0.18       0.06
1056     +0.15       0.06
Negative Logits
vété          -1.05
noël          -0.97
maksi         -0.93
oeil          -0.92
Secrétaire    -0.90
africains     -0.89
bottes        -0.88
naer          -0.88
malheureux    -0.87
géant         -0.86
Positive Logits
importance     0.87
extent         0.78
impact         0.74
possibility    0.74
astéro         0.73
Mitä           0.72
exact          0.71
same           0.70
difference     0.69
ideolog        0.68
Activations Density: 0.424%