INDEX
Explanations
New Auto-Interp: names of political figures
Neuron Alignment
Index   Value   % of L₁
227     +0.12   0.4%
198     +0.09   0.3%
1870    +0.09   0.3%
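The Neuron Alignment table pairs individual neurons with a signed weight and a percentage. Below is a minimal sketch of how such a table could be produced. It assumes that Value is the signed entry of this feature's decoder vector for that neuron, and that % of L₁ is that entry's magnitude divided by the decoder vector's L1 norm. The card does not state this. The helper name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Top-k neurons by |weight| in a feature's decoder vector (hypothetical helper).

    Returns (neuron_index, signed_weight, share_of_l1) tuples, ASSUMING
    share_of_l1 = |weight| / sum(|w| for all w in decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by the magnitude of their decoder weight, largest first.
    order = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


# Toy 4-neuron decoder vector; its L1 norm is 4.0.
print(neuron_alignment([0.5, -2.0, 1.0, 0.5], k=2))
# → [(1, -2.0, 0.5), (2, 1.0, 0.25)]
```

Under this reading, the card's numbers imply an L1 norm of roughly 30 (0.12 / 0.004 ≈ 30). The best-aligned neuron would then carry only 0.4% of the feature's weight, so the feature is spread across many neurons rather than matching any single one.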
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
227     +0.12           0.06
915     +0.09           0.04
238     +0.09           0.04
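The Correlated Neurons table reports two similarity scores per neuron. Below is a minimal sketch. It assumes both columns compare this feature's activations with each neuron's activations over the same set of tokens, with Pearson correlation in the first column and cosine similarity in the second. The card does not spell out the method. The helpers `pearson`, `cosine`, and `correlated_neurons`, and the toy data, are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(xs, ys):
    """Cosine similarity; unlike Pearson, the inputs are not mean-centered."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature (hypothetical helper).

    neuron_acts maps neuron index -> activations on the same tokens as feature_acts.
    Returns (neuron_index, pearson, cosine) tuples, highest correlation first.
    """
    rows = [(idx, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for idx, acts in neuron_acts.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]


feature = [0.0, 1.0, 0.0, 2.0]
neurons = {5: [0.0, 2.0, 0.0, 4.0], 9: [1.0, 0.0, 1.0, 0.0]}
for idx, corr, cos in correlated_neurons(feature, neurons):
    print(idx, round(corr, 3), round(cos, 3))
```

Cosine similarity skips the centering step, so the two columns can rank neurons differently when activations are sparse and mostly zero.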
Negative Logits
disreg        -0.88
scrat         -0.88
impra         -0.83
bosch         -0.82
embodi        -0.81
tupperware    -0.80
suscep        -0.79
snoopy        -0.77
lpg           -0.77
overcrow      -0.76
Positive Logits
alumínio      0.61
hermoso       0.57
ⓧ             0.57
BioLib        0.56
Dermott       0.55
acrylamide    0.54
Estudiantes   0.53
UserScript    0.52
Gallimard     0.51
Cataluña      0.51
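The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of that ranking. It assumes each score is the dot product of the feature's write into the residual stream with each token's unembedding vector, and it ignores any final layer norm. For a dictionary learned on MLP neurons, that write would be the decoder vector mapped through the MLP output weights. The card does not state its exact method. The helper `logit_effects` and the toy unembedding are hypothetical.

```python
def logit_effects(residual_dir, unembed_cols, k=2):
    """Direct effect of a feature direction on each token's logit (hypothetical helper).

    residual_dir: the feature's write into the residual stream (ASSUMED input).
    unembed_cols: maps token -> that token's unembedding vector.
    No final layer norm is applied, which is a simplifying assumption.
    Returns (most_negative, most_positive) lists of (token, score) pairs.
    """
    scores = {tok: sum(d * w for d, w in zip(residual_dir, col))
              for tok, col in unembed_cols.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]


unembed = {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0], "d": [0.5, 0.0]}
negative, positive = logit_effects([1.0, -1.0], unembed)
print(negative)  # → [('b', -3.0), ('c', 0.0)]
print(positive)  # → [('a', 2.0), ('d', 0.5)]
```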
Activation Density: 0.319%
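Activation density reads as the fraction of tokens on which this feature is active. Below is a minimal sketch. It assumes "active" means an activation strictly above zero; the card does not give the threshold. The helper `activation_density` and the toy activations are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold)."""
    return sum(1 for a in acts if a > threshold) / len(acts)


acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(activation_density(acts))  # → 0.25

# Inverting the card's 0.319% gives the average spacing between firings.
print(round(1 / 0.00319))  # → 313
```

At 0.319%, the feature fires on roughly one token in 313 on average.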