INDEX
Explanations
images mentioned in the text
Neuron Alignment
Index   Value   % of L₁
906     +0.13   0.4%
872     +0.09   0.3%
1856    +0.09   0.3%
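The "% of L₁" column can be read as each neuron's share of the feature vector's total absolute weight. A minimal sketch of that calculation follows, assuming the column is |value| divided by the L₁ norm of the whole vector; the dashboard does not state its formula, and the function name `l1_share` and the toy vector are hypothetical, not dashboard data.

```python
def l1_share(values, index):
    """Return |values[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means |value| / sum(|all values|) * 100.
    """
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(values[index]) / l1_norm


# Toy vector for illustration only (not taken from the dashboard).
weights = [0.5, -0.25, 0.25]
print(l1_share(weights, 0))  # 50.0
```

Under this reading, a value of +0.13 at 0.4% would imply an L₁ norm of roughly 30 for the full vector, which is consistent with the other two rows.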
Correlated Neurons
Index   P. Corr.   Cos Sim.
1856    +0.13      0.07
1018    +0.09      0.03
678     +0.09      0.05
Negative Logits
Joaqu      -0.84
jorge      -0.80
Asunción   -0.80
Valentín   -0.80
silikon    -0.79
kompati    -0.79
ricardo    -0.79
Áng        -0.78
Meksi      -0.77
alkoh      -0.77
Positive Logits
image        0.83
images       0.77
image        0.74
img          0.67
Image        0.65
Image        0.63
imagen       0.60
images       0.60
photograph   0.59
photo        0.59
Activations Density 0.594%
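Activation density is usually read as the fraction of sampled tokens on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above zero; the dashboard does not state its threshold or sample, and `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: a token counts as active when its activation exceeds 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy per-token activations for illustration only (not dashboard data).
toy_activations = [0.0, 0.0, 1.2, 0.0]
print(activation_density(toy_activations))  # 25.0
```

Under this reading, a density of 0.594% would mean the feature is active on roughly 6 of every 1,000 sampled tokens.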