INDEX
Explanations
phrases related to news articles, political figures, and events
Neuron Alignment
Index    Value    % of L₁
1150     +0.18    0.5%
1343     +0.16    0.5%
690      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1996     +0.18       0.03
1003     +0.16       0.03
1018     +0.10       0.03
Negative Logits
swarovski        -0.82
Septembre        -0.70
Rgds             -0.69
bahay            -0.68
Octobre          -0.68
Ename            -0.66
haup             -0.63
tanong           -0.63
considération    -0.63
maraming         -0.63
Positive Logits
Images           0.72
Image            0.60
images           0.59
image            0.58
Images           0.57
images           0.55
IMAGES           0.55
Photos           0.55
Image            0.54
Photographer     0.51
Activations Density 0.066%