INDEX
Explanations
references to specific countries or discussions of national contexts
Neuron Alignment
Index   Value   % of L₁
156     +0.13   0.8%
397     +0.11   0.6%
28      +0.10   0.6%
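The page does not define the "% of L₁" column. One plausible reading, assumed here rather than confirmed by the source, is that it reports each neuron's absolute value as a share of the L₁ norm (the sum of absolute values) of the whole vector. The sketch below illustrates that reading on a made-up vector; the function name `l1_share` and the toy values are hypothetical and are not taken from the dashboard.

```python
def l1_share(vector, top_k=3):
    """Return (index, value, share) for the top_k entries by absolute value.

    ASSUMPTION: 'share' is |value| divided by the L1 norm of the vector.
    This is one possible definition of a "% of L1" column, not a confirmed one.
    """
    l1_norm = sum(abs(v) for v in vector)
    if l1_norm == 0:
        return []  # every share is undefined for an all-zero vector
    # Rank positions by magnitude, largest first.
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    return [(i, vector[i], abs(vector[i]) / l1_norm) for i in ranked[:top_k]]


# Toy vector (hypothetical values, chosen so the L1 norm is exactly 1.0).
toy = [0.5, -0.25, 0.125, 0.125]
for index, value, share in l1_share(toy, top_k=2):
    print(f"{index:<7} {value:+.2f}   {share:.1%}")
```

Under this reading, small percentages such as those in the table would mean the vector's weight is spread across many neurons rather than concentrated in a few.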
Correlated Neurons
Index   P. Corr.   Cos Sim.
145     +0.13      0.03
119     +0.11      0.01
56      +0.10      0.00
Negative Logits
Token     Logit
ĥ½        -2.71
↵         -2.52
↵         -2.52
↵         -2.52
(blank)   -2.52
(blank)   -2.52
↵↵        -2.52
↵         -2.52
↵         -2.52
(blank)   -2.52
Positive Logits
Token     Logit
wide      2.43
men       2.19
yard      1.93
boat      1.67
amer      1.65
passer    1.64
mega      1.63
's        1.63
arro      1.61
man       1.61
Activations Density 0.129%