INDEX
Explanations
references to political figures, specifically the president
Neuron Alignment
Index   Value   % of L₁
156     +0.17   1.0%
376     +0.12   0.7%
365     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
133     +0.17      0.03
202     +0.12      0.03
365     +0.12      0.02
Negative Logits
ĥ½            -2.42
ŀ             -1.80
Ĥ             -1.74
¢             -1.72
ĵ             -1.69
ij            -1.69
¬             -1.68
backs         -1.66
Respondents   -1.60
Ĵ             -1.60
Positive Logits
doms     1.92
coat     1.91
zilla    1.74
brush    1.71
sheet    1.70
liness   1.65
esses    1.63
ee       1.59
urally   1.59
们       1.58
Activations Density: 0.021%