INDEX
Explanations
references to specific newspapers
Neuron Alignment
Index   Value   % of L₁
1961    +0.11   0.3%
1604    +0.09   0.3%
31      +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
300     +0.11      0.02
1363    +0.09      0.02
1075    +0.08      0.02
Negative Logits
pinch          -0.91
simpel         -0.70
Pinch          -0.67
Tikang         -0.67
DoubleQuotes   -0.66
kram           -0.65
<bos>          -0.63
Pinch          -0.61
sement         -0.61
KELEY          -0.61
Positive Logits
Herald     2.79
Herald     2.40
herald     1.93
herald     1.69
ERALD      0.96
heral      0.92
considér   0.86
montrant   0.83
bumper     0.83
clô        0.75
Activations Density: 0.188%