INDEX
Explanations
mentions of tasks and the processes associated with them
Neuron Alignment
Index   Value   % of L₁
184     +0.20   1.2%
156     +0.19   1.1%
95      +0.15   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.20      0.03
295     +0.19      0.02
117     +0.15      0.02
Negative Logits
¿½      -2.37
IJ      -2.35
§       -2.23
ı       -2.13
Ħ       -2.10
ĨĴ      -1.99
á̝      -1.95
Ĵ       -1.95
Ī       -1.91
Ń       -1.90
Positive Logits
master      2.21
horse       1.92
ings        1.85
ahead       1.83
done        1.79
ing         1.77
kill        1.69
requiring   1.59
etable      1.58
processor   1.54
Activations Density 0.137%