INDEX
Explanations
references to military ranks or titles
Neuron Alignment
Index   Value   % of L₁
156     +0.22   1.3%
396     +0.18   1.0%
325     +0.15   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
156     +0.22      0.03
325     +0.18      0.03
433     +0.15      0.03
Negative Logits
ŀ       -2.85
¬       -2.50
ĺ       -2.45
ĵ       -2.43
ĥ½      -2.29
Ĵ       -2.24
Ŀ       -2.20
Ļ       -2.20
ĸ       -2.19
ģ       -2.18
Positive Logits
liness      2.26
ships       1.97
abbit       1.81
glut        1.79
ly          1.78
ily         1.77
zzle        1.75
intendent   1.72
theless     1.71
ship        1.70
Activations Density 0.080%