INDEX
Explanations
phrases related to continuous processes or improvement
Neuron Alignment
Index    Value    % of L₁
156      +0.15    0.8%
247      +0.14    0.8%
113      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
448      +0.15       0.01
113      +0.14       0.02
148      +0.11       0.00
Negative Logits
assadors      -1.63
owed          -1.62
convinced     -1.58
chers         -1.54
derer         -1.50
daddy         -1.50
adorable      -1.41
instructed    -1.40
brothers      -1.39
Chief         -1.36
Positive Logits
£          2.10
ness       1.90
uration    1.90
eten       1.84
Ļª         1.76
©          1.75
Ł          1.73
Īĺ         1.70
idad       1.70
ität       1.69
Activations Density 0.074%
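
The density figure above is not defined on the page. A minimal sketch, assuming it means the fraction of token positions on which the feature's activation is above zero, could look like the following. The function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The dashboard itself does not state its definition.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Hypothetical data: 10,000 token positions, 7 of which activate the feature.
sample = [0.0] * 9993 + [0.5] * 7
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.074% would correspond to the feature firing on roughly 7 out of every 10,000 tokens.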