INDEX
Explanations
instances of the word "Up"
Neuron Alignment
Index    Value    % of L₁
457      +0.12    0.7%
376      +0.12    0.7%
475      +0.12    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
475      +0.12       0.02
328      +0.12       0.01
248      +0.12       0.01
Negative Logits
Ĺ      -2.78
ĥ½     -2.62
·¸     -2.18
Ń      -2.14
Īĺ     -2.13
¼      -1.97
ĸ´     -1.92
¯      -1.90
ĺ      -1.82
©      -1.78
Positive Logits
grade      1.98
dating     1.85
dates      1.69
hill       1.68
graded     1.65
"}](#      1.62
minster    1.60
ÅĽci       1.49
Rapids     1.47
buntu      1.47
Activations Density: 0.015%