INDEX
Explanations
"up" and its associated variations or a sense of increase
Neuron Alignment
Index   Value   % of L₁
317     +0.15   0.9%
228     +0.13   0.7%
162     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
41      +0.15      0.03
467     +0.13      0.02
66      +0.11      0.03
Negative Logits
ĥ½      -2.57
Ī       -2.12
©       -1.86
§       -1.86
¯       -1.79
Ń       -1.78
¥       -1.76
ĸ´      -1.65
ĵ       -1.65
ī       -1.63
Positive Logits
dates   2.92
grades  2.45
dating  2.12
grade   1.95
ublic   1.91
graded  1.83
hill    1.72
stairs  1.72
ercase  1.67
grad    1.67
Activations Density 0.154%