Explanations
references to specific numbers or numerical figures
Negative Logits
Token   Logit
5       -0.96
4       -0.88
7       -0.88
3       -0.86
6       -0.81
9       -0.80
8       -0.77
0       -0.72
2       -0.71
1       -0.68
Positive Logits
Token   Logit
Seven   2.16
Nine    2.13
Seven   2.12
Nine    2.06
Six     2.05
nine    2.02
Eight   1.97
eight   1.96
Six     1.95
seven   1.93
Activations Density: 0.108%