Explanations
No Explanations Found
Negative Logits
Token        Logit
eleven       1.03
sixteen      0.99
eighteen     0.97
Sixteen      0.96
thirteen     0.94
seventeen    0.94
fourteen     0.91
eleventh     0.90
twelve       0.89
Fourteen     0.89
Positive Logits
Token        Logit
3            1.92
4            1.81
5            1.70
2            1.54
6            1.53
7            1.51
9            1.44
8            1.44
1            1.33
0            1.13
Activations Density: 1.282%