INDEX
Explanations
references to mathematical sections and theorems within academic texts
Negative Logits
Token        Logit
ÙĤØ·         -0.15
ioni         -0.14
inç          -0.14
ë°ľ          -0.14
ulumi        -0.14
ekl          -0.13
리ìĬ¤       -0.13
->$          -0.13
ãĥ³ãĥIJ       -0.13
δÎŃ          -0.13
Positive Logits
Token        Logit
2            0.43
3            0.42
4            0.40
1            0.37
5            0.35
6            0.35
7            0.33
8            0.30
9            0.29
10           0.24
Activations Density 0.318%