INDEX
Explanations
numerical references and citations in academic texts
Negative Logits
95      -0.16
edge    -0.16
71      -0.15
92      -0.15
83      -0.15
EDGE    -0.15
격      -0.15
Edge    -0.15
deo     -0.15
uga     -0.15
Positive Logits
000     0.42
001     0.37
002     0.36
003     0.36
004     0.35
005     0.33
006     0.29
007     0.29
008     0.27
009     0.23
Activations Density 0.060%