Explanations
references to numerical values and their relationships in text
Negative Logits
Token     Logit
sonian    -0.17
írk       -0.16
dfd       -0.15
LENG      -0.15
auce      -0.14
ftime     -0.14
bart      -0.14
öğ        -0.14
-sama     -0.14
ovit      -0.14
Positive Logits
Token       Logit
/or         0.22
higher      0.19
possibly    0.17
respect     0.17
000         0.17
and         0.17
amp         0.17
optionally  0.17
again       0.16
latter      0.16
Activations Density: 0.126%