Explanations
specific numeric values or quantities in the text
Negative Logits
Token       Logit
vented      -0.15
...↵        -0.14
aña         -0.14
ptest       -0.14
Miracle     -0.14
perl        -0.13
ipur        -0.13
curring     -0.13
Adam        -0.13
Per         -0.13
Positive Logits
Token       Logit
000         0.26
۰۰۰         0.21
600         0.17
;element    0.16
500         0.16
ousand      0.16
ylan        0.15
_TRI        0.15
ToSelector  0.15
-FIRST      0.14
Activations Density 0.076%