Explanations
phrases indicating approximate quantities or numbers
Negative Logits
pes       -0.17
orsi      -0.16
isci      -0.15
$self     -0.15
hey       -0.15
idy       -0.14
OOT       -0.14
asil      -0.14
Ñĩа       -0.14
ãĥ³ãĥIJ    -0.14
Positive Logits
dozen     0.18
;element  0.16
150       0.14
600       0.14
erto      0.13
akk       0.13
lier      0.13
avel      0.13
Spurs     0.13
íĥĪ       0.13
Activations Density 0.049%