Explanations
phrases indicating types or classifications
Negative Logits
Token       Logit
chg         -0.17
somehow     -0.16
remely      -0.16
orem        -0.15
suche       -0.15
seemingly   -0.14
_SIG        -0.14
orie        -0.14
riz         -0.14
nt          -0.14
Positive Logits
Token       Logit
-sort       0.25
like        0.19
-таки       0.17
Like        0.17
/s          0.16
thing       0.16
antity      0.16
LIKE        0.15
thing       0.14
isos        0.14
Activations Density: 0.024%