INDEX
Explanations
comparative phrases emphasizing quantities or measures
Negative Logits
sta       -0.15
529       -0.15
Weaver    -0.15
oppel     -0.15
³         -0.14
lsen      -0.14
kowski    -0.14
tamp      -0.14
anches    -0.14
acman     -0.14
Positive Logits
enough    0.20
ッツ      0.18
óng       0.16
ubb       0.16
евер      0.15
rama      0.14
ーボ      0.14
'gc       0.14
icone     0.14
bef       0.14
Activations Density 0.064%