INDEX
Explanations
intensifiers followed by adjectives or adverbs
Negative Logits
lip      -0.15
sten     -0.15
ustr     -0.15
UBY      -0.14
lot      -0.14
lix      -0.14
too      -0.14
изнес    -0.14
plit     -0.14
ZY       -0.14
Positive Logits
χε               0.15
чи               0.15
-*-\r\n          0.14
ething           0.14
ythe             0.14
untu             0.13
.documentation   0.13
alara            0.13
lingen           0.13
much             0.13
Activations Density: 0.018%