INDEX
Explanations
adverbs describing efficiency or quality of action
Negative Logits
elight      -0.16
à¹ĭ         -0.15
Duch        -0.15
PURE        -0.14
вов         -0.14
ê°Ļ         -0.14
ViewInit    -0.14
ÑĢаб       -0.13
obot        -0.13
Budd        -0.13
Positive Logits
ologically  0.18
yo          0.16
uluk        0.16
ingly       0.15
issa        0.14
à¹Ĩ         0.14
363         0.14
ürk         0.14
ropp        0.14
igh         0.14
Activations Density: 0.188%