Explanations
usage of prepositions and words indicating change or transformation
Negative Logits
缮        -0.15
orida     -0.15
loff      -0.15
κος       -0.15
ocs       -0.15
šní       -0.14
урс       -0.14
linky     -0.14
届        -0.14
iaux      -0.14
Positive Logits
243       0.16
ones      0.15
бу        0.15
gen       0.14
bli       0.14
pin       0.14
ecome     0.14
instead   0.14
shower    0.14
sar       0.13
Activations Density 0.185%