Explanations
instances of verbs or phrases indicating intention or direction
Negative Logits
usher      -0.16
евер       -0.15
基         -0.14
lixir      -0.14
adera      -0.14
esian      -0.13
uğ         -0.13
lamaz      -0.13
سر         -0.13
ประจำ      -0.13
Positive Logits
innovate   0.16
967        0.15
live       0.15
微笑       0.14
Holl       0.14
ucceed     0.14
OTS        0.14
otec       0.14
.chapter   0.14
succeed    0.14
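The page does not say how the logit lists were produced. On SAE feature dashboards they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive vocabulary entries. The sketch below assumes that method. The function name `top_logit_effects`, its arguments, and the d_model x vocab_size layout of `unembed` are hypothetical illustrations, not this dashboard's actual code.

```python
from typing import List, Tuple


def top_logit_effects(
    decoder_row: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: rank tokens by decoder_row @ unembed.

    Assumes unembed has shape d_model x vocab_size (row i = model dim i).
    """
    d_model = len(decoder_row)
    # Direct logit effect of the feature direction on each vocabulary token.
    scores = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative effects first
    positive = ranked[::-1][:k]      # most positive effects first
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dim model, 3-token vocabulary.
    neg, pos = top_logit_effects(
        [1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1
    )
    print(neg, pos)
```

With real weights, `k=10` would yield two ten-entry lists shaped like the ones above.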
Activations Density 0.566%
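Activation density is usually reported as the percentage of sampled tokens on which the feature fires, so 0.566% would mean the feature is active on roughly 5–6 tokens per thousand. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name and the threshold default are assumptions, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens whose activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    # Count tokens where the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 1.2, 0.0, 0.0]))
```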