INDEX
Explanations
the infinitive form of verbs indicating actions or recommendations
Negative Logits
behold    -0.16
aad       -0.16
ange      -0.15
ersh      -0.15
eker      -0.14
audi      -0.14
Coun      -0.14
ught      -0.14
ustry     -0.14
Ñĥй       -0.14
Positive Logits
dio       0.16
Kimber    0.14
agrams    0.14
éħ        0.13
éĢĶ       0.13
Dram      0.13
azine     0.13
ãĥĬãĥ«    0.13
izr       0.13
ournals   0.12
Activations Density: 0.120%