INDEX
Explanations
infinitive verbs indicating actions or states of being
Negative Logits
llib     -0.15
ừa       -0.15
anja     -0.15
soon     -0.14
inks     -0.14
annot    -0.14
shire    -0.14
idan     -0.14
witch    -0.14
olen     -0.14
Positive Logits
ago        0.17
tü         0.17
Eb         0.15
gers       0.15
yms        0.15
gos        0.15
เคย        0.14
/original  0.14
(before    0.14
akis       0.14
Activations Density 0.026%