INDEX
Explanations
adverbs and adjectives describing frequency or state
Negative Logits
ught    -0.07
annis   -0.06
itu     -0.06
oret    -0.06
dut     -0.06
kening  -0.06
succes  -0.06
via     -0.06
uren    -0.06
ammen   -0.05
Positive Logits
undecided  0.07
нима       0.07
हल         0.07
enant      0.07
jí         0.07
号         0.06
schö       0.06
imoto      0.06
nouve      0.06
оят        0.06
Activations Density 0.001%