Explanations
The presence of specific prepositions and adverbs indicating spatial or temporal relationships.

Negative Logits
upp          -0.15
ats          -0.14
ond          -0.14
Harden       -0.14
iflower      -0.13
urd          -0.13
ils          -0.13
ãĥķãĤ        -0.13
ÑĤÑĥ         -0.13
Dickinson    -0.13

Positive Logits
bens          0.18
vron          0.15
uxe           0.15
inis          0.14
Conte         0.14
zbek          0.13
vik           0.13
ensi          0.13
etiqu         0.13
odate         0.13
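
The two lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. The sketch below shows one common way such lists are produced, assuming (not confirmed by this page) that each score is the feature's decoder direction multiplied by the model's unembedding matrix, and that the k lowest and k highest scores are reported. The names `logit_effects` and `top_logits`, the toy vocabulary, and all weights are hypothetical; real dashboards may also normalize or filter tokens.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-sized decoder direction through a (d_model x vocab)
    unembedding matrix, giving one logit effect per vocabulary token.
    Assumption: this is how the dashboard scores tokens."""
    vocab_size = len(w_u[0])
    return [sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]

def top_logits(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
w_u = [[0.2, -0.1, 0.0, 0.3],
       [0.4, 0.2, -0.2, 0.0]]
negative, positive = top_logits(logit_effects(decoder_dir, w_u), tokens, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

With a real model, `w_u` would have tens of thousands of columns and k would be 10, matching the length of each list above.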

Activation density: 0.231%
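
An activation density of 0.231% means the feature is active on roughly 231 of every 100,000 tokens (0.00231 as a fraction). A minimal sketch of the calculation follows, assuming that "active" means an activation strictly greater than zero, as is typical for ReLU-based sparse autoencoders; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`.
    Assumption: a feature counts as firing when its activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```

The `.3%` format specifier multiplies by 100 and appends a percent sign, which matches the three-decimal display used above.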