INDEX
Explanations
prepositions or relational words
NEGATIVE LOGITS
orous       0.50
opted       0.48
ites        0.46
仕事        0.45
দেখার       0.45
ಅ           0.45
ischen      0.44
eningkatan  0.43
iciens      0.42
intitul     0.42
POSITIVE LOGITS
solely      0.64
towar       0.60
within      0.59
WITHIN      0.57
nejen       0.56
from        0.56
către       0.56
WITHOUT     0.55
toward      0.54
ใน          0.54
Activations Density 0.094%