Explanations
descriptive adjectives followed by nouns
Negative Logits
смысле  0.32
quantifying  0.31
່ວນ  0.31
রাষ্ট্রীয়  0.30
ाइवेट  0.29
信仰  0.29
寒い  0.29
prinsip  0.29
fungsi  0.29
这个  0.29
Positive Logits
,  0.31
plywood  0.31
bakery  0.29
models  0.29
-  0.27
walnut  0.27
hotel  0.27
ened  0.27
motorcycle  0.26
livestock  0.26
Activations Density: 0.613%