Explanations
phrases expressing preference for undesirable alternatives over social norms or conventions
Negative Logits
kív           -0.45
SizeF         -0.44
למע           -0.41
including     -0.40
toContain     -0.40
kres          -0.39
sauvages      -0.39
foglal        -0.38
Hals          -0.38
addCriterion  -0.38
Positive Logits
}}"></         0.83
VersionUID     0.82
lenker         0.80
متعلقه         0.77
than           0.75
ujednoznacz    0.74
sidemargin     0.71
createContext  0.70
THAN           0.69
Hozzáférés     0.69
Activations Density 0.159%