Explanations
various forms of comparative or inclusive language
other categories
Negative Logits
المعيارى	-0.78
AssemblyTitle	-0.77
otomatig	-0.72
ब्रेकडाउन	-0.71
للمعارف	-0.67
Билгалдахарш	-0.65
PullParser	-0.63
autorytatywna	-0.62
CanadaChoose	-0.61
UnusedPrivate	-0.61
Positive Logits
other	0.46
others	0.40
others	0.37
other	0.34
다른	0.30
Öffentlichkeit	0.30
他の	0.29
よう	0.29
другие	0.28
otros	0.28
Activation Density: 0.041%