Explanations
phrases that reference groups of people or collective identities
Negative Logits
Token            Logit
ConstraintMaker  -0.52
LikeLike         -0.48
Mec              -0.48
frage            -0.46
opensource       -0.46
Nå               -0.45
mány             -0.44
dotta            -0.44
Köszönöm         -0.44
two              -0.44
Positive Logits
Token             Logit
którzy            0.80
quienes           0.80
coloro            0.78
Others            0.71
Others            0.71
setVerticalGroup  0.70
others            0.67
osoever           0.66
คนที่               0.66
ktorí             0.66
Activations Density 0.112%