Explanations
comparisons that highlight differences in ability or circumstances among groups
Negative Logits
lediği        -0.15
enne          -0.14
ButtonText    -0.14
Δημο          -0.14
_ASCII        -0.14
GGLE          -0.14
qua           -0.13
ží            -0.13
μως           -0.13
λικ           -0.13
Positive Logits
those         0.58
those         0.50
Those         0.47
Those         0.45
那些          0.42
ones          0.39
ceux          0.34
těch          0.31
những         0.30
aquel         0.30
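
Logit lists of this kind are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most extreme entries. The sketch below assumes that method; the dashboard's actual procedure is not stated in the source. The function name `top_logit_effects`, the toy vocabulary, and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def top_logit_effects(decoder_direction, unembedding, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: effect = decoder_direction @ unembedding, where
    decoder_direction has shape (d_model,) and unembedding has
    shape (d_model, vocab_size).
    """
    effects = np.asarray(decoder_direction) @ np.asarray(unembedding)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example (hypothetical values, not taken from any real model).
vocab = ["those", "ones", "qua", "enne"]
direction = np.array([1.0, 0.0])
unembed = np.array([[0.58, 0.39, -0.13, -0.14],
                    [0.00, 0.00, 0.00, 0.00]])
neg, pos = top_logit_effects(direction, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

On a real model the same projection would be applied to the full vocabulary, which is why unrelated multilingual fragments can appear at the negative end with values close to zero.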
Activations Density 0.294%
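
Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of zero. The function name `activation_density` and the sample data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: density = (# tokens with activation > threshold) / (# tokens).
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("activations must not be empty")
    return float(np.count_nonzero(acts > threshold) / acts.size)

# Hypothetical sample: 1000 tokens, the feature fires on 3 of them.
example = np.zeros(1000)
example[[10, 500, 999]] = [0.8, 1.2, 0.3]
density = activation_density(example)
print(f"{density:.3%}")  # 0.300%
```

Under this reading, a density of 0.294% corresponds to the feature firing on roughly 3 of every 1000 sampled tokens.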