Explanations
references to group identities or categories, particularly in discussions about people or organizations
Negative Logits
ÏĦÏĮ       -0.16
ÙĬب       -0.14
.cast      -0.14
اعد        -0.14
lop        -0.14
жа         -0.13
åĨ         -0.13
lick       -0.13
ayet       -0.13
abelle     -0.13
Positive Logits
besides    0.18
neck       0.18
than       0.17
318        0.17
Besides    0.16
ello       0.15
Besides    0.15
/ws        0.15
_INCREF    0.15
niż        0.15
Activations Density 0.315%