Explanations
references to groups of people, including demographics and roles within society
Negative Logits
inci       -0.16
coni       -0.16
obao       -0.15
orge       -0.15
rung       -0.14
angan      -0.13
roti       -0.13
urd        -0.13
prepend    -0.13
illis      -0.13
Positive Logits
alike          1.53
equally        0.71
respectively   0.57
一样           0.39
respective     0.38
equal          0.37
gleich         0.36
similarly      0.35
igual          0.34
Equal          0.30
Activations Density 0.160%