INDEX
Explanations
references to social justice and advocacy
Negative Logits
amins       -0.15
sesso       -0.14
iego        -0.14
onaut       -0.14
롱          -0.13
_CSR        -0.13
ashire      -0.13
-Americ     -0.13
chner       -0.13
NewItem     -0.13
Positive Logits
these       0.30
them        0.29
这些        0.25
these       0.24
These       0.22
THESE       0.22
These       0.20
этих        0.20
Them        0.20
těchto      0.19
Activations Density 0.355%