Explanations
references to social issues and topics related to social justice, inclusion, and community dynamics
Negative Logits
onders   -0.16
endas    -0.15
uyen     -0.15
кот      -0.15
ulia     -0.15
icari    -0.14
iland    -0.14
idis     -0.14
ogl      -0.14
hazi     -0.13
Positive Logits
ware           0.16
proof          0.16
undes          0.14
rooms          0.14
-minded        0.14
/community     0.14
unacceptable   0.14
/people        0.13
Proof          0.13
minded         0.13
Activations Density 0.037%