Explanations
phrases related to rights or policies affecting specific groups of people
references to marginalized or specific groups of people
Negative Logits
ob        -0.76
ILY       -0.67
enegger   -0.65
ointment  -0.63
Hang      -0.63
★★        -0.62
Serpent   -0.62
irth      -0.61
OB        -0.61
bang      -0.61
Positive Logits
wishing      0.98
who          0.87
kinds        0.84
pesky        0.84
interested   0.83
affected     0.77
attending    0.77
surveyed     0.74
advocating   0.74
victimized   0.73
Activation density: 0.068%