Explanations
terms related to unfairness or injustice
Negative Logits
abito          -0.55
MessageState   -0.54
hood           -0.52
livejournal    -0.52
يتيمه          -0.52
vestibule      -0.52
initializeApp  -0.50
Hause          -0.49
frigor         -0.48
Tobin          -0.48
Positive Logits
unfair       1.78
unfairly     1.25
unjust       1.04
injust       1.01
unjustly     0.79
injustice    0.75
injus        0.75
injustices   0.69
inequ        0.65
unequal      0.65
Activations Density: 0.007%