INDEX
Explanations
terms related to injustice or unfairness
words associated with injustice and unfair treatment
Negative Logits
gdala        -0.95
yss          -0.80
á            -0.76
phrine       -0.74
aeda         -0.73
berman       -0.70
Downloadha   -0.70
Enhancement  -0.70
livest       -0.68
zzo          -0.67
Positive Logits
ified        1.47
ifiable      1.46
ifier        0.97
ifiers       0.94
nesses       0.90
ification    0.86
ifying       0.86
IFIED        0.85
ly           0.84
ious         0.84
Activations Density 0.021%
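
As a rough illustration of how a density figure like the one above can be read: assuming it means the fraction of sampled token positions on which the feature activates at all (an assumption; the source does not define it), 0.021% corresponds to about 21 active positions per 100,000 tokens. The function name, threshold, and sample data in this sketch are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature
    fires at all; the dashboard's exact definition is not given.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 21 active positions out of 100,000 tokens.
sample = [1.0] * 21 + [0.0] * 99_979
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, so the explanations and logit lists are best read as describing those rare activations.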