INDEX
Explanations
words related to injustice or unfairness
terms associated with unfairness or injustice
Negative Logits
gdala     -0.88
ince      -0.81
udder     -0.77
audi      -0.76
acid      -0.76
apse      -0.74
incinn    -0.72
asio      -0.71
hent      -0.71
aeda      -0.71
Positive Logits
unfair           0.90
nesses           0.88
burdens          0.84
dismissal        0.82
prejudice        0.81
burden           0.78
Advantage        0.75
undermin         0.72
unfairly         0.72
disadvantages    0.72
Activations Density: 0.043%