INDEX
Explanations
phrases related to how individuals are treated by others
instances of people being treated in various ways, particularly in terms of respect and dignity
Negative Logits
circulation    -0.65
linking        -0.64
prediction     -0.62
Swing          -0.60
yrus           -0.58
Recall         -0.56
kb             -0.56
predictor      -0.55
utherford      -0.55
marks          -0.55
Positive Logits
differently    1.15
harshly        1.02
unfairly       0.99
respectfully   0.99
kindly         0.96
unequ          0.94
humane         0.91
hosp           0.86
accordingly    0.85
favorably      0.83
Activations Density 0.185%
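
The density figure above is usually read as the fraction of sampled token positions on which the feature is active. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and do not come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 here).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 token positions, with the feature firing on 37.
sample = [0.0] * 19963 + [1.0] * 37
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts the result is 37 / 20,000, which was chosen only to illustrate a density of the same order as the one reported.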