INDEX
Explanations
phrases related to the treatment of individuals by the justice system, particularly emphasizing respect, dignity, and fairness
Negative Logits
Token         Logit
inet          -0.69
Ellison       -0.65
Ops           -0.63
aer           -0.62
sky           -0.62
zyme          -0.62
brainstorm    -0.61
inx           -0.58
ova           -0.57
laun          -0.57
Positive Logits
Token           Logit
unequ           0.85
riors           0.74
bars            0.73
istine          0.70
terness         0.69
ilitating       0.69
respectfully    0.68
entious         0.68
unfairly        0.67
iquette         0.66
Activations Density 0.775%
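
As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of sampled tokens on which a feature fires. It assumes density means "share of tokens with activation above zero"; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 400 tokens, of which 3 activate the feature.
sample = [0.0] * 397 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.775% would mean the feature is active on roughly 8 of every 1,000 sampled tokens.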