INDEX
Explanations
instances where individuals or groups are being treated in a certain way
comparisons involving treatment or perception of individuals or groups
Negative Logits
Rivals       -0.77
obser        -0.77
Riding       -0.74
oln          -0.73
Archdemon    -0.72
oras         -0.70
chev         -0.69
Seeking      -0.69
Yon          -0.67
confir       -0.65
Positive Logits
objectively  0.74
critically   0.74
invasive     0.71
differently  0.71
limits       0.68
ocratic      0.68
bestos       0.67
atro         0.66
religiously  0.66
seriously    0.65
Activations Density: 0.094%