INDEX
Explanations
verbs associated with decision-making or judgment
Negative Logits
Token           Logit
Clockwork       -0.71
otide           -0.69
patiently       -0.69
floated         -0.69
maneu           -0.66
reckoned        -0.65
doubtless       -0.65
towed           -0.62
lease           -0.61
inis            -0.60
Positive Logits
Token           Logit
disrespect      0.87
enough          0.80
discrimination  0.74
somehow         0.74
unfairly        0.73
anything        0.70
racism          0.70
\'              0.70
discriminatory  0.68
harm            0.68
Activations Density 0.592%
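
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 6 of which activate the feature.
sample = [0.0] * 994 + [0.4, 1.2, 0.7, 2.1, 0.3, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.592% would mean the feature fires on roughly 6 of every 1,000 sampled tokens.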