Explanations
words related to evaluation and judgment, such as performance, cost, policy, efficiency, accountability, fundraising, and bias
terms related to performance, accountability, and other evaluative measures in various contexts
Negative Logits
Token        Logit
arnaev       -0.67
ovie         -0.66
aughters     -0.57
ndum         -0.56
Logo         -0.55
ortium       -0.54
orget        -0.54
ERSON        -0.54
uscript      -0.53
Rasmussen    -0.53
Positive Logits
Token             Logit
considerations    0.77
constraints       0.68
istically         0.68
deprivation       0.67
manipulation      0.66
illary            0.63
aggregation       0.63
cues              0.62
lessly            0.62
purposes          0.62
Activations Density: 0.585%