INDEX
Explanations
adjectives describing low or negative qualities
negative descriptors related to quality or performance
Negative Logits
leans     -0.84
ju        -0.73
plane     -0.72
frey      -0.72
llers     -0.70
cise      -0.70
cript     -0.70
planes    -0.68
alde      -0.68
lean      -0.67
Positive Logits
glers           0.88
miser           0.88
luster          0.80
Downs           0.75
rollout         0.68
incompet        0.68
excuses         0.68
nesses          0.68
incompetence    0.68
Spac            0.66
Activations Density: 0.085%