INDEX
Explanations
complex and nuanced forms of negativity or criticism
Negative Logits
ability     -0.21
isation     -0.21
ization     -0.20
uation      -0.20
lessness    -0.20
eration     -0.20
ivism       -0.20
stration    -0.19
emption     -0.19
urement     -0.19
Positive Logits
eworthy      0.30
arious       0.28
acious       0.27
urious       0.27
orous        0.26
ughty        0.25
urous        0.25
volent       0.25
omorphic     0.24
inous        0.24
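The dashboard does not state how the logit values above were produced. One common approach in feature dashboards (an assumption here, not confirmed by this page) is to take the dot product of the feature's output direction with each token's column of the model's unembedding matrix, then list the most negative and most positive tokens. The sketch below shows that ranking step in plain Python; `top_logit_effects`, `direction`, `unembed`, and `vocab` are hypothetical names chosen for illustration.

```python
def top_logit_effects(direction, unembed, vocab, k=10):
    """Rank vocabulary tokens by their logit effect under a feature direction.

    Assumed (hypothetical) layout: unembed[i][t] is the weight from residual
    dimension i to token t, so the effect on token t is sum_i direction[i] * unembed[i][t].
    """
    effects = []
    for t, token in enumerate(vocab):
        score = sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        effects.append((token, score))
    # Sort ascending so the most negative effects come first.
    effects.sort(key=lambda pair: pair[1])
    negative = effects[:k]           # k most negative logit effects
    positive = effects[::-1][:k]     # k most positive logit effects
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    direction=[1.0, 0.0],
    unembed=[[0.3, -0.2, 0.1], [5.0, 5.0, 5.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

With real model weights, the vectors would have thousands of dimensions and a vectorized library would be used instead of nested Python loops.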
Activation density: 0.389%
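Activation density is usually read as the percentage of tokens in a sample on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold this page actually uses is not stated), a minimal sketch of the calculation follows; `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# A feature that fires on 2 of 8 tokens has a density of 25%.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]))
```

On this reading, a density of 0.389% means the feature fired on roughly 389 out of every 100,000 sampled tokens, which makes it a fairly sparse feature.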