Explanations
adjectives with negative connotations
terms related to negative or harsh descriptors
Negative Logits
ethics    -0.67
FU        -0.64
OTA       -0.60
Uganda    -0.59
ENE       -0.58
Matter    -0.58
RI        -0.58
MU        -0.58
ANS       -0.57
tions     -0.56
Positive Logits
uously    1.22
uous      1.21
ingly     1.20
htaking   1.12
igious    1.04
ctic      1.01
amental   0.98
acular    0.96
herent    0.96
uable     0.96
Activations Density: 0.095%