INDEX
Explanations
words associated with negative connotations, including criticism and harm
words related to criticism and negative descriptors
Negative Logits
minster          -0.75
Gardens          -0.68
gow              -0.67
united           -0.64
bnb              -0.64
laun             -0.63
aez              -0.61
etter            -0.60
soDeliveryDate   -0.60
urat             -0.59
Positive Logits
ible              0.77
alion             0.76
chio              0.74
tein              0.74
dden              0.73
ction             0.68
fusc              0.68
ione              0.67
iblical           0.66
ected             0.65
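Lists like the two above are usually read as the output tokens whose logits the feature most suppresses (negative) or most promotes (positive). The following is a minimal sketch of how such lists could be derived. It assumes each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, and it ignores any final layer norm. The helper names `logit_effects` and `top_and_bottom` and the toy numbers are hypothetical and are not taken from this dashboard.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every vocabulary token for one feature.

    Assumption: score[t] = sum_d decoder_dir[d] * unembed[d][t], i.e. the
    feature's decoder direction projected onto column t of the unembedding
    matrix (shape d_model x n_vocab). Any final layer norm is ignored.
    """
    d_model = len(decoder_dir)
    scores = []
    for t, token in enumerate(vocab):
        s = sum(decoder_dir[d] * unembed[d][t] for d in range(d_model))
        scores.append((token, s))
    return scores


def top_and_bottom(scores: List[Tuple[str, float]],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]           # most suppressed first
    positive = ranked[::-1][:k]     # most promoted first
    return negative, positive


if __name__ == "__main__":
    # Toy values, invented for illustration: d_model = 2, vocabulary of 4 tokens.
    decoder_dir = [1.0, -0.5]
    unembed = [[0.2, -0.4, 0.6, 0.0],
               [0.4, 0.2, -0.2, 0.8]]
    vocab = ["a", "b", "c", "d"]
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

With the toy numbers, tokens `b` and `d` land in the negative list and `c` and `a` in the positive list. A real dashboard would run the same ranking over the full vocabulary, which is why subword fragments such as `ible` or `minster` can appear.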
Activation density: 0.054%
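A density of 0.054% corresponds to 54 active tokens per 100,000, or roughly 1 token in 1,850. The following is a minimal sketch of how such a figure could be computed. It assumes a token counts as active when the feature's activation is strictly above a threshold of 0. The function name `activation_density` is hypothetical, and the dashboard's exact threshold is not stated here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Invented example: 54 active tokens out of 100,000.
    acts = [0.0] * 99_946 + [1.0] * 54
    print(f"{activation_density(acts):.3f}%")
```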