INDEX
Explanations
words related to negative or critical attitudes towards someone or something
terms expressing negative attitudes towards people or concepts
Negative Logits
hemor      -0.75
icle       -0.68
Lans       -0.67
destruct   -0.66
helicop    -0.63
icles      -0.61
raine      -0.60
ramid      -0.60
Staten     -0.59
reorgan    -0.58
Positive Logits
acy        0.86
FUL        0.85
fully      0.83
chery      0.82
fulness    0.81
uous       0.79
ately      0.78
ability    0.78
lessly     0.76
rence      0.76
Activations Density: 0.040%