Explanations
words related to negative attributes and actions, such as incompetence, stupidity, and thoughtlessness
Negative Logits
amins      -0.81
nesium     -0.74
ergy       -0.72
amen       -0.70
ノ         -0.67
lift       -0.65
izable     -0.64
ellar      -0.60
lyak       -0.60
ITNESS     -0.59
Positive Logits
perv        1.06
iness       1.02
inherent    0.95
fulness     0.88
ously       0.84
quot        0.84
bordering   0.82
abound      0.80
ulence      0.80
crept       0.79
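The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to obtain such lists is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below assumes that definition; the function name `top_logit_effects`, the plain-list representation, and the toy vectors are hypothetical and are not the dashboard's actual data or code.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_direction: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative token logit effects.

    Assumption: a token's effect is the dot product of the feature's
    decoder direction with that token's unembedding vector.
    """
    # Score every token in the (hypothetical) vocabulary.
    effects = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembed.items()
    }
    # Sort ascending so the most negative effects come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy two-dimensional example; these numbers are illustrative only.
toy_unembed = {
    "perv": [0.8, -0.4],
    "amins": [-0.6, 0.4],
    "lift": [0.1, 0.2],
}
positive, negative = top_logit_effects([1.0, -0.5], toy_unembed, k=1)
print(positive)  # [('perv', 1.0)]
print(negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.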
Activation Density: 0.143%
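An activation density of 0.143% means the feature fires on roughly 143 of every 100,000 tokens, or about 1 token in 700. The sketch below assumes density is the percentage of tokens whose activation is above zero; the dashboard's exact threshold and token sample are not stated here, and the function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (zero by default).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Toy sample: 143 active tokens out of 100,000.
sample = [0.0] * 99_857 + [1.0] * 143
print(activation_density(sample))
```

A low density like this is typical of a sparse, specific feature; the value depends on the threshold, so densities computed with different cutoffs are not directly comparable.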