INDEX
Explanations
words related to suppressing or quelling actions or emotions
Negative Logits
anuts       -0.84
inav        -0.77
ammy        -0.75
abet        -0.75
olds        -0.74
olf         -0.71
better      -0.71
eday        -0.70
optional    -0.70
nih         -0.69
Positive Logits
dissent         0.97
distractions    0.88
impulses        0.84
hostilities     0.84
havoc           0.83
pests           0.83
dissenting      0.82
weeds           0.81
temptation      0.80
intr            0.80
Activations Density 0.180%