INDEX
Explanations
words related to preventing negative events or outcomes
phrases that emphasize prevention
Negative Logits
ammy      -0.85
night     -0.70
edded     -0.70
eah       -0.69
ingers    -0.62
elt       -0.62
bard      -0.60
geist     -0.60
Truth     -0.60
MON       -0.60
Positive Logits
ative       1.01
detection   0.84
inhib       0.83
ively       0.80
regress     0.79
duplicate   0.78
ministic    0.73
auga        0.70
accidental  0.68
obstruct    0.68
Activations Density: 0.031%