INDEX
Explanations
phrases related to avoiding something
instances of the word "avoid" and related forms
Negative Logits
iop      -0.81
geist    -0.70
essee    -0.69
rooms    -0.69
cart     -0.67
song     -0.65
Rated    -0.64
otle     -0.64
RAW      -0.63
dy       -0.63
Positive Logits
detection    0.79
pitfalls     0.71
vana         0.71
nels         0.71
ably         0.69
ading        0.69
wasting      0.68
ption        0.68
hess         0.67
answering    0.67
Activations Density: 0.035%