INDEX
Explanations
the phrase "instead of" followed by an action that goes against the expected or traditional response
the phrase "instead of."
Negative Logits
Nap            -0.68
essen          -0.66
erto           -0.66
nonetheless    -0.65
read           -0.64
ENE            -0.64
Palestin       -0.63
nevertheless   -0.63
artisan        -0.63
veter          -0.62
Positive Logits
anything       0.82
bothering      0.74
relying        0.74
ours           0.73
being          0.72
sul            0.72
dwelling       0.72
excuses        0.70
focusing       0.68
outright       0.67
Activations Density: 0.033%