INDEX
Explanations
phrases expressing contrast or emphasis on a particular aspect
negative evaluations and criticisms of actions or situations
Negative Logits
nonetheless   -0.93
etheless      -0.73
nevertheless  -0.66
cheat         -0.63
stress        -0.62
Finally       -0.61
Minotaur      -0.61
Lastly        -0.61
ague          -0.58
optional      -0.58
Positive Logits
aesthetics    0.66
pian          0.59
paycheck      0.59
ifiable       0.59
erves         0.57
coasts        0.56
aesthetic     0.56
but           0.56
nor           0.55
guiActive     0.55
Activations Density 0.204%