INDEX
Explanations
phrases related to expressing contradiction or disagreement
Negative Logits
around       -0.68
pal          -0.68
rig          -0.66
shifts       -0.65
troop        -0.63
wagon        -0.63
specialist   -0.60
iday         -0.60
himself      -0.59
Cu           -0.58
Positive Logits
Not          3.16
not          1.93
Not          1.88
NOT          1.79
Never        1.63
Nothing      1.57
Probably     1.53
Already      1.53
Only         1.50
Except       1.49
Activations Density: 0.012%