INDEX
Explanations
phrases expressing the absence or lack of something
Negative Logits
Token     Logit
Yet       -0.23
Yet       -0.20
yet       -0.19
yet       -0.18
Still     -0.17
dle       -0.16
ugins     -0.16
illo      -0.15
ancora    -0.15
lix       -0.15
Positive Logits
Token     Logit
but       0.25
than      0.24
short     0.23
other     0.22
less      0.22
hơn       0.19
burg      0.19
OTHER     0.18
BUT       0.18
but       0.17
Activations Density: 0.035%
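The page does not say how the density figure is computed. As a rough sketch, assuming it is the fraction of token positions on which the feature's activation is above zero, the arithmetic could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The source page does not define the term.
    """
    if len(activations) == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 20,000 token positions, 7 of which activate the feature.
sample = [0.0] * 19993 + [1.2, 0.4, 3.1, 0.7, 2.2, 0.9, 1.5]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the style used on the page.
print(f"{density:.3%}")
```

Under this reading, 7 active positions out of 20,000 correspond to a density of 0.035%, so the feature is silent on the overwhelming majority of tokens.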