INDEX
Explanations
instances of the word "even."
Negative Logits
agan        -0.08
agine       -0.07
elow        -0.06
anytime     -0.06
conte       -0.06
any         -0.06
anywhere    -0.06
set         -0.06
rats        -0.06
bye         -0.06
Positive Logits
even        0.08
even        0.07
MORE        0.07
461         0.07
though      0.07
-more       0.07
wel         0.07
MORE        0.06
Though      0.06
EVEN        0.06
Activations Density 0.018%