Explanations
auxiliary verbs, particularly forms of "did" and "does"
Negative logits
seau      -0.16
athan     -0.15
never     -0.15
ád        -0.15
unga      -0.14
никогда   -0.14  (Russian: "never")
è©        -0.14
oster     -0.14
aybe      -0.14
nunca     -0.14  (Spanish/Portuguese: "never")
Positive logits
indeed      0.35
everything  0.27
not         0.24
nothing     0.23
inde        0.23
Indeed      0.22
actic       0.21
Indeed      0.20
what        0.20
exactly     0.20
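The two logit lists can be read as the feature's direct effect on the output vocabulary. A common way to produce such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder vector with each token's unembedding vector and keep the highest and lowest scores. The sketch below uses a hypothetical 2-dimensional decoder vector and a toy four-token vocabulary; the helper names `logit_effects` and `top_and_bottom` are illustrative, not part of any library.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_vec: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col)) for tok, col in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy values, not taken from the real model.
decoder_vec = [0.5, -0.2]
unembed = {
    "indeed": [0.6, -0.25],
    "not": [0.4, 0.1],
    "never": [-0.2, 0.3],
    "nunca": [-0.1, 0.4],
}

top, bottom = top_and_bottom(logit_effects(decoder_vec, unembed), k=2)
print("positive:", top)
print("negative:", bottom)
```

In a real model the unembedding would be a `d_model x vocab` matrix and the same projection would be computed with a single matrix-vector product; the dictionary form here only keeps the example dependency-free.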
Activation density: 0.085%
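Activation density is usually reported as the fraction of sampled tokens on which the feature fires; this sketch assumes "fires" means an activation strictly above zero, which may differ from the exact threshold used for this dashboard. Under that assumption, 0.085% corresponds to roughly 85 active tokens per 100,000. The activations below are synthetic and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 85 active tokens out of 100,000.
acts = [0.0] * 99_915 + [1.3] * 85
print(f"{activation_density(acts):.3%}")  # 0.085%
```

A density this low means the feature is sparse: it stays silent on almost every token and activates only in the narrow contexts the explanation describes.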