Explanations
instances where "pay" or "paying attention" is mentioned
Negative Logits
anka        -0.08
MN          -0.07
ILED        -0.07
æ¿          -0.07
chen        -0.07
plex        -0.07
redicate    -0.07
ogl         -0.06
rip         -0.06
ouz         -0.06
Positive Logits
ırak         0.07
skirts       0.07
closely      0.07
carefully    0.07
OWL          0.06
459          0.06
agus         0.06
789          0.06
oulouse      0.06
attent       0.06
Activations Density 0.004%