INDEX
Explanations
instances of attention and focus in the text
Negative Logits
obel        -0.16
iazza       -0.14
stoff       -0.14
KERNEL      -0.14
uyá»ĩn      -0.14
lops        -0.14
änder       -0.13
/autoload   -0.13
isse        -0.13
PTY         -0.13
Positive Logits
We          0.16
endir       0.15
intact      0.15
unker       0.15
Dev         0.15
ometrics    0.14
Bent        0.14
enance      0.14
My          0.14
ude         0.13
Activations Density: 0.009%