INDEX
Explanations
mentions of attention or intention in the text
terms related to attention and intention
Negative Logits
Mehran        -0.78
senal         -0.75
accompan      -0.69
crest         -0.68
tribal        -0.66
Juliet        -0.63
Recon         -0.62
shatter       -0.61
Mend          -0.60
succeeding    -0.60
Positive Logits
ention        1.05
edly          0.90
aldehyde      0.88
rary          0.88
theless       0.86
ertodd        0.85
estinal       0.85
endment       0.83
rontal        0.83
ally          0.83
Activations Density 0.005%
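
The page does not say how the logit lists or the density figure were produced. As a rough sketch only, assuming the logit values come from projecting a feature's decoder direction through the model's unembedding matrix, and that density is the fraction of sampled tokens on which the feature activates, the two quantities could be computed as below. All names here (`logit_effects`, `top_tokens`, `activation_density`, and their parameters) are hypothetical and written in plain Python for illustration.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature direction onto every vocabulary token.

    decoder_dir: list of d floats (assumed feature decoder direction).
    unembed: d rows of vocab-size floats (assumed unembedding matrix).
    Returns one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_tokens(effects, tokens, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


def activation_density(activations):
    """Fraction of sampled tokens on which the feature is active (> 0)."""
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)
```

Under these assumptions, a density of 0.005% would correspond to the feature firing on roughly 1 in 20,000 sampled tokens.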