Explanations
instances where attention is being paid or drawn to specific subjects or details
references to attention and its significance
Negative Logits
lins         -0.67
Tale         -0.66
nia          -0.65
Rebell       -0.61
Tycoon       -0.61
mos          -0.61
Mehran       -0.60
testament    -0.60
Vers         -0.60
ston         -0.59
Positive Logits
spans        1.14
span         1.12
attention    0.92
gaze         0.82
orial        0.80
scrutiny     0.77
grabbing     0.77
grab         0.77
Attention    0.76
seekers      0.74
Activations Density 0.035%