INDEX
Explanations
phrases related to directing attention towards a specific topic or action
phrases related to prioritizing attention or resources
Negative Logits
named     -0.74
OUGH      -0.74
BIT       -0.72
adding    -0.71
mia       -0.71
ania      -0.68
added     -0.67
idden     -0.63
mx        -0.63
anne      -0.62
Positive Logits
rite       0.95
focus      0.86
squarely   0.82
peed       0.79
attention  0.79
solely     0.77
toward     0.77
focused    0.76
foc        0.75
Attention  0.74
Activations Density: 0.031%