INDEX
Explanations
words related to attention or emphasis
Negative Logits
ania      -0.80
asca      -0.75
idden     -0.70
named     -0.68
OUGH      -0.67
ston      -0.65
mia       -0.62
ccess     -0.62
paran     -0.62
ibrary    -0.61
Positive Logits
starter   0.88
rite      0.86
Goal      0.82
squarely  0.82
focuses   0.80
ivation   0.80
focus     0.79
fulness   0.78
focused   0.78
toward    0.74
Activations Density: 0.025%