INDEX
Explanations
words related to concentration or attention
Negative Logits
Token       Logit
OUGH        -0.75
named       -0.73
adding      -0.68
BIT         -0.68
added       -0.66
oho         -0.61
mia         -0.61
mx          -0.60
Haunted     -0.60
ylon        -0.60
Positive Logits
Token       Logit
rite        0.96
focus       0.86
squarely    0.83
attention   0.82
solely      0.82
rals        0.77
Attention   0.76
toward      0.75
ivism       0.74
foc         0.73
Activations Density: 0.019%