Explanations
phrases related to paying attention or the concept of attentiveness
Negative Logits
Token      Logit
Efq        -1.08
Monfieur   -1.00
Theſe      -1.00
itſelf     -1.00
myſelf     -1.00
raiſ       -0.99
Jefus      -0.95
ſelf       -0.94
againſt    -0.90
himſelf    -0.90
Positive Logits
Token      Logit
attention  0.97
ced        0.89
han        0.79
attention  0.75
han        0.67
Attention  0.67
HAN        0.66
atten      0.61
att        0.59
attentive  0.59
Activations Density 0.088%
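
As a rough illustration of what a density figure like this could mean, the sketch below assumes it is the fraction of token positions on which the feature's activation is above zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; the page itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 88 active positions out of 100,000 tokens.
sample = [1.0] * 88 + [0.0] * (100_000 - 88)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.088% would mean the feature fires on fewer than one token in a thousand.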