INDEX
Explanations
references to issues of attention and distraction in society
Negative Logits
iren      -0.15
èĥĨ       -0.14
·         -0.14
oji       -0.13
UV        -0.13
irim      -0.13
æij¸      -0.13
yth       -0.13
TECTED    -0.13
uhe       -0.13
Positive Logits
attention     0.59
Attention     0.54
attention     0.49
Attention     0.49
attent        0.41
paying        0.39
attentive     0.37
_attention    0.36
pay           0.35
focus         0.35
Activations Density: 0.035%