Explanations
Instances of the word "work" and its variations, indicating a focus on effort and productivity.
Negative Logits
Seks       -0.15
ught       -0.14
ads        -0.14
ерш        -0.14
Baldwin    -0.14
#ad        -0.14
elan       -0.14
gone       -0.13
Herman     -0.13
矢         -0.13
Positive Logits
harder     0.19
magic      0.18
magic      0.17
hardest    0.17
Magic      0.16
hard       0.16
坊         0.16
人員       0.16
Magic      0.15
out        0.15
Activations Density: 0.057%