Explanations
mentions of effort and labor-related concepts
Negative Logits
ught      -0.20
idenav    -0.16
inne      -0.16
unya      -0.14
/by       -0.14
lendir    -0.14
ELLOW     -0.14
旭        -0.13
avian     -0.13
755       -0.13
Positive Logits
worked     0.24
-working   0.23
worked     0.21
working    0.21
working    0.20
Working    0.20
Working    0.20
out        0.20
toward     0.19
towards    0.18
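Lists like the two above are the kind of output you get by projecting a feature's decoder direction through a model's unembedding matrix and keeping the most extreme tokens at each end. The page does not say how its lists were produced, so the following is only a minimal sketch under that assumption; the function names, the toy weights, and the `tok_*` tokens are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the feature's decoder
    direction (length d_model) through the unembedding matrix
    (d_model rows x vocab_size columns)."""
    vocab_size = len(w_unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, w_unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ascending = sorted(range(len(scores)), key=lambda t: scores[t])
    negative = [(tokens[t], scores[t]) for t in ascending[:k]]
    positive = [(tokens[t], scores[t]) for t in ascending[::-1][:k]]
    return positive, negative


# Toy example (hypothetical values): d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.2, -0.1, 0.0, -0.3],
    [0.1, -0.2, 0.4, 0.0],
]
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]

positive, negative = top_and_bottom(logit_effects(decoder_dir, w_unembed), tokens, k=2)
print("Positive:", positive)
print("Negative:", negative)
```

A real model would use a vector library rather than nested lists; plain lists just keep the arithmetic visible.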
Activation density: 0.045%
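A density of 0.045% means the feature is active on roughly 45 of every 100,000 tokens. The following minimal sketch shows one way to compute such a figure. It assumes that "active" means an activation strictly above zero and that the percentage is taken over all evaluated token positions; the page specifies neither the threshold nor the dataset, and the function name and sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 45 of 100,000 tokens.
sample = [0.0] * 99_955 + [1.3] * 45
print(f"{activation_density(sample):.3f}%")
```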