Explanations
references to exploitation and labor systems
Negative logits
Token       Logit
urre        -0.14
itore       -0.14
finity      -0.14
URRE        -0.14
ẹp          -0.14
_tac        -0.14
irsch       -0.13
ABA         -0.13
)↵↵↵↵↵↵↵↵   -0.13
yaw         -0.13
Positive logits
Token       Logit
labor       0.44
labour      0.43
work        0.40
Labor       0.34
Work        0.32
Labour      0.31
manual      0.29
Work        0.28
work        0.28
劳          0.28
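Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest and lowest scores. The sketch below assumes that convention and ignores any final layer norm. The function names (`logit_effects`, `top_and_bottom`) and the toy two-dimensional vectors are hypothetical and serve only to illustrate the ranking step. This is not the dashboard's actual code.

```python
# Hypothetical sketch: rank tokens by how strongly one SAE feature's decoder
# direction raises or lowers their logits. Assumes effect = dot(decoder_dir, W_U[:, token]),
# with no final layer norm applied.
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative logit effects."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-d "unembedding"; the values are illustrative, not taken from any model.
    unembed = {
        "labor": [0.44, 0.1],
        "work": [0.40, -0.2],
        "yaw": [-0.13, 0.5],
        "urre": [-0.14, 0.3],
    }
    pos, neg = top_and_bottom(logit_effects([1.0, 0.0], unembed), k=2)
    print(pos)  # [('labor', 0.44), ('work', 0.4)]
    print(neg)  # [('urre', -0.14), ('yaw', -0.13)]
```

In a real model the vectors have thousands of dimensions and the vocabulary holds tens of thousands of tokens. The ranking logic stays the same.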
Activation density: 0.180%
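Activation density is usually read as the fraction of token positions on which the feature fires, meaning its activation exceeds zero. Under that assumed definition, 0.180% corresponds to roughly 18 firing tokens per 10,000. The helper `activation_density` and its `threshold` parameter are hypothetical and illustrate the arithmetic only.

```python
# Hypothetical sketch: activation density = fraction of tokens where the feature's
# activation is strictly above a threshold (0.0 by default).
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Share of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 18 firing positions out of 10,000 tokens.
    acts = [0.0] * 9982 + [1.0] * 18
    print(f"{activation_density(acts):.3%}")  # 0.180%
```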