Explanations
patterns indicating concerns about safety and working conditions in a community or workplace
Negative Logits
apos          -0.17
ptest         -0.16
ullan         -0.15
.addAction    -0.14
pNet          -0.14
ecycle        -0.14
Ìī            -0.14
Griffin       -0.13
otta          -0.13
agra          -0.13
Positive Logits
éĿ            0.16
apart         0.16
rens          0.15
lob           0.15
Apart         0.15
handjob       0.15
Apart         0.15
.decorate     0.14
ren           0.14
Ń             0.14
Activations Density: 0.084%