INDEX
Explanations
phrases related to work and labor conditions
phrases related to adverse effects and medical issues
Negative Logits
Truth                 -0.89
erest                 -0.89
nesday                -0.88
ðŁĻĤ (🙂)              -0.88
ruary                 -0.87
NAS                   -0.83
soType                -0.82
fuck                  -0.82
ultimate              -0.82
ðŁĺ (partial emoji)   -0.81
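Tokens such as ðŁĻĤ and ðŁĺ in the list above are not corrupted text. They are the printable form that GPT-2-style byte-level BPE tokenizers give to raw UTF-8 bytes. Assuming this model uses such a tokenizer (the byte pattern suggests it, but the page does not say so), the sketch below rebuilds the byte-to-character table and maps a displayed token back to text. `decode_token` and `CHAR_TO_BYTE` are hypothetical names used only for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each byte value 0-255 to the printable character GPT-2-style BPE uses for it."""
    # Bytes that are already printable characters map to themselves:
    # '!'..'~' (0x21-0x7E), '¡'..'¬' (0xA1-0xAC), '®'..'ÿ' (0xAE-0xFF).
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Every other byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte (hypothetical name).
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into text (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token can end mid-character; errors="replace" turns dangling bytes into U+FFFD.
    return raw.decode("utf-8", errors="replace")


smile = decode_token("ðŁĻĤ")    # bytes F0 9F 99 82 -> U+1F642 (slightly smiling face)
partial = decode_token("ðŁĺ")   # bytes F0 9F 98 only: the start of an emoji, not a full one
plain = decode_token("Truth")    # ASCII tokens come back unchanged
```

Because a BPE token can split a multi-byte character, a token like ðŁĺ can never decode to a complete emoji on its own; it only becomes one when joined with the following token's bytes.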
Positive Logits
dozens          1.07
varying         1.07
makeshift       1.06
rudimentary     1.05
roadside        1.05
sophisticated   1.00
myriad          0.99
specialized     0.97
frequent        0.97
elaborate       0.97
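Logit lists like the negative and positive ones above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that method; the page does not state how its values were computed, and some implementations apply the final layer norm or other scaling first. `w_dec`, `W_U`, and `feature_logit_effects` are hypothetical names, and the toy matrices are random, not this feature's weights.

```python
import numpy as np


def feature_logit_effects(w_dec, W_U, vocab, k=10):
    """Direct effect of one feature's decoder direction on every vocabulary logit.

    Assumes w_dec has shape (d_model,) and W_U has shape (d_model, d_vocab).
    Returns the k most negative and k most positive (token, value) pairs.
    """
    logits = w_dec @ W_U                      # shape (d_vocab,)
    order = np.argsort(logits)                # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: random weights and a 5-token vocabulary (illustrative only).
rng = np.random.default_rng(0)
d_model = 8
vocab = ["dozens", "varying", "Truth", "erest", "the"]
W_U = rng.normal(size=(d_model, len(vocab)))
w_dec = rng.normal(size=d_model)
neg, pos = feature_logit_effects(w_dec, W_U, vocab, k=2)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; with a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.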
Activation Density: 0.732% (share of tokens on which this feature is active)
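At 0.732%, the feature is active on roughly 7 of every 1,000 tokens. A minimal sketch of how such a figure can be computed, assuming "active" means an activation strictly above a threshold of zero; the threshold and the helper name `activation_density` are assumptions, and the activations are made up.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))


# Hypothetical activations of one feature over 8 tokens: it fires on 1 of them.
acts = np.array([0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0])
density = activation_density(acts)
print(f"{density:.3f}%")  # 12.500%
```

A sparse feature like this one would show a density well under 1% when measured over a large token sample.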