INDEX
Explanations
words related to job performance and responsibilities
references to responsibility and performance of tasks
Negative Logits
glas         -0.80
urated       -0.75
lished       -0.70
ONSORED      -0.69
razil        -0.68
Flavoring    -0.66
artment      -0.64
incible      -0.63
eele         -0.62
Scythe       -0.62
Positive Logits
same         0.93
utmost       0.93
same         0.88
justice      0.87
homework     0.84
injustice    0.81
usual        0.80
best         0.79
chores       0.73
oret         0.70
Activations Density 0.107%