INDEX
Explanations
references to human resources (HR) departments and related terms
references to human rights organizations and their activities
Negative Logits
tin          -0.88
tsky         -0.77
meric        -0.76
cules        -0.74
pher         -0.70
artney       -0.70
arning       -0.70
bell         -0.69
Albion       -0.66
Flavoring    -0.64
Positive Logits
istically    0.88
ueless       0.78
OUGH         0.78
isting       0.76
ifts         0.74
ues          0.71
ingly        0.70
lasses       0.69
isted        0.68
ickson       0.67
Activations Density 0.098%