INDEX
Explanations
references to people holding positions or titles in a professional or organizational setting
Negative Logits
paces      -0.71
pleas      -0.69
tumble     -0.69
CRIP       -0.68
warranty   -0.67
deed       -0.65
lumber     -0.62
IGHTS      -0.61
alt        -0.60
ABE        -0.60
Positive Logits
assian      1.13
hart        0.99
entin       0.98
ayne        0.89
idon        0.88
inal        0.85
aign        0.85
ston        0.84
elia        0.84
rick        0.84
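The two lists above are the most negative and most positive entries of a per-token logit-effect score for this feature. The page does not say how that score is computed. A common approach in SAE dashboards is to project the feature's decoder direction onto the model's unembedding, but that is an assumption here. The sketch below shows only the ranking step: given a token-to-score mapping, it returns the top-k highest and lowest entries. The function name `split_top_logits` is hypothetical, and the sample data reuses scores copied from the lists above.

```python
# Minimal sketch: pick top-k positive and negative logit effects.
# Assumption: per-token scores are already computed; how the dashboard
# derives them (e.g. decoder direction @ unembedding) is not stated.

def split_top_logits(
    effects: dict[str, float], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (most positive, most negative) (token, score) pairs."""
    # Ascending sort; Python's sort is stable, so ties keep insertion order.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return positive, negative


# Sample scores copied from the dashboard lists above.
EFFECTS = {
    "paces": -0.71, "pleas": -0.69, "tumble": -0.69, "CRIP": -0.68,
    "warranty": -0.67, "deed": -0.65, "lumber": -0.62, "IGHTS": -0.61,
    "alt": -0.60, "ABE": -0.60,
    "assian": 1.13, "hart": 0.99, "entin": 0.98, "ayne": 0.89,
    "idon": 0.88, "inal": 0.85, "aign": 0.85, "ston": 0.84,
    "elia": 0.84, "rick": 0.84,
}

if __name__ == "__main__":
    pos, neg = split_top_logits(EFFECTS, k=3)
    print("positive:", pos)
    print("negative:", neg)
```

Note that if the vocabulary has fewer than 2k tokens, the two lists can overlap. A real vocabulary of tens of thousands of tokens makes that a non-issue.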
Activation Density: 0.085%
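Activation density is usually read as the percentage of sampled tokens on which the feature fires. At 0.085%, that is roughly 85 tokens in every 100,000. The page does not give the firing threshold or the sample size. The sketch below assumes "fires" means an activation strictly greater than zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Minimal sketch: activation density as a percentage of tokens.
# Assumption: a token counts as "active" when its activation > threshold
# (default 0.0); the dashboard's actual threshold is not stated.

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 85 active tokens out of 100,000 gives about 0.085%.
    acts = [0.0] * 99_915 + [1.0] * 85
    print(f"{activation_density(acts):.3f}%")
```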