INDEX
Explanations
attributes or actions in a scenario
important technical specifications or conditions
Negative Logits
promotion     -0.84
Opportun      -0.77
itism         -0.74
promotions    -0.73
persuasion    -0.73
href          -0.71
citations     -0.71
Promotion     -0.70
ivals         -0.69
caut          -0.69
Positive Logits
upright       1.00
equipped      0.95
sized         0.93
submerged     0.93
stationary    0.91
chassis       0.91
mechanically  0.90
housed        0.90
fitted        0.90
undergoing    0.88
Activations Density 0.651%