INDEX
Explanations
phrases related to actions and observations
affirmative statements and positive actions
Negative Logits
Spons           -0.82
sponsoring      -0.77
ibrary          -0.72
Sources         -0.68
Optional        -0.68
href            -0.67
Sponsor         -0.66
soDeliveryDate  -0.66
necessity       -0.66
Mandatory       -0.65
Positive Logits
behaved         1.38
deterior        1.31
behaves         1.29
behave          1.25
noticeably      1.23
deteriorated    1.21
fared           1.09
outper          1.08
improved        1.04
seem            1.01
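The two token lists show the feature's direct effect on the output logits. SAE dashboards commonly produce such lists by taking the dot product of the feature's decoder direction with each token's unembedding vector, then reporting the k most negative and k most positive values. The sketch below assumes that procedure. The function names, the toy two-dimensional vectors, and the three-token vocabulary are hypothetical and are not the data behind this feature. The dashboard that generated these numbers may also apply scaling steps not shown here.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Dot product of a feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


# Hypothetical toy data: a 2-d residual stream and a three-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    "behaved": [1.0, -0.4],
    "Spons": [-0.6, 0.4],
    "seem": [0.5, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid the full sort.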
Activations Density 0.391%
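Activation density is usually reported as the share of sampled tokens on which the feature fires. At 0.391%, this feature would be active on roughly 391 of every 100,000 tokens. The following is a minimal sketch. It assumes that "fires" means an activation strictly above zero. Both the threshold and the function name are assumptions and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 391 active tokens out of 100,000.
sample = [0.7] * 391 + [0.0] * (100_000 - 391)
print(f"{activation_density(sample):.3f}%")  # prints 0.391%
```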