INDEX
Explanations
percentage increases or decreases
phrases indicating measurements or statistics
Negative Logits
Emails       -0.73
airs         -0.69
Said         -0.66
NetMessage   -0.65
letters      -0.64
eston        -0.63
ocom         -0.62
items        -0.62
udic         -0.61
went         -0.61
Positive Logits
ratio           1.12
fraction        1.11
figure          1.05
feat            1.04
underestimate   0.99
statistic       0.98
decrease        0.97
percentage      0.96
margin          0.96
threshold       0.95
Activations Density: 0.123%