INDEX
Explanations
phrases that indicate percentages or statistics related to activities or events
Negative Logits
Token        Logit
advoc        -0.72
incorpor     -0.72
raq          -0.70
offending    -0.69
enlarg       -0.68
appro        -0.67
indu         -0.67
incre        -0.66
independ     -0.65
ppe          -0.65
Positive Logits
Token        Logit
Latest        0.84
Thousands     0.83
Authorities   0.82
Officials     0.81
Protesters    0.79
Despite       0.79
Recap         0.74
Imagine       0.74
Saying        0.74
[[            0.74
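Token/logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix, then keeping the most negative and most positive entries. The sketch below assumes that approach. The names `decoder_dir`, `W_U`, `vocab`, and `top_k_logits` are hypothetical, and the tiny matrices in the usage lines stand in for real model weights.

```python
from typing import List, Tuple

def top_k_logits(decoder_dir: List[float],
                 W_U: List[List[float]],
                 vocab: List[str],
                 k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most_negative, most_positive) lists of (token, logit) pairs.

    Assumes W_U has shape [d_model][n_vocab] and decoder_dir has length d_model.
    """
    d_model = len(decoder_dir)
    # Logit effect on each vocab token: dot product of the (hypothetical)
    # decoder direction with that token's unembedding column.
    logits = [sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
              for j in range(len(vocab))]
    # Sort ascending by logit: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive

# Toy usage: d_model = 2, three-token vocabulary.
neg, pos = top_k_logits([1.0, 2.0],
                        [[1.0, 0.0, -1.0],
                         [0.0, 1.0, 0.5]],
                        ["a", "b", "c"], k=1)
print(neg, pos)  # [('c', 0.0)] [('b', 2.0)]
```

Sorting the full vocabulary is simple but wasteful for real vocabularies of tens of thousands of tokens; a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.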
Activation Density: 0.010%
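Read as the share of sampled tokens on which the feature fires, a density of 0.010% is 0.0001, or roughly 1 token in 10,000. A minimal sketch of that calculation follows. It assumes "fires" means an activation strictly above zero; the `activation_density` name and the `threshold` parameter are illustrative assumptions, not part of the dashboard.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# One active token out of 10,000 sampled tokens.
acts = [0.0] * 9999 + [3.2]
print(activation_density(acts))  # 0.01 (percent)
```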