INDEX
Explanations
words or phrases indicating a comparison or contrast between different scenarios or possibilities
phrases indicating uncertainty or qualification
Negative Logits
Token         Logit
Reviewer      -0.66
FTWARE        -0.66
itivity       -0.64
HTTP          -0.58
DonaldTrump   -0.56
visual        -0.56
View          -0.55
Owner         -0.55
Package       -0.55
SIGN          -0.55
Positive Logits
Token         Logit
least         1.62
mosp          1.14
onement       1.06
times         0.96
yp            0.90
roph          0.90
ention        0.87
hens          0.85
odds          0.85
letico        0.85
Activations Density: 0.094%