INDEX
Explanations
phrases related to warnings and precautions
phrases indicating a lack of pretense or straightforward communication
Negative Logits
etheless    -0.67
alties      -0.66
nels        -0.64
arov        -0.62
lining      -0.62
Beir        -0.61
nell        -0.59
ussia       -0.59
netflix     -0.58
leton       -0.57
Positive Logits
disclaimer      0.92
recap           0.91
summar          0.90
summarize       0.88
briefly         0.83
reiterate       0.83
paraph          0.80
understatement  0.79
bre             0.78
rant            0.78
Activations Density: 0.623%