INDEX
Explanations
phrases emphasizing certainty or stressing a particular point
phrases indicating negation or limitations
Negative Logits
tickets      -0.67
erva         -0.61
Byrd         -0.61
UW           -0.61
ticket       -0.59
EStream      -0.58
Cla          -0.58
Union        -0.58
Bundes       -0.57
WM           -0.57
Positive Logits
whatsoever    0.97
lishes        0.87
mite          0.74
endorse       0.73
isal          0.71
resembles     0.71
specified     0.71
idental       0.70
imaginable    0.69
resembling    0.68
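Dashboards of this kind usually rank tokens by a feature's direct effect on the output logits. The sketch below assumes that effect is the dot product of the feature's decoder direction with each token's unembedding vector, then sorts to get the most negative and most positive tokens. The function name `top_logit_effects`, the list-of-lists layout, and the toy vectors are hypothetical and not taken from this dashboard.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by an assumed direct logit effect of one feature.

    decoder_dir: feature decoder direction, a list of d_model floats (assumption)
    unembed: one d_model-length list per token, aligned with vocab (hypothetical layout)
    vocab: token strings
    """
    effects = []
    for token, vec in zip(vocab, unembed):
        # Direct effect on this token's logit: dot(decoder direction, unembedding vector)
        score = sum(a * b for a, b in zip(decoder_dir, vec))
        effects.append((token, score))
    effects.sort(key=lambda pair: pair[1])   # ascending by score
    negative = effects[:k]                   # most negative first
    positive = effects[::-1][:k]             # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream (illustrative values only)
decoder = [1.0, 0.0]
unembed = [[0.9, 0.1], [-0.6, 0.3], [0.2, 0.5]]
vocab = ["whatsoever", "tickets", "the"]
neg, pos = top_logit_effects(decoder, unembed, vocab, k=1)
print(neg, pos)
```

With a real vocabulary of tens of thousands of tokens the two lists do not overlap for small `k`; in this three-token toy they would if `k` were 2 or more.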
Activation Density: 0.056%
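Activation density is commonly read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard does not state its threshold, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    The default threshold of 0.0 is an assumption, not the dashboard's documented rule.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# A feature firing on 56 of 100,000 tokens has density 0.00056
acts = [1.0] * 56 + [0.0] * 99_944
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.056% means the feature is active on roughly 1 token in every 1,800, so it is a very sparse feature.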