Explanations
phrases that indicate contradiction or contrast in statements
followed by a negation
certainly did, way not
Negative Logits
úteis            -0.46
~                -0.45
disambiguazione  -0.45
~                -0.45
MessageWindow    -0.44
EconPapers       -0.43
plati            -0.43
del              -0.43
dup              -0.43
<bos>            -0.42
Positive Logits
never            0.86
nunca            0.83
never            0.76
cannot           0.76
Never            0.76
出版年           0.75
Personendaten    0.75
ScopeManager     0.75
nunca            0.74
Never            0.74
Activations Density 0.481%
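
A minimal sketch of how a density figure like the one above could be computed, assuming it means the fraction of token positions on which the feature's activation is nonzero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 5 of them active.
toy = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.481% would mean the feature fires on roughly 5 out of every 1000 tokens in the sampled text.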