Explanations
phrases indicating negation or contradiction
negations or expressions of denial
Negative Logits
Token            Logit
psey             -0.69
often            -0.64
purportedly      -0.62
progressively    -0.61
ĻĤ               -0.60
supposedly       -0.58
allegedly        -0.57
usually          -0.57
dunno            -0.57
Advertisement    -0.56
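Logit lists like this one are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the lowest and highest scores. The sketch below shows that idea with NumPy. It is an assumption, not the dashboard's confirmed pipeline: the names `w_dec`, `w_u`, and `top_and_bottom_logits` are hypothetical, and any normalization the real tool applies (for example a final layer norm) is omitted.

```python
import numpy as np

def top_and_bottom_logits(w_dec, w_u, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct effect on the logits.

    w_dec: (d_model,) decoder direction of one feature (assumed input).
    w_u:   (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of token strings, length vocab_size.
    Returns (negative, positive), each a list of (token, logit) pairs.
    """
    logits = np.asarray(w_dec) @ np.asarray(w_u)  # shape (vocab_size,)
    order = np.argsort(logits)                     # ascending order of logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_and_bottom_logits(w_dec, w_u, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Because the score is a single dot product per token, the same ranking yields both the negative and the positive list.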
Positive Logits
Token              Logit
rist               0.73
onen               0.72
osher              0.67
å¤                 0.66
sson               0.64
ãĤ¤ãĥĪ (イト)      0.64
olics              0.63
ãĤª (オ)           0.63
Monteneg           0.62
von                0.62
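Several tokens in these lists (ĻĤ, å¤, ãĤ¤ãĥĪ, ãĤª) look garbled because they are shown in GPT-2-style byte-level BPE form, where every raw byte is mapped to a printable character. The sketch below inverts that mapping, assuming the model uses GPT-2's byte-to-character table (the dashboard does not name the tokenizer). `token_bytes` and `decode_token` are hypothetical helper names. Tokens such as å¤ and ĻĤ hold only part of a multi-byte UTF-8 character, so they cannot be decoded on their own and come back as replacement characters.

```python
def bytes_to_unicode():
    """GPT-2's reversible byte -> printable-character table."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:           # non-printable byte: shift it to 256 + n
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

_CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def token_bytes(token: str) -> bytes:
    """Hypothetical helper: recover the raw bytes behind a byte-level token string."""
    return bytes(_CHAR_TO_BYTE[ch] for ch in token)

def decode_token(token: str) -> str:
    """Hypothetical helper: decode to text; partial UTF-8 sequences become U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")

print(decode_token("ãĤ¤ãĥĪ"))  # イト
print(decode_token("ãĤª"))     # オ
print(token_bytes("å¤"))      # b'\xe5\xa4'
```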
Activation Density: 0.264%
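Activation density is usually read as the share of token positions on which the feature fires. The sketch below computes it under the assumption that "fires" means an activation strictly above zero; the dashboard may use a different threshold or sample, and `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))

print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```

Under that reading, a density of 0.264% means the feature is active on roughly 264 of every 100,000 tokens, or about 2.6 per 1,000.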