Explanations
statements indicating negation or opposition
negative phrases or sentiments related to personal experiences and expectations
Negative Logits
Alas      -0.72
éŃĶ       -0.71
éĹĺ       -0.70
ãģ®å      -0.68
duly      -0.68
TIME      -0.67
Dear      -0.67
è¦ļéĨĴ    -0.67
Must      -0.66
ãģ®ç      -0.65
Positive Logits
necessarily    1.24
anybody        0.97
really         0.96
gonna          0.95
wanna          0.95
flashy         0.95
anymore        0.95
[              0.89
everybody      0.89
disrespect     0.85
Activations Density 0.240%
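
The non-ASCII entries among the negative logits (for example éŃĶ) look like byte-level BPE token strings rather than readable text. The sketch below shows one way to turn them back into text, assuming the tokenizer uses the GPT-2 style byte-to-character table. The source does not name the model or tokenizer, so that is an assumption, and the helper names `byte_to_char_table` and `decode_token` are hypothetical.

```python
import unicodedata


def byte_to_char_table():
    """Rebuild the GPT-2 style table mapping each byte (0-255) to one printable character.

    Assumption: the dashboard's tokenizer uses this table; the source does not say so.
    """
    # Bytes that are already printable map to the character with the same code point.
    printable = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned the next unused code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_to_char_table().items()}


def decode_token(token: str) -> str:
    """Map a displayed token string back to its bytes and decode them as UTF-8."""
    # Copy-paste can split accented letters into base + combining mark; NFC rejoins them.
    token = unicodedata.normalize("NFC", token)
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may end partway through a multi-byte character, so incomplete
    # sequences are replaced with U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["éŃĶ", "éĹĺ", "è¦ļéĨĴ", "ãģ®å"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as Alas pass through unchanged. A token that ends partway through a multi-byte character decodes with a trailing replacement character, which is expected for byte-level vocabularies and not an error in the listing.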