INDEX
Explanations
contractions indicating negative sentiment or disbelief
phrases expressing uncertainty or conditionality
Negative Logits
ishing        -0.69
arer          -0.64
«ĺ            -0.62
ÃŁ            -0.62
acca          -0.61
assing        -0.60
Shell         -0.59
active        -0.59
Cosponsors    -0.58
Tik           -0.58
Positive Logits
ĸļ            0.78
enance        0.73
uce           0.70
tumble        0.69
clinton       0.68
tarians       0.67
offend        0.65
arez          0.63
sooner        0.63
yip           0.62
Activations Density: 0.219%