Explanations
phrases related to political figures expressing opinions or making statements
statements expressing desires or intentions regarding actions or policies
Negative Logits
)--              -0.82
)?               -0.74
').              -0.73
.--              -0.68
}.               -0.68
?).              -0.66
?),              -0.64
)—               -0.62
.'               -0.61
.'"              -0.61
Positive Logits
"…               0.70
underestimated   0.63
"[               0.61
Blumenthal       0.61
"                0.60
"#               0.59
"@               0.58
"'               0.56
consultations    0.56
misinterpret     0.56
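Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest- and k highest-scoring tokens. The sketch below assumes exactly that procedure. The helper names (`logit_effects`, `top_and_bottom`), the two-dimensional vectors, and the tokens "a" to "d" are all hypothetical, and the dashboard may apply normalization that is not shown here.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Hypothetical helper: dot product of the feature's decoder direction
    # with each token's unembedding vector (its direct effect on that logit).
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    # Sort ascending once: the head holds the most negative effects,
    # and the reversed head holds the most positive ones.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive

# Toy example: a 2-d residual stream and four made-up tokens.
decoder_dir = [1.0, 0.5]
unembed = {
    "a": [1.0, 0.0],
    "b": [0.0, -2.0],
    "c": [0.2, 0.2],
    "d": [-0.5, 0.0],
}
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(scores, k=2)
print("Negative Logits", negative)
print("Positive Logits", positive)
```

Here "b" and "d" land in the negative list and "a" and "c" in the positive list, mirroring how punctuation tokens such as `)--` and words such as `underestimated` are ranked above.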
Activations Density 1.575%
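A density of 1.575% means the feature fires on roughly 1,575 of every 100,000 tokens. A minimal sketch of that arithmetic follows. It assumes density is the share of tokens whose activation is strictly above zero over some sampled corpus; the corpus, the zero threshold, and the `activation_density` helper are assumptions, not details given on this page.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    # Hypothetical helper: percentage of tokens whose activation is
    # strictly above `threshold` (assumed to be 0.0 by default).
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)

# Hypothetical sample: 1,575 firing tokens out of 100,000.
sample = [2.3] * 1575 + [0.0] * (100_000 - 1575)
print(f"Activations Density {activation_density(sample):.3f}%")  # prints Activations Density 1.575%
```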