INDEX
Explanations
phrases questioning or emphasizing a point
phrases that prompt agreement or affirmation
Negative Logits
Token        Logit
aples        -0.63
irlf         -0.63
abroad       -0.62
icipated     -0.60
Cosponsors   -0.59
etheus       -0.59
ufact        -0.59
ãĢij          -0.57
ternity      -0.54
theless      -0.54
Positive Logits
Token        Logit
?            1.24
?!           1.20
?).          1.12
!?           1.12
?"           1.11
?)           1.08
??           1.07
?),          1.04
?!"          1.03
.?           1.02
Activations Density 0.114%