INDEX
Explanations
interrogative sentences ending in 'it' with an emphasis on high activation values
rhetorical questions and conversational phrases
Negative Logits
furt       -0.79
umbn       -0.66
eric       -0.64
izont      -0.63
aeper      -0.60
orsi       -0.60
Stand      -0.60
ternity    -0.59
esm        -0.57
aneers     -0.57
Positive Logits
?!         0.95
?          0.88
!?         0.82
??         0.81
?!"        0.79
?          0.76
?'         0.74
adorable   0.73
!?"        0.73
?"         0.72
Activations Density 0.042%
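
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of tokens on which the feature's activation is above zero. The page does not state its exact definition, so this is an assumption, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density means "share of tokens on which the feature fires".
    The dashboard itself does not spell out its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 42 of 100,000 tokens.
sample = [1.5] * 42 + [0.0] * 99_958
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.042%
```

Under this reading, a density of 0.042% means the feature fires on roughly 42 of every 100,000 tokens, which fits a feature tied to a narrow pattern such as emphatic question punctuation.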