INDEX
Explanations
questions and exclamations ending in a question mark
rhetorical questions
Negative Logits
aku         -0.77
background  -0.73
apter       -0.72
ishable     -0.69
encount     -0.68
enture      -0.66
neau        -0.65
alist       -0.65
ality       -0.65
canoe       -0.64
Positive Logits
Surely      1.06
Anyway      0.97
Well        0.95
Why         0.95
?:          0.95
Where       0.94
Isn         0.94
Probably    0.94
?!          0.93
Somebody    0.91
Activations Density: 0.092%