Explanations
question marks signaling uncertainty or inquiries
questions with a focus on penalties or rules
Negative Logits
encount    -0.80
marsh      -0.76
apan       -0.70
ank        -0.69
alist      -0.68
neau       -0.68
ENTS       -0.67
arrang     -0.64
harness    -0.64
corrid     -0.64
Positive Logits
Nope        1.24
Nah         1.01
Huh         0.98
.?          0.97
Probably    0.97
Absolutely  0.93
Yep         0.93
Yes         0.89
Possibly    0.89
Didn        0.89
Activations Density: 0.102%