Explanations
phrases indicating desire or preference
expressions questioning desires or preferences
Negative Logits
bis         -0.61
Letter      -0.59
notations   -0.59
Oaks        -0.56
pc          -0.56
rams        -0.55
Runner      -0.55
Fargo       -0.55
variants    -0.54
requested   -0.54
Positive Logits
?)          1.18
?),         1.16
?!          1.14
?).         1.10
?!"         1.09
!?"         1.07
?"          1.01
?           1.00
!?          0.99
?'"         0.92
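The page does not say how the logit values above were obtained. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding and rank tokens by the result. The sketch below assumes that approach; `logit_effects` and `top_logits` are hypothetical helpers, and plain lists stand in for real weight matrices.

```python
# Hypothetical sketch: rank tokens by a feature's effect on the output logits.
# Assumption: each token's score = dot(feature decoder direction, token's
# unembedding column). The dashboard does not confirm this definition.

def logit_effects(decoder_dir, unembed):
    """decoder_dir: list[float] of length d_model.
    unembed: dict mapping token -> list[float] (that token's unembedding column).
    Returns dict token -> scalar logit effect."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_logits(effects, k=10):
    """Return (positive, negative): up to k tokens with the largest positive
    scores (descending) and up to k with the most negative scores (ascending)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = [kv for kv in ranked if kv[1] < 0][:k]
    positive = [kv for kv in reversed(ranked) if kv[1] > 0][:k]
    return positive, negative
```

Fed a few of the scores listed above, `top_logits({"?)": 1.18, "?!": 1.14, "bis": -0.61, "Letter": -0.59}, k=2)` returns `[("?)", 1.18), ("?!", 1.14)]` as the positive list and `[("bis", -0.61), ("Letter", -0.59)]` as the negative list.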
Activation Density: 0.103%
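Activation density is usually read as the fraction of token positions on which the feature fires. Under that reading, 0.103% means the feature is active on roughly 1 token in 970. The sketch below assumes that definition and a firing threshold of zero; both are assumptions, since the page does not state how density is computed.

```python
# Hypothetical sketch of an activation-density calculation.
# Assumptions: density = (# positions with activation > threshold) / (# positions),
# and threshold defaults to 0.0. Neither is confirmed by the dashboard.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)
```

For example, one firing position in 1,000 gives `activation_density([0.0] * 999 + [2.5])`, which returns `0.001`. Formatted with `f"{0.001:.3%}"`, that is `"0.100%"`, close to the 0.103% shown above.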