INDEX
Explanations
questions starting with what
Questions that open a turn by requesting input or offering help, especially assistant-initiated interrogative phrases such as "What..." or "Anything...".
Negative Logits
শুধুমাত্র    0.19
או    0.18
などに    0.17
केवल    0.17
maupun    0.17
yalnızca    0.17
हालांकि    0.17
लंबे    0.17
jedynie    0.17
หรือไม่    0.17
Positive Logits
do    0.27
?    0.26
motivates    0.23
would    0.22
exactly    0.22
did    0.22
?    0.22
?!    0.21
Exactly    0.21
else    0.20
Activations Density 0.609%