Explanations
"i want" / "i cannot" / "i understand"
Negative Logits
ที่มี  0.64
Lots  0.57
ఉండే  0.55
interaction  0.53
Description  0.53
interaction  0.51
Often  0.51
Interaction  0.51
vibrancy  0.50
비슷  0.49
Positive Logits
urge  0.96
apologize  0.93
understand  0.87
sincerely  0.85
dares  0.84
applaud  0.80
presume  0.79
sympathize  0.79
will  0.78
regret  0.77
Activations Density 0.256%