Explanations
The word "you" when the model is directly addressing the user.
Offering more help or explanation.
NEGATIVE LOGITS
Be            0.97
Be            0.95
sollten       0.88
ক্রমবর্ধমান    0.86
sollte        0.85
在了          0.85
puissent      0.84
fhould        0.83
geprü         0.81
će            0.80
POSITIVE LOGITS
want          2.06
have          1.49
need          1.47
wanna         1.44
know          1.40
WANT          1.33
Want          1.31
want          1.31
think         1.29
feel          1.18
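Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most promoted and k most suppressed tokens. The sketch below assumes that method and ignores any final layer norm; `top_logit_effects`, `decoder_dir` and `unembed` are hypothetical names, not this dashboard's actual code.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the logit change a unit feature activation would cause.

    Assumed inputs (hypothetical, for illustration only):
      decoder_dir: list of d_model floats, the feature's decoder vector.
      unembed: d_model rows, each a list of len(vocab) floats (an unembedding matrix).
      vocab: token strings, one per unembedding column.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    n_vocab = len(vocab)
    # Logit effect on token t is the dot product of the decoder vector
    # with column t of the unembedding matrix.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(n_vocab)
    ]
    # Sort ascending by effect: most suppressed tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens
    positive = ranked[::-1][:k]  # most promoted tokens
    return positive, negative


if __name__ == "__main__":
    pos, neg = top_logit_effects(
        [1.0, 0.5],
        [[2.0, -1.0, 0.0, 0.5], [0.0, 1.0, -2.0, 1.0]],
        ["want", "Be", "sollte", "need"],
        k=2,
    )
    print(pos)  # [('want', 2.0), ('need', 1.0)]
    print(neg)  # [('sollte', -1.0), ('Be', -0.5)]
```

With this toy two-dimensional model, "want" is promoted most and "sollte" is suppressed most, mirroring the shape of the real lists above.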
Activation Density 0.141%
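Activation density is usually read as the share of tokens on which the feature's activation is above zero. Assuming that definition, 0.141% means the feature fires on roughly 1.4 of every 1,000 tokens. A minimal sketch follows; `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes one activation value per token; the zero threshold is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 25.0
```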