Explanations
years followed by punctuation
Negative Logits
Token    Logit
February    0.34
فبراير    0.32
Hopefully    0.32
ragazzo    0.30
migliore    0.30
FEBRUARY    0.29
ফেব্রুয়ার    0.29
melhor    0.28
গত    0.28
২০২৩    0.28
Positive Logits
Token    Logit
silam    0.52
yılında    0.37
년에    0.37
년    0.37
году    0.36
年    0.35
년부터    0.34
ish    0.34
年的    0.33
年には    0.33
Activations Density: 0.028%