Explanations
mathematical notation and code symbols
Negative Logits
at          0.46
popsicle    0.44
Literal     0.44
。)          0.43
kopi        0.41
一块        0.41
🤗          0.40
splurge     0.39
والص        0.39
私          0.39
Positive Logits
ת           0.86
us          0.76
و           0.74
u           0.69
ה           0.59
ו           0.59
ي           0.57
ing         0.57
ა           0.57
т           0.54
Activations Density: 0.035%