INDEX
Explanations
"like" or "as" in many languages
Negative Logits
t  1.27
س  1.15
p  1.13
y  1.10
g  1.07
m  1.01
d  0.96
to  0.94
น  0.89
of  0.87
Positive Logits
precluded  0.78
하지만  0.75
أم  0.74
ו  0.72
ಲಾ  0.71
జేపీ  0.71
albeit  0.71
ань  0.70
بر  0.70
ות  0.68
Activations Density 0.001%