INDEX
Explanations
building efficient transformers
Negative Logits
persuaded     0.85
ጥላ            0.82
convincingly  0.80
indicated     0.79
depicted      0.79
liable        0.79
串口          0.78
bilirubin     0.78
dissipated    0.77
carboxyl      0.77
Positive Logits
iendo         0.93
maq           0.83
saf           0.82
s             0.82
safe          0.82
sch           0.81
sz            0.81
ن             0.81
sar           0.80
savvy         0.79
Activations Density: 0.002%