INDEX
Explanations
HO, CO, or specific foreign syllables
Negative Logits
м  0.81
ம்  0.79
م  0.77
ו  0.71
я  0.64
on  0.64
н  0.63
و  0.61
ER  0.60
belieb  0.60
Positive Logits
1  0.98
that  0.84
arı  0.84
that  0.82
you  0.81
are  0.80
る  0.80
ty  0.80
flav  0.80
of  0.79
Activations Density 0.075%