INDEX
Explanations
modified version, sentence, molecule, activity
Negative Logits
Token    Value
8        1.59
are      1.41
is       1.39
ला       1.23
ש        1.20
7        1.13
ana      1.09
OS       1.07
μ        1.05
۸        1.01
Positive Logits
Token    Value
ו        2.05
an       1.60
و        1.60
م        1.41
b        1.38
し       1.36
ર        1.36
ل        1.35
u        1.34
ul       1.32
Activations Density: 0.013%