INDEX
Explanations
past-tense verbs about actions
Negative Logits
Token     Value
Ι         0.39
U         0.39
(         0.38
ayatan    0.38
ôm        0.38
重要的    0.37
イ        0.37
мян       0.36
Ι         0.36
HN        0.36
Positive Logits
Token     Value
ت         0.71
w         0.64
i         0.63
r         0.62
ed        0.61
in        0.59
et        0.58
k         0.58
n         0.56
d         0.53
Activations Density: 0.122%