Explanations
The word "but" when followed by a pronoun or an article
Negative Logits
Token          Value
،              0.27
,              0.26
、             0.22
၊              0.22
в              0.21
፣              0.21
$,             0.20
sabbam         0.20
ো              0.20
(not shown)    0.20

Positive Logits
Token          Value
it             0.27
in             0.25
at             0.23
certainly      0.22
Y              0.21
don            0.21
N              0.21
often          0.20
Imagine        0.20
V              0.20
Activations Density 0.434%
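
The density figure above is commonly read as the share of token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of token positions on which
    the feature is active. The source page does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up activations for illustration only: 1000 positions, 3 of them active.
sample = [0.0] * 995 + [1.2, 0.4, 0.0, 3.1, 0.0]
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.434% would mean the feature is active on roughly 4 of every 1000 token positions.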