INDEX
Explanations
was followed by descriptors
Negative Logits
ي  0.39
ون  0.38
م  0.36
로  0.36
خستان  0.33
و  0.33
ام  0.32
י  0.32
These  0.31
ে  0.31
Positive Logits
a  0.63
was  0.48
to  0.47
I  0.46
of  0.45
it  0.44
by  0.43
o  0.42
ется  0.42
p  0.41
Activations Density: 0.047%