Explanations
Prepositions followed by "the"
Negative Logits
<unused338>   0.28
行き          0.28
labile        0.28
områ          0.27
espère        0.27
はある        0.27
0             0.27
perverse      0.26
𐰚             0.26
outset        0.26
Positive Logits
the           0.82
The           0.52
the           0.47
teh           0.44
our           0.44
their         0.41
那个          0.40
a             0.40
The           0.40
an            0.38
Activations Density 0.275%