Explanations
instances of the word "another."
Negative Logits
other     -0.18
further   -0.16
åı¦       -0.16
autres    -0.16
ups       -0.16
ses       -0.15
_OTHER    -0.14
outras    -0.14
Other     -0.14
andre     -0.14
Positive Logits
-than     0.23
equally   0.20
world     0.17
ness      0.17
¢åįķ      0.17
ovnÄĽ     0.17
ildo      0.16
layer     0.15
dozen     0.15
layer     0.15
Activations Density 0.042%