Explanations
instances of conditional phrases denoting choices or alternatives
Negative Logits
swer    -0.16
eyi     -0.14
gle     -0.14
elez    -0.14
gles    -0.14
Ľ       -0.14
å¢      -0.13
δα      -0.13
ahun    -0.13
šem     -0.13
Positive Logits
more    0.43
MORE    0.34
less    0.33
more    0.32
更多 ("more")    0.31
More    0.29
-more   0.29
_more   0.29
.more   0.29
lebih   0.28
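This page does not say how its logit lists were produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. The sketch below is pure Python; `top_bottom_logits`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, not part of any particular library.

```python
def top_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Project a feature's decoder direction through an unembedding matrix.

    Hypothetical inputs (assumptions, not this page's actual data):
      decoder_dir: list[float] of length d_model
      unembed:     d_model rows, each a list[float] of length len(vocab)
      vocab:       list[str] of token strings
    Returns (positive, negative): the k highest and k lowest (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # Logit contribution for token j is the dot product of the decoder
    # direction with column j of the unembedding matrix.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative
```

With toy inputs `top_bottom_logits([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["more", "swer", "less"], k=1)`, the call returns `([("more", 0.5)], [("swer", -0.2)])`.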
Activation density: 0.041%
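This sketch assumes the density figure is the share of token positions on which the feature fires, meaning its activation is above zero. The page does not give its exact definition. Under that reading, 0.041% is roughly 41 tokens in every 100,000. `activation_density` and its `threshold` parameter are hypothetical names used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes density = (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

For example, a feature that fires on 41 of 100,000 token positions gives a density of about 0.041, matching the figure above.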