INDEX
Explanations
phrases that express belief, opinion, or confirmation
NEGATIVE LOGITS
，或
-0.24
либо
-0.22
either
-0.21
alternatively
-0.21
Either
-0.20
，以及
-0.19
Either
-0.19
EITHER
-0.18
maybe
-0.18
either
-0.18
POSITIVE LOGITS
or
0.21
や
0.19
이나
0.17
oes
0.16
į
0.15
tram
0.15
o
0.15
890
0.15
oversh
0.14
aret
0.14
Activations Density 0.226%