INDEX
Explanations
phrases indicating personal opinions or subjective expressions
Negative Logits
[B       -0.14
cape     -0.14
rios     -0.14
åĵ       -0.13
yles     -0.13
ró       -0.13
хи       -0.13
_CONST   -0.13
sha      -0.13
æı       -0.13
Positive Logits
other    0.17
_SCR     0.15
whether  0.15
others   0.14
enser    0.14
coni     0.14
nữa      0.14
samot    0.14
intl     0.14
دیگر     0.14
Activations Density 0.017%