INDEX
Explanations
phrases indicating personal opinions or statements
Negative Logits
итив
-0.17
Zag
-0.15
गर
-0.15
ith
-0.15
odes
-0.15
wan
-0.15
any
-0.15
ult
-0.14
byname
-0.14
_lite
-0.14
Positive Logits
amble
0.16
attice
0.16
OLON
0.15
Cain
0.14
rypt
0.14
สะ
0.14
ptune
0.14
dle
0.14
tiêu
0.13
obia
0.13
Activations Density 0.057%