Explanations
comparative phrases indicative of evaluation or contrast
Negative Logits
quet        -0.20
sil         -0.16
ayd         -0.15
            -0.14
rello       -0.14
ilio        -0.14
Robbins     -0.13
acho        -0.13
polarity    -0.13
.lng        -0.13
Positive Logits
_HOLD       0.15
IFI         0.14
дина        0.14
едь         0.14
ENA         0.14
دیگر        0.14
idUser      0.14
imple       0.14
svp         0.14
대로        0.13
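Non-ASCII tokens in logit lists like these are often displayed in a byte-level BPE tokenizer's printable alphabet rather than as readable text. The sketch below assumes the GPT-2 style byte-to-unicode mapping (the source does not name the tokenizer) and shows how such a string could be mapped back to UTF-8. `decode_token` is a hypothetical helper, not part of any dashboard API.

```python
def bytes_to_unicode():
    """GPT-2 style table: printable bytes map to themselves,
    the remaining bytes map to code points starting at 256."""
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level token string back into text.

    Characters outside the table are assumed to be already decoded and are
    passed through as their own UTF-8 bytes (an assumption for mixed strings).
    """
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("\u00eb\u012e\u0122\u00eb\u00a1\u013e"))  # 대로
print(decode_token("imple"))  # imple
```

One caveat of this sketch: a token that legitimately contains a Latin-1 character such as `Ñ` would be treated as a raw byte, so it is only suitable for strings known to be in byte-level form.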
Activation density: 0.054%
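A density figure like this is commonly read as the fraction of tokens on which the feature is active. The sketch below assumes that reading, with "active" meaning an activation above a threshold of zero; the source does not state the exact definition. `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 54 active tokens out of 100,000.
sample = [1.0] * 54 + [0.0] * 99_946
density = activation_density(sample)
print(f"{density:.3%}")  # 0.054%
```

Under this reading, a density of 0.054% corresponds to roughly 54 active tokens per 100,000.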