Explanations
elements that express positivity and supportiveness
Negative Logits
indeed         -0.24
only           -0.21
atleast        -0.20
only           -0.20
именно         -0.19
quite          -0.19
both           -0.18
neither        -0.18
BOTH           -0.18
ONLY           -0.18
Positive Logits
plain          0.23
thôi           0.23
ifiable        0.22
ifi            0.20
ifying         0.20
IFI            0.19
普通           0.18
plain          0.18
Plain          0.18
vailability    0.17
Activations Density 0.167%