INDEX
Explanations
phrases that contain hyphens and slashes
negative statements or phrases expressing disbelief or denial
Negative Logits
Token        Logit
watches      -0.73
boarding     -0.73
adm          -0.67
Lancaster    -0.65
trusting     -0.65
chatting     -0.65
chats        -0.61
clothed      -0.61
convers      -0.60
supervised   -0.59
Positive Logits
Token        Logit
tain         1.12
mean         1.05
uable        1.04
exist        1.00
t            0.96
tale         0.88
olve         0.87
ãĤ¡          0.84
tion         0.84
require      0.83
Activations Density 0.160%
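
The token shown as ãĤ¡ in the positive logits list looks like the printable stand-in form that byte-level BPE tokenizers use for raw bytes, not a corrupted character. The sketch below assumes the model uses a GPT-2 style byte-to-character mapping (the source does not name the tokenizer) and reverses that mapping to recover the underlying text. The helper names bytes_to_unicode and decode_token are illustrative and are not taken from the dashboard.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into the text it stands for."""
    unicode_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(unicode_to_byte[ch] for ch in token)
    # errors="replace" because a single token may hold a partial UTF-8 sequence.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(repr(decode_token("ãĤ¡")))
    print(repr(decode_token("Ġthe")))
```

Under that assumption the three characters correspond to the bytes E3 82 A1, which is the UTF-8 encoding of U+30A1 (katakana small a). If the model uses a different tokenizer, the mapping would not apply and the token should be read as shown.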