Explanations
words related to denial or rejection of serious allegations
Negative Logits
Token     Logit
ramer     -0.16
Danh      -0.15
svém      -0.14
ungle     -0.14
.owl      -0.14
tant      -0.13
ÑĨеп      -0.13
imson     -0.13
ffset     -0.13
ialect    -0.13
Positive Logits
Token         Logit
fact          0.31
manner        0.25
fact          0.23
amount        0.21
tendency      0.20
role          0.19
way           0.18
importance    0.17
impact        0.16
recent        0.16
Activations Density: 0.258%
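
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that density means the share of sampled tokens on which the feature's activation is strictly greater than zero. The page itself does not state this definition, so treat it as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly above `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which
    the feature fires at all. This definition is not given on the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.258% would correspond to the feature firing on roughly 26 of every 10,000 sampled tokens.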