INDEX
Explanations
threatening or aggressive communication
Negative Logits
NameInMap
-0.59
UrlResolution
-0.58
الدراسه
-0.58
ویکیپدیای
-0.55
pushFollow
-0.55
UnknownFieldSet
-0.52
\{\\
-0.52
]};
-0.51
esez
-0.51
ngdoc
-0.48
Positive Logits
retali
0.77
retaliation
0.69
deterrent
0.67
vengeance
0.64
revenge
0.64
aven
0.61
✨:
0.59
deterrence
0.56
hadiran
0.55
uncin
0.55
Activations Density 0.319%