Explanations
phrases indicating intention or actions related to deception or secrecy
Negative Logits
Token             Logit
darm              -0.16
або               -0.16
herits            -0.15
.scalablytyped    -0.15
zas               -0.15
aven              -0.15
TextAlign         -0.14
zyst              -0.14
ernity            -0.14
lfw               -0.14
Positive Logits
Token          Logit
kon            0.16
ascii          0.15
ows            0.15
ascii          0.14
низ            0.14
Virus          0.14
raquo          0.14
crossorigin    0.14
ddy            0.14
uya            0.13
Activations Density: 0.126%