INDEX
Explanations
phrases related to manipulation and deception
tricked into believing
Negative Logits
the            -0.39
WebVitals      -0.38
usercontent    -0.33
dry            -0.31
Rising         -0.31
elkaar         -0.31
zeichnen       -0.31
onderhoud      -0.30
հղումներ       -0.30
native         -0.30
Positive Logits
betweenstory   0.65
ValueStyle     0.61
surla          0.59
AndEndTag      0.59
terpaksa       0.58
Forced         0.55
tricked        0.55
Forced         0.55
coer           0.54
forced         0.52
Activations Density 0.019%