INDEX
Explanations
instances of manipulation and abuse in various contexts
abuse and exploitation
Negative Logits
Cuthbert          -0.45
Contro            -0.43
Origin            -0.42
prosp             -0.42
ніципалі          -0.41
Comb              -0.41
genicity          -0.41
Origin            -0.40
Cont              -0.40
SceneManagement   -0.40
Positive Logits
autorytatywna     0.64
verwijspagina     0.64
misuse            0.59
abuse             0.54
Utilizamos        0.53
exploited         0.53
abused            0.52
Anfitrión         0.52
misused           0.52
pinulongan        0.51
Activations Density: 0.031%