INDEX
    Explanations

    instances of manipulation and abuse in various contexts

    Negative Logits
    " Cuthbert"         -0.45
    " Contro"           -0.43
    "Origin"            -0.42
    " prosp"            -0.42
    "ніципалі"          -0.41
    "Comb"              -0.41
    "genicity"          -0.41
    " Origin"           -0.40
    "Cont"              -0.40
    "SceneManagement"   -0.40
    Positive Logits
    " autorytatywna"    0.64
    "verwijspagina"     0.64
    " misuse"           0.59
    " abuse"            0.54
    " Utilizamos"       0.53
    " exploited"        0.53
    " abused"           0.52
    " Anfitrión"        0.52
    " misused"          0.52
    " pinulongan"       0.51
    Activation Density: 0.031%

    No Known Activations