INDEX
    Explanations

    phrases related to manipulation or deception

    phrases indicating manipulation or coercion

    Negative Logits
    " summarizes"    -0.65
    "clips"          -0.65
    " traced"        -0.64
    "anwhile"        -0.63
    "fleet"          -0.63
    " remarked"      -0.63
    "lang"           -0.63
    "Internet"       -0.63
    "thora"          -0.63
    " Rosenstein"    -0.63

    Positive Logits
    " surrender"      0.87
    " submission"     0.85
    " obedience"      0.83
    " acquies"        0.72
    " embrace"        0.72
    " heel"           0.71
    " favour"         0.71
    " believing"      0.70
    " staying"        0.70
    " cooperate"      0.69
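The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Viewers of this kind commonly derive such lists by projecting the feature's decoder direction onto the model's unembedding vectors and keeping the extremes; whether this page used exactly that computation, or applied any normalization, is an assumption. A minimal pure-Python sketch, with hypothetical helpers `logit_effects` and `top_and_bottom` and made-up toy vectors:

```python
from typing import Dict, List, Tuple


# Hypothetical helper (assumed method): score each vocabulary token by the dot
# product of the feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


# Hypothetical helper: return the k most positive and the k most negative tokens.
def top_and_bottom(scores: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy 2-dimensional example; tokens and vectors are invented for illustration.
decoder_dir = [1.0, 0.5]
unembed = {"a": [0.6, 0.54], "b": [-0.4, -0.5], "c": [0.1, -1.0]}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

In a real model the vocabulary has tens of thousands of entries, so only the extremes (here, ten per side) are worth displaying.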
    Activation Density 0.152%
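Activation density is read here as the percentage of tokens on which the feature fires, so 0.152% corresponds to roughly 152 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the hypothetical helper `activation_density` are assumptions, not this viewer's documented definition.

```python
from typing import Iterable


# Hypothetical helper: percentage of tokens whose activation exceeds the threshold
# (threshold of 0.0 is an assumption about what counts as "firing").
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens observed, so report zero density
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Toy sample: 152 firing tokens out of 100,000.
print(activation_density([0.0] * 99_848 + [1.0] * 152))
```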

    No Known Activations