INDEX
    Explanations

    phrases related to political and societal manipulation and control

    Negative Logits
    Token               Logit
    "ITNESS"            -0.75
    "ifted"             -0.58
    "VIDIA"             -0.57
    " Canaver"          -0.56
    "uana"              -0.55
    "utz"               -0.55
    " confir"           -0.54
    " largeDownload"    -0.51
    "omorphic"          -0.50
    "ertodd"            -0.50
    Positive Logits
    Token               Logit
    " itch"             0.67
    " onto"             0.65
    "ebted"             0.64
    " into"             0.63
    "into"              0.63
    " prematurely"      0.62
    "til"               0.60
    " alike"            0.57
    " goodbye"          0.57
    "onto"              0.57
    Activation Density: 0.895%

    No Known Activations