INDEX
    Explanations

    phrases related to challenges or controversial statements

    Negative Logits
     Rite          -0.82
    urgy           -0.78
    effic          -0.68
    VERTISEMENT    -0.67
    ulatory        -0.67
    ulators        -0.67
    ulator         -0.65
    usterity       -0.65
    sav            -0.65
    OTOS           -0.64
    Positive Logits
     defy          1.01
     dare          0.99
    evil           0.92
     daring        0.86
     boldly        0.86
     Dare          0.84
    ously          0.83
     provoke       0.81
     dared         0.81
     presume       0.81
    Activation Density: 0.023%

    No Known Activations