    Explanations

    phrases related to backing down or retracting statements

    phrases related to refusal or persistence in backing down

    Negative Logits
    anon      -0.83
    arth      -0.77
    nesota    -0.74
    anan      -0.73
    oven      -0.70
    lav       -0.68
    marks     -0.67
    oran      -0.67
    teenth    -0.66
    ross      -0.65
    Positive Logits
     blindly       0.80
     hesitate      0.79
     apologise     0.75
     hesitation    0.74
     apology       0.73
     forcefully    0.73
     uncond        0.73
     sooner        0.72
    antic          0.72
     vigorously    0.72
    Act Density 0.159%

    No Known Activations