INDEX
    Explanations

    phrases related to conditions or consequences of certain actions or beliefs

    instances of the word "don't" or its variations

    Negative Logits
    Token             Logit
    "Site"            -0.71
    " Alleg"          -0.68
    " Balanced"       -0.67
    "OSP"             -0.67
    " Policies"       -0.64
    " Starts"         -0.61
    "Gall"            -0.61
    " Strategy"       -0.60
    " Palest"         -0.60
    "inia"            -0.59

    Positive Logits
    Token             Logit
    " bother"         0.94
    " succeed"        0.94
    "theless"         0.85
    "urtles"          0.84
    " comply"         0.82
    " appreciate"     0.81
    " recognize"      0.81
    " realize"        0.80
    " necessarily"    0.79
    " realise"        0.79
    Activation Density: 0.083%

    No Known Activations