    Explanations

    phrases related to assigning blame or fault

    phrases that assign blame or indicate fault

    Negative Logits
    Token         Logit
    "女"          -0.79
    "ago"         -0.76
    "aii"         -0.75
    "ju"          -0.72
    "sov"         -0.72
    "olition"     -0.72
    "uce"         -0.70
    "endum"       -0.69
    "onson"       -0.68
    "rooms"       -0.68
    Positive Logits
    Token         Logit
    "lessly"      0.96
    "less"        0.81
    " forgiven"   0.77
    "lessness"    0.76
    " Fault"      0.74
    " fault"      0.72
    " Logged"     0.71
    " piety"      0.70
    "ously"       0.69
    " faults"     0.67
    Act Density 0.012%

    No Known Activations