INDEX
    Explanations

    phrases related to personal actions and opinions

    themes related to accountability and personal responsibility

    Negative Logits
    )",         -0.77
    },"         -0.73
    ")          -0.71
    ),"         -0.68
    ')          -0.68
    :]          -0.67
    ],"         -0.63
    )"          -0.62
    "],"        -0.62
    ]"          -0.62
    Positive Logits
     anyways    1.31
     anyway     1.06
    âĢ          0.98
     somew      0.93
    âĻ          0.92
     tho        0.90
    .           0.88
     anymore    0.87
    !.          0.83
     ðŁĺ        0.83
    Act Density 0.808%

    No Known Activations