INDEX
    Explanations

    phrases related to societal or governmental issues

    expressions of personal opinions or statements of belief

    Negative Logits
    .</         -0.71
    ?).         -0.68
    .*          -0.68
    .).         -0.62
    ayn         -0.60
    ãĢĤ         -0.60
    arist       -0.59
    +.          -0.59
    .<          -0.57
    aired       -0.56
    Positive Logits
     [          0.99
    ,"          0.92
    ,'"         0.79
    %"          0.78
    initely     0.74
    ,''         0.73
    ),"         0.68
     incent     0.67
     anecd      0.66
    .,"         0.65
    Activation Density: 0.843%

    No Known Activations