INDEX
    Explanations

    phrases related to expressing personal viewpoints or beliefs

    mentions of personal opinions

    Negative Logits
    " Gutenberg"     -0.78
    "gm"             -0.68
    " Roose"         -0.67
    "ammers"         -0.66
    "artifacts"      -0.66
    "ammy"           -0.65
    " Mamm"          -0.64
    "artney"         -0.64
    "estones"        -0.63
    " Danger"        -0.63
    Positive Logits
    " opinion"       0.90
    " opinions"      0.87
    " largeDownload" 0.82
    " piece"         0.76
    " polls"         0.75
    " opin"          0.74
    "atively"        0.73
    " bias"          0.72
    "eering"         0.72
    " paralysis"     0.70
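    The logit lists above rank vocabulary tokens by how strongly this feature pushes their next-token logits up or down. The page does not state how these values are computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses plain Python lists and made-up toy numbers. `decoder_dir`, `W_U`, `TOKENS`, and both helper functions are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Assumption: the feature's effect on token j is the dot product of its
# decoder direction with column j of the unembedding matrix W_U (d_model x vocab).
def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]

# Sort (token, effect) pairs; return the k largest and the k smallest effects.
def top_and_bottom(
    effects: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative logit effects
    top = ranked[::-1][:k]     # most positive logit effects
    return top, bottom

# Made-up toy data: d_model = 3, vocabulary of 5 tokens (not the real values).
TOKENS = [" opinion", " polls", " Gutenberg", "gm", " piece"]
decoder_dir = [1.0, 0.5, -0.5]
W_U = [
    [0.6, 0.4, -0.5, -0.2, 0.3],
    [0.4, 0.2, -0.2, -0.4, 0.2],
    [-0.2, 0.0, 0.4, 0.2, -0.1],
]

effects = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(effects, TOKENS, k=2)
print("Positive:", top)
print("Negative:", bottom)
```

    With these toy numbers, " opinion" and " polls" come out on the positive side and " Gutenberg" and "gm" on the negative side, which mirrors the shape of the lists above without reproducing their actual values.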
    Activation Density: 0.020%
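    The page does not define activation density. Assuming it means the fraction of sampled tokens on which the feature's activation is above zero, 0.020% corresponds to about 2 active tokens in every 10,000. The `activation_density` helper and the sample data below are hypothetical and serve only to illustrate that arithmetic.

```python
from typing import List

# Assumption: density = (# tokens with activation > threshold) / (# tokens sampled).
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Made-up sample: the feature fires on 2 of 10,000 tokens.
acts = [0.0] * 10_000
acts[17] = 3.1
acts[4242] = 0.8

density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```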

    No Known Activations