    Explanations

    references to collective action and responsibility

    Negative Logits
    Token         Logit
    "<bos>"       -2.25
    " gend"       -0.86
    " adal"       -0.81
    " vang"       -0.80
    " gie"        -0.79
    " glan"       -0.78
    " frans"      -0.78
    " ù"          -0.77
    " puc"        -0.75
    " hej"        -0.72
    Positive Logits
    Token            Logit
    " should"        1.03
    " shouldn"       0.98
    " soulign"       0.96
    " Should"        0.93
    " Shouldn"       0.93
    " véhic"         0.93
    " tupperware"    0.92
    " ought"         0.89
    "Should"         0.87
    " need"          0.86
    Activation Density 0.821%

    No Known Activations