INDEX
    Explanations

    mentions of responsible behavior or actions

    references to responsibility and responsible behavior

    Negative Logits
    Token            Logit
    "chu"            -0.75
    "frey"           -0.72
    "stals"          -0.72
    " mirac"         -0.69
    "ammy"           -0.68
    "bows"           -0.68
    " tantal"        -0.68
    "forts"          -0.66
    "yip"            -0.65
    "OUT"            -0.65

    Positive Logits
    Token            Logit
    " behaviour"     0.99
    " behavior"      0.97
    " citizen"       0.89
    " entreprene"    0.88
    " governance"    0.84
    "tarian"         0.84
    " stewards"      0.84
    " adult"         0.81
    " manner"        0.79
    " conduct"       0.78
    Activation Density: 0.140%

    No Known Activations