INDEX
    Explanations

    statements related to responsibility and community impact

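    Negative Logits (tokens quoted to show leading spaces; \n marks a newline)
        '՚'                   -0.75
        ' Roskov'             -0.73
        '?\n'                 -0.72
        '".\n'                -0.71
        'Tikang'              -0.70
        '...\n'               -0.70
        (unrendered token)    -0.69
        '########.'           -0.69
        ' snippetHide'        -0.68
        '!\n'                 -0.68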
    Positive Logits
        ' Although'    0.84
        ' While'       0.79
        ' |'           0.78
        ' The'         0.78
        ' This'        0.77
        ' Despite'     0.76
        ' Since'       0.76
        ' Moreover'    0.76
        ' However'     0.74
        ' These'       0.74
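    The page does not say how the two logit lists are computed. A common convention for sparse-autoencoder feature dashboards is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix and report the tokens whose logits rise or fall the most. The sketch below assumes that convention; the names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `unembed_cols` are hypothetical, and the toy vectors are invented rather than taken from this feature.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs (not given on this page):
#   decoder_dir  - the feature's decoder vector, length d_model
#   unembed_cols - maps each vocabulary token to its column of W_U, length d_model


def logit_effects(decoder_dir: List[float],
                  unembed_cols: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct logit effect of the feature on each token: decoder_dir . W_U[:, token]."""
    if any(len(col) != len(decoder_dir) for col in unembed_cols.values()):
        raise ValueError("every unembedding column must match the decoder dimension")
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed_cols.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked))[:k]   # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; every number here is made up for illustration.
    toy_unembed = {
        " Although": [0.84, 0.1],
        " While": [0.79, 0.0],
        " Roskov": [-0.73, 0.5],
        "՚": [-0.75, 0.0],
    }
    pos, neg = top_and_bottom(logit_effects([1.0, 0.0], toy_unembed), 2)
    print("positive:", pos)
    print("negative:", neg)
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.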
    Activation Density: 0.124%
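    Activation density is usually read as the percentage of tokens in a sample corpus on which the feature fires (activation above zero); this page does not spell out the definition, so that reading is an assumption. Under it, 0.124% corresponds to roughly 124 activating tokens per 100,000. A minimal sketch with a hypothetical `activation_density` helper and an assumed firing threshold of 0:

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Synthetic data: 124 firing tokens out of 100,000, matching the density above.
    sample = [0.0] * 99_876 + [0.5] * 124
    print(f"{activation_density(sample):.3f}%")
```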

    No Known Activations