INDEX
    Explanations

    references to individuals and groups, particularly those who are vulnerable or in need of support

    Negative Logits
    Token           Logit
    ".Include"      -0.14
    "ston"          -0.14
    " nutshell"     -0.13
    " forbidden"    -0.13
    "raz"           -0.13
    "ува"           -0.13
    "itez"          -0.13
    ";charset"      -0.12
    "ара"           -0.12
    "raph"          -0.12
    Positive Logits
    Token           Logit
    " otherwise"    0.22
    "otherwise"     0.19
    " Otherwise"    0.18
    "Otherwise"     0.17
    "osh"           0.17
    " might"        0.17
    " OTHERWISE"    0.17
    "CKER"          0.15
    " preceded"     0.14
    " mattered"     0.14
    Act Density 0.197%

    No Known Activations