INDEX
    Explanations

    references to prisons and correctional systems

    Negative Logits
    Token         Logit
    "ุà¹Ī"        -0.15
    "ëł¹"         -0.14
    "_sampler"    -0.14
    "aar"         -0.14
    "_Device"     -0.13
    "utr"         -0.13
    "аÑĤи"        -0.13
    "éric"        -0.13
    " sublic"     -0.13
    "æ»ħ"         -0.13
    Positive Logits
    Token           Logit
    " prison"       0.76
    " Prison"       0.68
    " prisoner"     0.65
    " prisoners"    0.63
    " jail"         0.61
    " inmate"       0.61
    " prisons"      0.59
    " inmates"      0.57
    " Jail"         0.53
    " Correction"   0.50
    Activation Density: 0.317%

    No Known Activations