    Explanations

    mentions of prisons or related terms like prison sentences

    references to prisons and the prison system

    Negative Logits
    Token         Logit
    "yip"         -0.95
    "lass"        -0.79
    "ï¸ı"         -0.79
    "thora"       -0.77
    "udden"       -0.74
    "omatic"      -0.71
    "rians"       -0.70
    "rian"        -0.69
    "::::::::"    -0.68
    "witz"        -0.67
    Positive Logits
    Token             Logit
    " inmates"        0.93
    " prisons"        0.86
    " inmate"         0.84
    " prison"         0.82
    "prison"          0.80
    " barr"           0.79
    " sentences"      0.78
    " gul"            0.78
    " confinement"    0.76
    " camps"          0.75
    Act Density 0.026%
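
To put the density figure in context, here is a minimal sketch of how such a number could be computed, assuming that activation density means the fraction of sampled tokens on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 26 of which activate the feature.
sample = [0.0] * 99_974 + [1.5] * 26
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.026% corresponds to roughly 26 active tokens per 100,000 sampled.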

    No Known Activations