INDEX
    Explanations

    references to being imprisoned or held captive

    references to prisoners and their experiences

    Negative Logits
    Token             Logit
    "orp"             -0.79
    " Boll"           -0.75
    "OPA"             -0.70
    "amera"           -0.68
    "wig"             -0.66
    "orie"            -0.66
    "drive"           -0.64
    "alore"           -0.64
    "ulously"         -0.64
    "ories"           -0.63
    Positive Logits
    Token             Logit
    " prisoners"      1.01
    " prisoner"       0.90
    " captives"       0.88
    " inmates"        0.87
    " sentenced"      0.83
    " detainees"      0.79
    " incarcerated"   0.78
    " confinement"    0.77
    "icts"            0.73
    " captive"        0.73
    Activation Density: 0.025%

    No Known Activations