INDEX
    Explanations

    mentions of being imprisoned or serving time behind bars

    mentions of "bars," relating to confinement or places where people gather

    Negative Logits
    "ctive"          -0.91
    "lihood"         -0.75
    "IBLE"           -0.74
    "sie"            -0.73
    "EngineDebug"    -0.71
    "UAL"            -0.70
    "ALLY"           -0.69
    "åĬ"             -0.68
    "ULAR"           -0.67
    " GENERAL"       -0.66
    Positive Logits
    "hops"       1.07
    " bars"      1.04
    "poon"       1.03
    "hop"        1.02
    "manship"    0.97
    "becue"      0.95
    "itone"      0.90
    "mith"       0.89
    "bell"       0.86
    "bars"       0.86
    Activation Density: 0.005%

    No Known Activations