INDEX
    Explanations

    references to inmates or incarceration

    Negative Logits
    bon       -0.16
     esl      -0.16
     Imper    -0.15
    .bo       -0.14
     central  -0.14
    etch      -0.14
     Hol      -0.14
    vr        -0.14
    ÂŃn       -0.14
    itzer     -0.14
    Positive Logits
    urai      0.17
    elli      0.15
    -desc     0.15
    elier     0.14
    SAFE      0.14
     سÙĪ      0.14
    ån        0.14
    uhe       0.14
    expo      0.14
    :invoke   0.14
    Activation Density: 0.001%

    No Known Activations