INDEX
    Explanations

    mentions of legal and criminal activities or events

    phrases and terms related to legal penalties and incarceration

    Negative Logits
    Token       Logit
    'his'       -0.61
    'wered'     -0.60
    '"))'       -0.57
    ' proble'   -0.57
    ' HIS'      -0.52
    ' Its'      -0.52
    'atcher'    -0.51
    ' its'      -0.51
    'alon'      -0.51
    ' his'      -0.50
    Positive Logits
    Token             Logit
    ' respectively'   2.14
    ' apiece'         1.63
    ' together'       1.60
    'together'        1.32
    ' jointly'        1.30
    ' collectively'   1.28
    ' themselves'     1.23
    ' respective'     1.22
    'selves'          1.20
    ' Together'       1.17
    Act Density 0.930%

    No Known Activations