INDEX
    Explanations

    terms related to crime and punishment

    phrases related to social dynamics and interactions

    Negative Logits
     (?,        -0.53
    orem        -0.49
    .",         -0.49
    ukong       -0.45
    venants     -0.44
    owered      -0.44
    yssey       -0.44
    bilt        -0.43
    arij        -0.43
    aturday     -0.43
    Positive Logits
    !).         0.58
    ?).         0.51
    ).[         0.51
    )?          0.50
    )—          0.48
     phr        0.45
    !)          0.44
    )[          0.44
    -)          0.43
    ).          0.43
    Act Density 3.321%

    No Known Activations