INDEX
    Explanations

    phrases related to legal actions or consequences

    phrases indicating punishment or consequences related to actions

    Negative Logits
    obyl            -0.74
    venants         -0.72
    ires            -0.68
    士              -0.68
    %%              -0.66
    DragonMagazine  -0.66
    atl             -0.65
    eda             -0.63
    ocity           -0.62
    atar            -0.62
    Positive Logits
     refusing       1.33
     violating      1.30
     daring         1.23
     failing        1.20
     breaching      1.17
     exercising     1.12
     interfering    1.11
    gery            1.11
     possessing     1.09
     criticizing    1.09
    Act Density 0.133%

    No Known Activations