INDEX
    Explanations

    words related to rules, regulations, consequences, and penalties

    concepts related to societal rules, norms, and consequences

    Negative Logits
    Token        Logit
    "iven"       -0.67
    "ãĥij"       -0.65
    "ãĥķãĤ©"     -0.63
    "ophon"      -0.62
    "ãĤ»"        -0.61
    "ando"       -0.60
    " Siber"     -0.59
    "ãĥĭ"        -0.59
    "SPONSORED"  -0.59
    "`."         -0.59
    Positive Logits
    Token       Logit
    " deserve"  0.98
    " tended"   0.96
    " will"     0.91
    " must"     0.89
    " are"      0.87
    " cannot"   0.86
    " may"      0.86
    " ought"    0.85
    " knows"    0.84
    " tend"     0.84
    Activation Density: 0.676%

    No Known Activations