    Explanations

    discussions about hypocrisy and moral inconsistencies in behavior and beliefs

    Negative Logits
    "lob"          -0.16
    " Mechan"      -0.16
    "enef"         -0.14
    "ービ"         -0.14
    "wit"          -0.14
    "rost"         -0.14
    "toolbox"      -0.14
    " mercy"       -0.14
    "liž"          -0.13
    "eka"          -0.13
    Positive Logits
    " behavior"     0.42
    " conduct"      0.40
    "行为"          0.38   (Chinese, "behavior")
    " actions"      0.38
    " behaviour"    0.36
    " behaviors"    0.36
    " Behavior"     0.35
    "behavior"      0.34
    "Behavior"      0.31
    " повед"        0.31   (Russian stem of "поведение", "behavior")
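On SAE feature dashboards, positive and negative logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that convention; the function names (`logit_effects`, `top_logits`) and the toy two-dimensional numbers are hypothetical and do not reproduce the values listed here.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix stored as d_model rows of vocab_size columns.
    Returns one direct logit effect per vocabulary token."""
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str],
               k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    positive = list(reversed(ranked))[:k]  # largest effects first
    negative = ranked[:k]                  # most negative effects first
    return positive, negative


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 4 tokens.
vocab = [" behavior", " conduct", " mercy", "toolbox"]
unembed = [
    [0.3, 0.2, -0.1, -0.1],  # row for residual dimension 0
    [0.2, 0.3, -0.1, 0.0],   # row for residual dimension 1
]
decoder_dir = [1.0, 0.5]  # hypothetical feature direction

positive, negative = top_logits(logit_effects(decoder_dir, unembed), vocab, k=2)
```

Here " behavior" (0.4) and " conduct" (0.35) come out on the positive side and " mercy" (-0.15) and "toolbox" (-0.1) on the negative side. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid sorting the whole list.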
    Activation Density 0.301%
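Activation density is the share of token positions on which the feature fires, so 0.301% corresponds to roughly 3 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are illustrative assumptions, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example (hypothetical data): the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.7, 0.4, 2.2]
print(activation_density(acts))  # 0.3
```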

    No Known Activations