INDEX
    Explanations

    instances of moral judgment and societal norms

    Negative Logits
    Token                Logit
    "zier"               -0.14
    "ійс"                -0.13
    " indeb"             -0.13
    "../../../"          -0.13
    "afort"              -0.13
    ".Serve"             -0.13
    " OnTriggerEnter"    -0.13
    " trường"            -0.12
    "vester"             -0.12
    " fours"             -0.12
    Positive Logits
    Token                Logit
    " second"            1.39
    "second"             1.20
    "Second"             1.04
    "第二"               1.03
    " Second"            1.02
    " SECOND"            1.00
    "-second"            0.99
    " Secondly"          0.98
    ".second"            0.95
    " 第二"              0.95
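Dashboards of this kind commonly build the logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then rank the resulting per-token scores. This page does not state its exact method.

The sketch below assumes that approach. It uses hypothetical names (`w_dec`, `w_u`, `top_logits`) and plain Python lists in place of tensors. It also skips any layer-norm folding a real pipeline might apply.

```python
from typing import List, Tuple

def top_logits(
    w_dec: List[float],        # hypothetical feature decoder direction, length d_model
    w_u: List[List[float]],    # hypothetical unembedding: d_model rows x vocab_size columns
    vocab: List[str],          # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative token logits for one feature."""
    # The logit effect on token j is the dot product of w_dec with column j of w_u.
    scores = [
        sum(w_dec[i] * w_u[i][j] for i in range(len(w_dec)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy example with made-up weights (d_model = 2, vocab_size = 4).
vocab = [" second", "second", "zier", " fours"]
w_dec = [1.0, 0.5]
w_u = [[1.0, 0.8, -0.1, 0.0],
       [0.2, 0.4, 0.0, -0.4]]
pos, neg = top_logits(w_dec, w_u, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Quoting each token, as in the tables, keeps the leading space visible. In GPT-2-style vocabularies, `" second"` and `"second"` are different tokens.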
    Activation Density: 0.513%
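Activation density is usually read as the fraction of sampled tokens on which the feature fires. A density of 0.513% therefore means roughly 5 active tokens per 1,000, or about 1 in 195.

Below is a minimal sketch of that calculation. It assumes "fires" means an activation strictly above zero, and it uses a hypothetical sample of activations rather than data from this page.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 1,000 token activations, 5 of them nonzero.
sample = [0.0] * 995 + [0.7, 1.2, 0.3, 2.1, 0.9]
density = activation_density(sample)
print(f"Activation Density: {density:.3%}")
```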

    No Known Activations