INDEX
    Explanations

    negative sentiments and criticisms related to moral dilemmas

    Negative Logits
    afort
    -0.14
    zier
    -0.14
     OnTriggerEnter
    -0.13
     indeb
    -0.13
    анс
    -0.13
    ../../../
    -0.13
    addir
    -0.12
    ійс
    -0.12
     trưởng
    -0.12
    三个
    -0.12
    Positive Logits
     second
    1.43
    second
    1.23
    Second
    1.06
    第二
    1.05
     Second
    1.04
    -second
    1.02
     SECOND
    1.02
     Secondly
    0.99
    .second
    0.97
     第二
    0.96
    Act Density 0.490%

    No Known Activations