INDEX
    Explanations

    statements expressing moral or ethical dilemmas in combat scenarios

    Negative Logits
    Token        Logit
    " sorts"     -0.21
    " sort"      -0.19
    "LOTS"       -0.16
    "SORT"       -0.16
    "sort"       -0.16
    " kinds"     -0.15
    "ujet"       -0.15
    "ğinden"     -0.14
    " lots"      -0.14
    "Trivia"     -0.14
    Positive Logits
    Token        Logit
    " bulls"     0.16
    " f"         0.16
    "isset"      0.15
    "none"       0.15
    " -,"        0.15
    " ain"       0.15
    " y"         0.14
    " none"      0.14
    " me"        0.14
    "uhan"       0.14
    Activation Density: 0.058%
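
    The density figure above can be read as a simple ratio. The sketch below assumes (this page does not define it) that density means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only so the arithmetic reproduces 0.058%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means share of positions with a non-zero activation.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 58 active positions out of 100,000 tokens.
sample = [1.0] * 58 + [0.0] * 99_942
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.058%
```

    Under that assumption, a density of 0.058% corresponds to roughly 58 active positions per 100,000 tokens.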

    No Known Activations