INDEX
    Explanations

    concepts related to ethical decision-making and moral reasoning

    Negative Logits
    Token         Logit
    "akis"        -0.15
    "oods"        -0.15
    "lug"         -0.15
    " Pot"        -0.15
    " reap"       -0.14
    "ieri"        -0.14
    "ugas"        -0.14
    "alama"       -0.14
    "stdClass"    -0.13
    " McMahon"    -0.13
    Positive Logits
    Token         Logit
    " course"     0.50
    " Course"     0.41
    "course"      0.40
    "Course"      0.39
    "-course"     0.36
    " route"      0.34
    " courses"    0.34
    " choice"     0.32
    "_course"     0.32
    " move"       0.31
    Activation Density: 0.102%

    No Known Activations