INDEX
    Explanations

    discussions about moral and ethical dilemmas related to personal beliefs and practices

    Negative Logits
     raiſ        -0.81
     itſelf      -0.78
     ſever       -0.78
     ſta         -0.78
     purpoſe     -0.77
     iſt         -0.76
     pleaſure    -0.76
     myſelf      -0.75
     deſt        -0.75
     houſe       -0.74
    Positive Logits
     mr          0.65
     q           0.64
     south       0.61
     k           0.59
     north       0.59
     ko          0.54
     zu          0.53
     m           0.53
     y           0.53
     j           0.52
    Activation Density: 2.194%

    No Known Activations