INDEX
    Explanations

    scenarios involving ethical dilemmas or moral questions

    Negative Logits
     Monfieur        -0.71
     Theſe           -0.70
     myſelf          -0.69
    Espèce           -0.68
     Diſ             -0.67
     purpoſe         -0.66
     greateſt        -0.66
     Beſ             -0.65
     Sarm            -0.65
     itſelf          -0.65
    Positive Logits
    adaptiveStyles    0.62
     RouterModule     0.54
    queryInterface    0.51
     fair             0.51
     Fair             0.51
    oneofs            0.48
    addContainerGap   0.48
    ficulty           0.47
     الحره            0.47
    chê               0.46
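Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix, then taking the most negative and most positive entries. The page does not say how its lists were computed, so the sketch below only assumes that method. The function `top_logits`, the toy vocabulary and the toy weights are all hypothetical. It uses plain Python so it runs without NumPy.

```python
def top_logits(decoder_dir, unembed, vocab, k=3):
    """Hypothetical sketch: rank vocabulary tokens by decoder_dir @ unembed.

    decoder_dir: list of d_model floats (assumed feature decoder direction)
    unembed:     d_model rows, each a list of len(vocab) floats (assumed W_U)
    Returns (most_negative, most_positive) as lists of (token, score).
    """
    d_model, d_vocab = len(decoder_dir), len(vocab)
    # Dot product of the decoder direction with each vocabulary column of W_U.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(d_vocab)
    ]
    ranked = sorted(range(d_vocab), key=lambda j: scores[j])  # ascending
    negative = [(vocab[j], scores[j]) for j in ranked[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(ranked[-k:])]
    return negative, positive


# Toy example with made-up weights (d_model = 2, four tokens).
vocab = ["fair", "Fair", "itself", "myself"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.3, -0.5, -0.2],
    [0.2, 0.4, -0.1, -0.6],
]
neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```

Real pipelines may also fold in the final layer norm or normalize the decoder direction first. This sketch omits both.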
    Activation Density 0.071%
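The page does not define activation density. A common reading is the percentage of sampled token positions on which the feature's activation is above zero. The sketch below assumes that definition, and `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of positions whose activation exceeds threshold."""
    total = len(activations)
    if total == 0:
        return 0.0  # no samples, so report zero density
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / total


# Made-up sample: 71 active positions out of 100,000.
acts = [0.0] * 99_929 + [1.0] * 71
print(f"{activation_density(acts):.3f}%")
```

Under this definition, 0.071% corresponds to roughly 71 active positions per 100,000 tokens.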

    No Known Activations