    Explanations

    questions related to moral or ethical dilemmas

    Negative Logits
        Token              Logit
        "displayText"      -0.66
        " Dragonbound"     -0.65
        "ãģ«" (に)         -0.65
        "places"           -0.62
        "isphere"          -0.61
        "ãģĮ" (が)         -0.60
        "stood"            -0.58
        "verage"           -0.58
        "perty"            -0.58
        "legram"           -0.58

    Positive Logits
        Token              Logit
        "olation"           0.94
        "olated"            0.94
        "berra"             0.86
        "olate"             0.81
        " anybody"          0.72
        " anyone"           0.70
        "peria"             0.70
        "terness"           0.69
        "ocy"               0.69
        "htar"              0.68

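Dashboards of this kind commonly derive the positive and negative logits by projecting the latent's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method and ignores any final layer norm. The helpers `logit_weights` and `top_logits`, and the toy vocabulary and weights in the demo, are hypothetical illustrations, not this latent's real parameters.

```python
from typing import List, Sequence, Tuple


def logit_weights(
    decoder_dir: Sequence[float], unembed: Sequence[Sequence[float]]
) -> List[float]:
    """Score every vocabulary token by the dot product of the latent's decoder
    direction (length d_model) with that token's unembedding column.

    Assumption: `unembed` is laid out as d_model rows x vocab_size columns and
    no final layer norm is applied (a simplification).
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]  # most negative first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy values (d_model = 2, vocab_size = 4), not real weights.
    vocab = ["olation", " anyone", " Dragonbound", "stood"]
    decoder_dir = [1.0, -0.5]
    unembed = [[0.9, 0.6, -0.7, -0.5], [-0.1, -0.2, -0.1, 0.2]]
    pos, neg = top_logits(logit_weights(decoder_dir, unembed), vocab, k=2)
    print("positive:", [(t, round(s, 2)) for t, s in pos])
    print("negative:", [(t, round(s, 2)) for t, s in neg])
```

With these toy numbers the positive side lists "olation" then " anyone", and the negative side lists " Dragonbound" then "stood", following the same most-extreme-first ordering as the logit lists.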
    Act Density 1.133%
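An activation density of 1.133% means the latent fired on roughly 11 of every 1,000 sampled tokens. A minimal sketch of that statistic, assuming density is the share of token positions whose activation is above zero; the function name `activation_density` and its `threshold` parameter are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a latent counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations for 8 token positions; 2 are nonzero.
    print(activation_density([0.0, 0.0, 1.2, 0.0, 0.3, 0.0, 0.0, 0.0]))
```

Here 2 of 8 positions are active, so the sketch reports a density of 25.0 percent.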

    No Known Activations