INDEX
    Explanations

    scenarios or options that involve making difficult decisions and their potential consequences

    Negative Logits
    Token          Logit
    " increa"      -1.80
    " emphat"      -1.80
    " disagre"     -1.79
    " accla"       -1.76
    " guarante"    -1.75
    " depic"       -1.73
    " wherea"      -1.71
    " inev"        -1.68
    " encomp"      -1.68
    " affor"       -1.67
    Positive Logits
    Token          Logit
    " option"      1.34
    "option"       1.19
    "Option"       1.17
    " Option"      1.10
    " options"     1.05
    "OPTION"       0.99
    "options"      0.94
    "Options"      0.91
    " choice"      0.88
    "选项"         0.85
    Activation Density: 0.515%

    No Known Activations