INDEX
    Explanations

    phrases related to making choices or trade-offs

    terms related to trade-offs and evaluations of value

    Negative Logits
    "bered"     -0.74
    "urses"     -0.73
    "arus"      -0.73
    "late"      -0.71
    "liter"     -0.69
    "miah"      -0.68
    "bill"      -0.68
    "attery"    -0.65
    "bus"       -0.65
    "estic"     -0.65
    Positive Logits
    " downside"      0.92
    " why"           0.86
    " Problem"       0.83
    "why"            0.83
    " WHY"           0.79
    " weaknesses"    0.78
    " lesson"        0.78
    "Problem"        0.76
    " drawback"      0.76
    " takeaway"      0.75
    Act Density 0.531%

    No Known Activations