    Explanations

    targets related to specific goals or objectives

    phrases describing objectives or goals

    Negative Logits
    Token        Logit
    "Guard"      -0.78
    "note"       -0.74
    "minus"      -0.73
    "acted"      -0.72
    "guards"     -0.70
    "shit"       -0.64
    "outside"    -0.64
    "chapter"    -0.63
    "hot"        -0.62
    "part"       -0.62
    Positive Logits
    Token            Logit
    " maximizing"    0.91
    " perfection"    0.90
    " maximize"      0.88
    " achieving"     0.84
    " achieve"       0.84
    " emulate"       0.84
    " replicate"     0.82
    " minimize"      0.79
    " improving"     0.79
    " recreate"      0.77
    Activation Density: 0.164%

    No Known Activations