    Explanations

    phrases indicating hesitation, curiosity, or introspection

    phrases expressing uncertainty or difficulty in actions

    Negative Logits
    illary      -0.83
    Offline     -0.76
    ver         -0.68
    Delivery    -0.68
    ftime       -0.67
    ories       -0.65
    ilater      -0.62
    liest       -0.62
    ificial     -0.61
     Dru        -0.61
    Positive Logits
     grin        1.06
     laugh       1.01
     chuckle     0.99
     wonder      0.99
     feel        0.98
     notice      0.96
     smile       0.94
     feeling     0.88
     impressed   0.87
     noticing    0.86
    Activation Density: 0.070%

    No Known Activations