    Explanations

    phrases related to choices or options

    the presence of the end-of-text token

    Negative Logits
    Token         Logit
    " Vaugh"      -0.68
    " Jagu"       -0.67
    "evidence"    -0.62
    "iments"      -0.57
    " Adin"       -0.57
    "ATURES"      -0.55
    "anism"       -0.55
    "Edit"        -0.54
    "achu"        -0.54
    "agree"       -0.54
    Positive Logits
    Token         Logit
    " lot"        0.96
    " bunch"      0.89
    " couple"     0.82
    " handful"    0.81
    " plethora"   0.77
    " huge"       0.76
    "uras"        0.76
    " few"        0.75
    " glimpse"    0.75
    " whopping"   0.74
    Act Density 0.609%

    No Known Activations