    Explanations

    the word "chosen" with a high level of activation

    instances of the word "chosen."

    Negative Logits
    pat         -0.76
    Net         -0.75
    urst        -0.71
    ptoms       -0.70
    ilit        -0.67
    vacc        -0.67
    CDC         -0.66
    monds       -0.66
    itamin      -0.65
    emic        -0.65
    Positive Logits
     chosen     1.01
     randomly   0.83
     chooses    0.80
    lists       0.80
     choosing   0.78
     chose      0.75
     ACTIONS    0.75
     Disciple   0.74
    çĶŁ         0.73
    selection   0.71
    Activation Density: 0.010%

    No Known Activations