INDEX
    Explanations

    phrases related to taking decisive actions or making significant decisions

    phrases that imply extraction or removal

    Negative Logits
    Token          Logit
    "orously"      -0.83
    "orld"         -0.78
    "ould"         -0.73
    "esa"          -0.71
    "staking"      -0.70
    "shaw"         -0.68
    "pton"         -0.67
    "umbered"      -0.67
    "etimes"       -0.65
    "ingham"       -0.64

    Positive Logits
    Token             Logit
    "stretched"       0.98
    " levers"         0.74
    "wards"           0.72
    "rage"            0.69
    "doors"           0.67
    "ta"              0.65
    "casts"           0.63
    " microphones"    0.63
    " stitches"       0.62
    "WARD"            0.62

    Activation Density: 0.028%

    No Known Activations