INDEX
    Explanations

    the word "pull" with strong activations

    Negative Logits
    Token          Logit
    "merce"        -0.74
    "ibel"         -0.74
    "voy"          -0.73
    "icol"         -0.73
    "esa"          -0.72
    "orld"         -0.70
    "mint"         -0.69
    "heid"         -0.69
    "SPONSORED"    -0.67
    " Scotia"      -0.66
    Positive Logits
    Token          Logit
    " levers"      0.99
    " aggro"       0.88
    " weeds"       0.85
    " punches"     0.83
    "pull"         0.82
    " away"        0.82
    "awed"         0.80
    " awa"         0.79
    " off"         0.78
    "chairs"       0.78
    Activation Density: 0.032%

    No Known Activations