INDEX
    Explanations

    the word "arm" with a high level of activation

    instances of the word "arm."

    Negative Logits
    ween        -0.71
     Vide       -0.70
     Forsaken   -0.64
    flush       -0.62
     moot       -0.61
     payday     -0.60
     elig       -0.59
    LOAD        -0.59
     Skinner    -0.59
     Atlantis   -0.58
    Positive Logits
    ageddon     1.41
    aceutical   1.12
    ament       1.08
    onica       0.98
    atures      0.97
    ony         0.97
    ichael      0.93
    illary      0.93
    aments      0.92
    achine      0.92
    Act Density 0.009%

    No Known Activations