    Explanations

    the word "over" with increasingly strong activations

    Negative Logits
      "osity"        -0.71
      " Forward"     -0.64
      " associates"  -0.63
      " partName"    -0.62
      " Deity"       -0.62
      "yssey"        -0.60
      "olson"        -0.60
      "resy"         -0.58
      "oS"           -0.57
      "iosity"       -0.57

    Positive Logits
      "kill"         1.20
      "blown"        1.14
      "rated"        1.10
      "stated"       1.07
      "reaching"     1.02
      "loading"      0.99
      "priced"       0.97
      "joy"          0.95
      "drive"        0.95
      "sold"         0.92
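    The page does not say how the two logit lists were produced. A common approach for dashboards like this is to multiply the feature's decoder direction by the model's unembedding matrix, giving one score per vocabulary token, then report the k highest (positive logits) and k lowest (negative logits). The sketch below assumes that approach; the function name `top_bottom_logits`, the toy matrices, and the 4-token vocabulary are hypothetical and stand in for real model weights.

```python
import numpy as np

def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Project a (hypothetical) feature decoder direction onto the vocabulary.

    decoder_dir: shape (d_model,), the feature's write direction (assumed)
    W_U: shape (d_model, d_vocab), the unembedding matrix (assumed)
    vocab: list of d_vocab token strings
    Returns (positive, negative): the k highest- and k lowest-scoring tokens.
    """
    scores = decoder_dir @ W_U          # one logit contribution per vocabulary token
    order = np.argsort(scores)          # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional model, 4-token vocabulary (illustrative values only).
vocab = ["kill", "osity", "sold", "yssey"]
W_U = np.array([[1.2, -0.7, 0.9, -0.6],
                [0.0,  0.0, 0.0,  0.0]])
decoder_dir = np.array([1.0, 3.0])

pos, neg = top_bottom_logits(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('kill', 1.2), ('sold', 0.9)]
print(neg)  # [('osity', -0.7), ('yssey', -0.6)]
```

    Under this assumption, a positive score means the feature pushes the model toward predicting that token next, and a negative score means it pushes away from it.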
    Activation Density: 0.023%
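    An activation density of 0.023% means the feature fires on roughly 23 of every 100,000 token positions, or about 1 in 4,350. The sketch below assumes density is the fraction of positions whose activation exceeds zero over some evaluation set; the page does not state the dataset or the threshold, and the function name `activation_density` and the synthetic activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic activations: 23 active positions out of 100,000 (illustrative only).
acts = np.zeros(100_000)
acts[:23] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.023%
```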

    No Known Activations