INDEX
    Explanations

    references to physical structures like bridges

    instances of the word "bridge."

    Negative Logits
    "arily"       -0.92
    "iaries"      -0.74
    "resy"        -0.73
    "matically"   -0.70
    "atically"    -0.68
    "eal"         -0.65
    "ILY"         -0.64
    "Policy"      -0.64
    "arios"       -0.63
    "psy"         -0.62
    Positive Logits
    "port"        1.00
    "bridge"      0.95
    " bridges"    0.90
    " bridge"     0.87
    " Strait"     0.84
    "roads"       0.84
    " Bridges"    0.83
    "chairs"      0.81
    "layer"       0.80
    "ports"       0.78
    Activation Density: 0.026%

    No Known Activations