INDEX
    Explanations

    references to objects such as straws and other similar physical objects

    references to "straw" and related terms

    Negative Logits
    Token         Logit
    "ervation"    -0.80
    "ccording"    -0.77
    "olon"        -0.77
    "ogue"        -0.74
    " notor"      -0.72
    "olitan"      -0.71
    "itals"       -0.70
    "ynt"         -0.69
    "uria"        -0.69
    "cial"        -0.68
    Positive Logits
    Token         Logit
    " straw"      1.20
    "backs"       0.93
    "pipe"        0.88
    "weights"     0.87
    "mere"        0.86
    "weight"      0.85
    "poll"        0.84
    " Straw"      0.80
    "bare"        0.79
    "bridge"      0.79
    Activation Density: 0.013%

    No Known Activations