INDEX
    Explanations

    instances of the word "left" and related directional terms

    Negative Logits
    Token        Logit
    risk         -0.17
    llib         -0.17
    uw           -0.15
    -hide        -0.14
    appa         -0.14
    rh           -0.14
    ibar         -0.14
    qualified    -0.14
    hop          -0.14
    ewire        -0.13
    Positive Logits
    Token        Logit
    most         0.19
    -hand        0.18
    ness         0.17
    tings        0.16
    /right       0.16
    bies         0.16
    -leaning     0.16
    sy           0.15
    -wing        0.15
    ISTS         0.15
    Activation Density: 0.041%

    No Known Activations