INDEX
    Explanations

    references to wheelchairs

    Negative Logits
    Token       Logit
    "dül"       -0.16
    "dense"     -0.16
    "ittel"     -0.15
    "yne"       -0.15
    "tle"       -0.15
    "oulos"     -0.15
    "enthal"    -0.15
    "ional"     -0.15
    "olta"      -0.14
    "lope"      -0.14
    Positive Logits
    Token       Logit
    "chair"     0.43
    "wright"    0.32
    "bar"       0.31
    "ie"        0.30
    "-chair"    0.28
    " chair"    0.28
    "base"      0.28
    "Chair"     0.28
    "house"     0.28
    "ing"       0.27
    Activation Density: 0.012%

    No Known Activations