    Explanations

    words related to physical structures, specifically walls

    references to physical barriers or structural components

    Negative Logits

    Token          Logit
    "Published"    -0.72
    "Gene"         -0.71
    "avez"         -0.69
    "heny"         -0.68
    "ya"           -0.66
    "Wild"         -0.65
    "phrine"       -0.65
    "EVA"          -0.65
    "Pacific"      -0.65
    " PubMed"      -0.64

    Positive Logits

    Token          Logit
    " walls"       1.31
    "papers"       1.00
    " Walls"       0.97
    " wall"        0.94
    "aby"          0.90
    "abies"        0.88
    " ceilings"    0.78
    "paper"        0.78
    " paintings"   0.77
    "eries"        0.76

    Activation Density: 0.011%

    No Known Activations