    Explanations

    mentions of a specific physical barrier being built

    references to a border wall

    Negative Logits
    Token           Logit
    "Gene"          -0.77
    "uner"          -0.75
    "lishing"       -0.71
    "NI"            -0.68
    "é¾įåĸļ士"     -0.65
    "ria"           -0.65
    "ISTER"         -0.65
    "CLASSIFIED"    -0.65
    "×ķ"            -0.64
    "Reward"        -0.64
    Positive Logits
    Token           Logit
    "abies"         0.98
    "papers"        0.97
    " wall"         0.94
    " crossings"    0.93
    " thickness"    0.90
    " separating"   0.90
    " walls"        0.87
    " erected"      0.85
    "aby"           0.83
    " wart"         0.81
    Activation Density: 0.018%

    No Known Activations