    Explanations

    words related to physical barriers or obstacles, especially fences

    Negative Logits
    Token         Logit
    "nant"        -0.72
    " forth"      -0.67
    "occ"         -0.64
    "practice"    -0.63
    "esta"        -0.63
    "alg"         -0.62
    " Nir"        -0.62
    "ounces"      -0.62
    "arin"        -0.60
    ")=("         -0.60
    Positive Logits
    Token           Logit
    " fence"        1.50
    " fences"       1.24
    " fencing"      1.08
    "-+-+"          0.82
    " encl"         0.81
    " este"         0.80
    "vine"          0.78
    "yard"          0.76
    " perimeter"    0.75
    " gate"         0.74
    Activation Density: 0.007%

    No Known Activations