    Explanations

    words related to barriers or obstacles

    references to gates and gatekeepers, implying control or access points

    Negative Logits
     Hots       -0.85
    ensional    -0.73
    yles        -0.70
    arus        -0.69
    issance     -0.68
    ortium      -0.67
     Norm       -0.66
    ity         -0.65
    urrent      -0.62
    itarian     -0.61

    Positive Logits
    keepers     1.38
    keeper      1.37
    ways        1.21
    keeping     1.07
    fold        1.06
    posts       0.98
    way         0.96
    stones      0.92
    hole        0.90
    house       0.85

    Activation Density: 0.032%

    No Known Activations