    Explanations

    references to walls or barriers in various contexts

    Negative Logits
    " temprana"      -0.83
    "})));"          -0.81
    "epam"           -0.74
    " noires"        -0.74
    " mijne"         -0.73
    " inoxydable"    -0.72
    "selbe"          -0.72
    " Aimee"         -0.72
    " wezen"         -0.72
    " argint"        -0.71
    Positive Logits
    " wall"     2.27
    " WALL"     2.20
    " Wall"     2.15
    " walls"    2.06
    "Wall"      1.98
    "wall"      1.97
    "WALL"      1.86
    " Walls"    1.84
    "walls"     1.72
    "Walls"     1.69
    Activation Density: 0.042%

    No Known Activations