    Explanations

    references to doors and their interactions

    Negative Logits
    ']],        -0.87
    '}),        -0.84
    )),         -0.81
    )"),        -0.81
    ')),        -0.81
    ")),        -0.80
    "")         -0.79
    liesslich   -0.78
     Allociné   -0.76
    eclared     -0.76
    Positive Logits
     door       2.18
     doors      2.13
     Door       2.00
    door        1.98
     Doors      1.91
     DOOR       1.91
    Door        1.88
    Doors       1.68
    doors       1.66
    DOOR        1.48
    Act Density 0.039%

    No Known Activations