INDEX
    Explanations

    references to doors and their functionality

    Negative Logits
    "")          -0.79
                 -0.78
    --           -0.75
    anyahu       -0.74
     mijne       -0.73
    )),          -0.73
    ']],         -0.73
    ")),         -0.73
     Palmas      -0.73
    izability    -0.72

    Positive Logits
     door     2.24
     doors    2.18
     Door     2.09
    door      2.07
     Doors    1.99
     DOOR     1.99
    Door      1.97
    Doors     1.75
    doors     1.75
    DOOR      1.60

    Act Density 0.039%

    No Known Activations