INDEX
Explanations
references to doors and their functionality
Negative Logits
Token        Logit
"")          -0.79
⪢            -0.78
--           -0.75
anyahu       -0.74
mijne        -0.73
)),          -0.73
']],         -0.73
")),         -0.73
Palmas       -0.73
izability    -0.72
Positive Logits
Token        Logit
door         2.24
doors        2.18
Door         2.09
door         2.07
Doors        1.99
DOOR         1.99
Door         1.97
Doors        1.75
doors        1.75
DOOR         1.60
Activations Density: 0.039%