Explanations
references to doors and their interactions
Negative Logits
']],         -0.87
'}),         -0.84
)),          -0.81
)"),         -0.81
')),         -0.81
")),         -0.80
"")          -0.79
liesslich    -0.78
Allociné     -0.76
eclared      -0.76
Positive Logits
door     2.18
doors    2.13
Door     2.00
door     1.98
Doors    1.91
DOOR     1.91
Door     1.88
Doors    1.68
doors    1.66
DOOR     1.48
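A minimal sketch of how ranked lists like the two above could be produced. It assumes, without confirmation from this page, that each token's logit effect is the dot product of the feature's decoder direction with that token's unembedding column, and that the lists are simply the k most positive and k most negative effects. The names `logit_effects` and `top_logits` are hypothetical, and the matrices in the usage are toy values, not this feature's weights. Effects are kept as (token, value) pairs rather than a dict because the lists above contain several surface forms that print identically (for example two "door" entries), which are presumably distinct tokens.

```python
def logit_effects(decoder_row, unembed, vocab):
    """Hypothetical: project a feature's decoder direction through the unembedding.

    decoder_row: list of d_model floats (the feature's output direction).
    unembed: d_model rows x len(vocab) columns.
    Returns a list of (token, effect) pairs, one per vocabulary entry.
    """
    return [
        (tok, sum(d * unembed[i][j] for i, d in enumerate(decoder_row)))
        for j, tok in enumerate(vocab)
    ]


def top_logits(effects, k):
    """Return (positive, negative): the k largest and k smallest effects.

    Positive is ordered most positive first; negative is ordered most negative first.
    """
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example: d_model = 2, vocabulary of three tokens.
effects = logit_effects([1.0, 0.5], [[2.0, -1.0, 0.0], [0.0, 1.0, -2.0]], ["door", "))", "x"])
positive, negative = top_logits(effects, 1)
print(positive, negative)
```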
Activation Density: 0.039%
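A short sketch of what the density figure could mean, assuming (this page does not say) that activation density is the percentage of token positions on which the feature's activation is above zero. Under that reading, 0.039% corresponds to about 39 active positions per 100,000 tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: percentage of token positions where the activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on one of four token positions.
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```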