Explanations
references to doors and their mechanisms
Negative Logits
Token         Logit
(blank)       -0.83
liesslich     -0.81
"")           -0.80
anyahu        -0.79
HasFactory    -0.78
']],          -0.78
Palmas        -0.77
--            -0.76
")),          -0.74
)]$           -0.74
Positive Logits
Token         Logit
doors         1.71
door          1.64
Doors         1.60
Door          1.57
door          1.55
DOOR          1.50
Door          1.45
Doors         1.42
doors         1.33
DOOR          1.18
Activations Density: 0.056%