INDEX
Explanations
References to "rooms" and their characteristics or quantities
Negative Logits
)";       -1.10
)');      -1.00
!")       -0.96
]})       -0.94
)");      -0.94
()]       -0.94
')));     -0.93
)";       -0.92
]';       -0.91
']],      -0.88
Positive Logits
Room      1.74
rooms     1.73
Rooms     1.69
Rooms     1.61
room      1.58
ROOM      1.56
Room      1.55
rooms     1.50
room      1.37
ROOM      1.33
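Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the highest and lowest scoring tokens. The sketch below assumes that method; the dashboard itself does not state how its logits were computed, and the names `decoder_vec`, `W_U`, and `top_bottom_logits` are hypothetical. Repeated-looking entries (for example "Room" appearing twice) are plausibly distinct tokenizer entries, such as variants with and without a leading space.

```python
import numpy as np

def top_bottom_logits(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    decoder_vec: shape (d_model,), the feature's decoder direction (assumed).
    W_U: shape (d_model, d_vocab), the model's unembedding matrix (assumed).
    vocab: list of d_vocab token strings.
    Returns (negative, positive) lists of (token, logit) pairs.
    """
    logits = decoder_vec @ W_U            # one score per vocabulary token
    order = np.argsort(logits)            # token indices, ascending by score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["room", "rooms", ')";', "]})"]
W_U = np.array([[1.7, 1.5, -1.1, -0.9],
                [0.0, 0.0, 0.0, 0.0]])
decoder_vec = np.array([1.0, 0.0])
negative, positive = top_bottom_logits(decoder_vec, W_U, vocab, k=2)
print(negative)  # lowest-scoring tokens first
print(positive)  # highest-scoring tokens first
```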
Activation density: 0.028%
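Assuming activation density means the fraction of token positions on which the feature's activation is strictly positive (the dashboard does not define it), 0.028% corresponds to roughly 28 activating tokens per 100,000. A minimal sketch of that calculation, using a hypothetical `activation_density` helper and synthetic activations:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of positions where activation exceeds threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Synthetic data: 28 nonzero activations out of 100,000 token positions.
acts = np.zeros(100_000)
acts[:28] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```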