INDEX
Explanations
mentions of rooms in various contexts
Negative Logits
Token       Logit
HER         -0.72
Uz          -0.67
Iss         -0.67
elson       -0.64
merce       -0.64
pend        -0.61
Bonds       -0.59
usterity    -0.59
FTA         -0.59
indal       -0.59
Positive Logits
Token       Logit
room        1.12
rooms       0.94
rooms       0.93
doors       0.92
upstairs    0.90
occupancy   0.89
room        0.86
divid       0.86
clock       0.84
itory       0.82
Activations Density 0.019%
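
To make the density figure concrete, here is a minimal sketch that assumes "activation density" means the fraction of sampled tokens on which the feature fires (activation above zero). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is counted per token over a sampled corpus.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 19 of 100,000 tokens.
sample = [1.0] * 19 + [0.0] * 99_981
density = activation_density(sample)
print(f"{density:.3%}")  # 0.019%
```

Under that assumption, a density of 0.019% corresponds to roughly 19 active tokens per 100,000, which would make this a sparse feature.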