INDEX
Explanations
references to doors
Negative Logits
Hots       -0.66
abi        -0.65
azeera     -0.65
ting       -0.65
ciating    -0.65
umbnail    -0.63
ortium     -0.62
ontent     -0.62
thood      -0.61
ency       -0.60
Positive Logits
door       1.24
bell       1.15
steps      1.06
holes      1.04
doors      1.00
door       0.99
hole       0.95
opener     0.95
Door       0.95
doors      0.91
Activations Density: 0.012%