INDEX
Explanations
phrases related to doors
occurrences of the word "door"
Negative Logits
Hots      -0.71
enegger   -0.69
TY        -0.67
udeb      -0.67
ovych     -0.65
azeera    -0.65
lihood    -0.64
ontent    -0.64
TING      -0.64
ciating   -0.62
POSITIVE LOGITS
bell      1.27
door      1.23
steps     1.09
holes     1.07
doors     1.02
frame     0.96
doors     0.94
door      0.94
Door      0.91
hole      0.90
Activations Density: 0.015%