INDEX
Explanations
references to physical doors
references to various types of doors
Negative Logits
ollah     -0.71
TING      -0.70
Hots      -0.68
udeb      -0.68
uke       -0.67
lihood    -0.67
ousand    -0.66
ting      -0.65
PubMed    -0.64
ovych     -0.64
Positive Logits
bell      1.27
door      1.21
steps     1.20
doors     1.05
doors     1.00
plates    0.98
posts     0.96
ways      0.92
holes     0.92
opener    0.92
Activations Density 0.029%