Explanations
references to garage doors and their mechanisms
Negative Logits
wort         -0.19
ered         -0.17
ountains     -0.16
ее           -0.14
.hl          -0.14
conde        -0.14
firefight    -0.14
帽           -0.14
Jug          -0.13
еÑĢж         -0.13
Positive Logits
garage       0.38
Garage       0.35
opener       0.33
gar          0.31
Gar          0.30
gar          0.24
GAR          0.23
Gar          0.22
tors         0.21
Remote       0.20
Activations Density: 0.008%