Explanations
references to notable buildings and tourist attractions
Negative Logits
heard         -0.15
Sink          -0.15
gren          -0.14
odial         -0.14
allah         -0.14
Bounds        -0.14
downstream    -0.14
unte          -0.14
Alley         -0.13
torpedo       -0.13
Positive Logits
observation   0.33
platform      0.29
rooftop       0.28
platforms     0.28
lookout       0.28
tower         0.27
observation   0.27
Observation   0.27
viewing       0.27
viewpoint     0.27
Activations Density 0.170%