INDEX
Explanations
references to physical structures like bridges
instances of the word "bridge."
Negative Logits
Token        Value
arily        -0.92
iaries       -0.74
resy         -0.73
matically    -0.70
atically     -0.68
eal          -0.65
ILY          -0.64
Policy       -0.64
arios        -0.63
psy          -0.62
Positive Logits
Token        Value
port         1.00
bridge       0.95
bridges      0.90
bridge       0.87
Strait       0.84
roads        0.84
Bridges      0.83
chairs       0.81
layer        0.80
ports        0.78
Activations Density: 0.026%
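
The page does not state how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
example = [0.0] * 9997 + [1.7, 0.4, 2.2]
density = activation_density(example)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.026% would mean the feature fires on roughly 26 out of every 100,000 token positions.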