Explanations
phrases related to physical barriers or obstacles
references to barriers, both physical and metaphorical
Negative Logits
ivery       -0.90
phis        -0.82
matically   -0.74
ories       -0.73
ergy        -0.72
lihood      -0.72
sch         -0.71
ribution    -0.71
ores        -0.69
aleb        -0.68
Positive Logits
barriers    1.11
barrier     1.03
erected     0.97
separating  0.92
walls       0.88
buster      0.85
Barrier     0.80
wall        0.80
Walls       0.80
crossings   0.77
Activations Density: 0.032%