Explanations
associated with physical barriers or obstacles
references to obstacles or hindrances
Negative Logits
ivery       -0.84
iability    -0.79
phis        -0.76
sch         -0.72
inav        -0.70
ergy        -0.69
orp         -0.69
sb          -0.68
ovie        -0.68
ribution    -0.68
Positive Logits
barriers     1.11
barrier      0.98
erected      0.96
separating   0.88
walls        0.87
Walls        0.86
Barrier      0.81
buster       0.79
wall         0.78
fence        0.73
Activations Density: 0.016%