INDEX
Explanations
concepts related to obstacles or restrictions
references to obstacles or impediments in various contexts
Negative Logits
phis        -0.85
ivery       -0.84
ergy        -0.73
ories       -0.73
sch         -0.73
iability    -0.71
lihood      -0.70
ammad       -0.70
ribution    -0.69
doms        -0.68
Positive Logits
barriers    1.17
barrier     1.03
erected     1.00
walls       0.92
separating  0.90
buster      0.85
Walls       0.81
wall        0.79
crossings   0.79
Barrier     0.76
Activations Density 0.017%