Explanations
words related to obstacles or hindrances
references to various types of barriers
Negative Logits
phis             -0.82
ivery            -0.81
ergy             -0.75
psc              -0.69
yrics            -0.69
sch              -0.68
opia             -0.68
largeDownload    -0.67
imbabwe          -0.67
iability         -0.67
Positive Logits
barriers         1.31
barrier          1.15
walls            0.88
erected          0.87
separating       0.86
obstacles        0.83
Walls            0.82
Barrier          0.81
riers            0.77
buster           0.76
Activations Density 0.014%