INDEX
Explanations
phrases related to prevention or obstruction
phrases indicating prohibition or prevention
Negative Logits
aic       -0.72
ector     -0.70
rote      -0.69
wait      -0.68
elman     -0.66
arse      -0.66
abre      -0.66
olitical  -0.64
ety       -0.63
lyak      -0.63
Positive Logits
accessing    1.50
entering     1.40
reaching     1.38
harming      1.38
obtaining    1.35
interfering  1.34
achieving    1.31
completing   1.30
joining      1.30
gaining      1.29
Activations Density: 0.068%