INDEX
Explanations
phrases related to permission or prohibition
phrases related to permissions and restrictions
Negative Logits
Balanced     -0.78
doomed       -0.68
Opportun     -0.67
finder       -0.66
Needs        -0.65
forcing      -0.64
nces         -0.64
Ens          -0.64
matched      -0.63
gradient     -0.62
Positive Logits
participate  1.33
enter        1.21
partake      1.19
roam         1.18
speak        1.18
operate      1.13
attend       1.10
possess      1.09
marry        1.06
practise     1.04
Activations Density 0.147%