Explanations
phrases involving making choices between different options
phrases related to choices or options
Negative Logits
Token        Logit
idth         -0.85
OGR          -0.83
tsy          -0.78
wikipedia    -0.78
gow          -0.77
rak          -0.76
brate        -0.74
riz          -0.74
der          -0.72
oğ           -0.71

Positive Logits
Token        Logit
halves       0.85
genders      0.75
them         0.72
competing    0.70
two          0.67
sexes        0.67
extremes     0.66
scarce       0.66
genres       0.62
Anarchy      0.62
Activations Density: 0.038%