INDEX
Explanations
phrases related to physical interactions, control, or authority
phrases related to power dynamics and hierarchical positions
Negative Logits
ratulations    -0.75
ortium         -0.66
phis           -0.66
avorite        -0.65
sylv           -0.65
eur            -0.63
theless        -0.59
spores         -0.59
valuable       -0.58
izoph          -0.57
Positive Logits
behest         1.06
helm           0.86
altar          0.83
expense        0.80
outset         0.80
urging         0.79
discretion     0.79
table          0.78
periphery      0.77
mercy          0.73
Activations Density: 0.123%