INDEX
Explanations
prepositions followed by an action or description
phrases indicating limitations or lack of significance
Negative Logits
Token         Logit
Dane          -0.71
cember        -0.69
hires         -0.67
jan           -0.65
士            -0.64
pregnancies   -0.64
layoffs       -0.63
salaries      -0.62
motions       -0.60
visc          -0.60
Positive Logits
Token         Logit
satisfy       1.02
asted         0.94
celebrate     0.89
adies         0.88
ggles         0.87
relieve       0.86
pload         0.85
appease       0.85
justify       0.84
reinforce     0.83
Activations Density 0.111%