INDEX
Explanations
phrases related to taking action or making plans
actions related to structuring, organizing, or categorizing items or concepts
Negative Logits
terness  -0.68
udence   -0.67
imei     -0.65
edom     -0.65
plom     -0.65
perty    -0.64
xxx      -0.63
uden     -0.62
idia     -0.61
ylon     -0.61
Positive Logits
matically   0.87
uate        0.84
oneself     0.83
ourselves   0.80
ulate       0.73
Flare       0.71
Statements  0.71
ħ           0.69
yourself    0.67
Ī           0.67
Activations Density: 0.261%