Explanations
terms related to specific goals or objectives
phrases indicating intentions or objectives
Negative Logits
assian      -0.75
wine        -0.65
mson        -0.65
bush        -0.65
condition   -0.65
minus       -0.64
hours       -0.64
orce        -0.63
involved    -0.63
imore       -0.63
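The page does not say how these logit values are computed. On sparse-autoencoder feature dashboards they are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix, so each value is the feature's direct effect on one token's logit; the most negative entries form this list and the most positive entries form the positive list. The sketch below assumes that scheme with no extra normalization. `logit_lens`, `W_U`, and the toy vocabulary are hypothetical illustrations, not the dashboard's actual code.

```python
import numpy as np

def logit_lens(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (assumed scheme).

    w_dec_row: (d_model,) decoder vector for the feature
    W_U:       (d_model, d_vocab) unembedding matrix
    vocab:     list of d_vocab token strings
    Returns (bottom_k, top_k) as lists of (token, logit) pairs.
    """
    logits = w_dec_row @ W_U          # (d_vocab,) direct effect on each token's logit
    order = np.argsort(logits)        # indices sorted from most negative to most positive
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy example with a hypothetical 2-dimensional model and 4-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -0.5]])
w = np.array([0.6, -0.2])
vocab = ["a", "b", "c", "d"]
bottom, top = logit_lens(w, W_U, vocab, k=2)
print("negative:", bottom)
print("positive:", top)
```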
Positive Logits
-+-+        0.71
overturn    0.67
topp        0.66
perfection  0.65
ãĥĦ (ツ)     0.65
apprehend   0.65
achieving   0.64
inflicting  0.64
accompl     0.64
ä½ľ (作)     0.63
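Two of the positive-logit tokens, ãĥĦ and ä½ľ, are multi-byte UTF-8 characters shown in GPT-2-style byte-level BPE form, where each raw byte is mapped to a printable character. Reversing that mapping gives the bytes E3 83 84 (ツ, U+30C4) and E4 BD 9C (作, U+4F5C). The sketch below assumes the model uses GPT-2's byte-to-unicode table; the page does not name the model or tokenizer, and `decode_token` is a hypothetical helper.

```python
def bytes_to_unicode():
    """GPT-2's reversible table mapping each byte 0-255 to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes that are not already printable are shifted to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Invert the table: printable character -> original byte.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token):
    """Turn a byte-level token string back into readable text (hypothetical helper)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

for tok in ["ãĥĦ", "ä½ľ", "-+-+"]:
    print(tok, "->", decode_token(tok))
```

ASCII tokens such as -+-+ or overturn pass through unchanged, because printable bytes map to themselves.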
Activation density: 0.326%
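A density of 0.326% means the feature fires on roughly 1 token in 307 (1 / 0.00326 ≈ 306.7). A minimal sketch of the computation, assuming density is the fraction of sampled token positions where the activation is strictly positive; the page states neither the token sample nor the threshold, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Toy data: 326 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:326] = 1.0
density = activation_density(acts)
print(f"Activation density: {density * 100:.3f}%")
```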