INDEX
Explanations
phrases involving step-by-step instructions
phrases indicating incremental processes or steps
Negative Logits
Hung         -0.70
Flavoring    -0.66
Mata         -0.62
Nicotine     -0.61
owed         -0.61
relocated    -0.61
Wen          -0.61
revived      -0.61
Dow          -0.59
streamed     -0.58
Positive Logits
step         1.11
committee    1.06
tem          1.05
task         1.02
example      1.01
command      1.00
product      0.98
character    0.98
program      0.96
event        0.95
Activations Density: 0.046%