Explanations
phrases related to doing work, often of a physical or laborious nature
phrases related to undesirable or unpleasant tasks
Negative Logits
Token        Logit
urated       -0.70
Sett         -0.69
oor          -0.68
Bound        -0.68
Arri         -0.68
Continued    -0.67
Courage      -0.67
ウス         -0.67
Returning    -0.66
eming        -0.66
Positive Logits
Token        Logit
grunt        0.89
chores       0.83
homework     0.80
strip        0.75
differently  0.74
offline      0.74
rehab        0.71
migrate      0.68
thing        0.68
experiment   0.68
Activations Density: 0.256%
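
The density figure is most naturally read as the share of sampled token positions on which this feature fires at all. The page does not say how the number was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 256 of them active.
sample = [0.0] * 99_744 + [1.5] * 256
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.256% would mean the feature is active on roughly 1 in 390 token positions, which fits a fairly specific feature such as one tied to chores or laborious tasks.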