Explanations
phrases related to steps or actions needed to achieve a specific goal
phrases related to instructions or guidelines for achieving tasks
Negative Logits
outweigh    -0.78
outwe       -0.68
drowned     -0.67
marine      -0.65
benches     -0.65
alive       -0.63
burning     -0.63
quot        -0.62
pez         -0.62
harb        -0.61
Positive Logits
aucus       0.73
itialized   0.68
nutshell    0.66
Brief       0.65
ertain      0.64
recap       0.64
zbek        0.64
meantime    0.63
disclaimer  0.62
kinson      0.62
Activations Density: 0.121%