INDEX
Explanations
sentences that contain the word "would" followed by some action
conditional expressions or hypothetical scenarios
Negative Logits
ige        -0.53
oken       -0.52
yours      -0.50
redients   -0.49
resa       -0.49
ryn        -0.49
Learn      -0.48
isible     -0.48
allo       -0.48
Discover   -0.48
Positive Logits
would      2.81
would      2.56
wouldn     2.26
Would      2.03
could      1.84
Would      1.80
'd         1.76
might      1.66
could      1.65
Wouldn     1.62
Activations Density 0.159%
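
As a rough illustration of the density figure above, the sketch below assumes that "activations density" means the fraction of token positions where the feature's activation exceeds a threshold. That definition, the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 16 of which fire.
sample = [0.0] * 9984 + [1.5] * 16
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.159% would mean the feature fires on roughly 16 of every 10,000 tokens, which fits a sparse feature tied to a small family of modal verbs.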