Explanations
occurrences of the word "first" and its variations in context
Negative Logits
thing         -0.25
things        -0.22
Things        -0.18
part          -0.18
Thing         -0.18
Things        -0.18
things        -0.17
thing         -0.17
Thing         -0.17
phenomenon    -0.16
Positive Logits
brush          0.22
stab           0.19
taste          0.19
venture        0.19
contribution   0.19
brushes        0.18
exposure       0.18
efforts        0.18
Brush          0.18
attempt        0.17
Activations Density: 0.126%