INDEX
Explanations
words related to difficulty or challenges
Negative Logits
<unused68>    -0.94
<unused8>     -0.94
<unused3>     -0.94
[@BOS@]       -0.94
<unused52>    -0.94
<unused79>    -0.94
<unused28>    -0.93
<unused41>    -0.93
<unused14>    -0.93
<pad>         -0.93
Positive Logits
EventHandler  0.50
↵             0.38
<em>          0.36
Water         0.35
↵↵            0.35
util          0.34
useState      0.34
"[            0.34
Phi           0.33
Util          0.32
Activations Density: 0.272%