Explanations
expressions of frustration or difficulty with problem-solving
Negative Logits
nger       -0.15
achs       -0.15
ầm         -0.15
yscale     -0.15
ATRIX      -0.14
ومی        -0.14
Cousins    -0.14
.BLL       -0.13
996        -0.13
veis       -0.13
Positive Logits
nada           0.25
STILL          0.19
nothing        0.19
results        0.19
Still          0.19
success        0.17
luck           0.17
Nothing        0.17
still          0.17
improvement    0.17
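The two lists above rank tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists are commonly produced, assuming (this page does not say) that each score is the dot product of the feature's decoder direction with that token's column of a `d_model x vocab` unembedding matrix. The function name `top_logit_effects` and the toy vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive logit effects.

    Assumption: the effect on token j is sum_i decoder_dir[i] * unembed[i][j],
    i.e. the feature direction projected through the unembedding.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    # Largest effect first, matching the "Positive Logits" list layout.
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy 2-dimensional example with a 4-token vocabulary (values are made up).
vocab = ["nada", "still", "nger", "achs"]
unembed = [
    [0.5, 0.3, -0.2, -0.1],  # row for model dimension 0
    [0.0, 0.1, -0.1, 0.0],   # row for model dimension 1
]
neg, pos = top_logit_effects([1.0, 1.0], unembed, vocab, k=2)
print(neg)
print(pos)
```

Sorting once and slicing both ends keeps the sketch short; with a real vocabulary of tens of thousands of tokens, a matrix library plus a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would be the more usual choice.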
Activation Density: 0.117%
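Read as a fraction, a density of 0.117% means the feature fires on roughly 1 token in 855 (100 / 0.117 ≈ 854.7). A minimal sketch of the calculation, assuming "density" means the percentage of sampled tokens whose activation exceeds zero; the function name `activation_density` and the sample values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (active tokens / sampled tokens) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 active tokens out of 2,000 gives 0.15%.
sample = [0.0] * 1997 + [1.2, 0.4, 2.5]
print(activation_density(sample))
```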