INDEX
Explanations
function calls and their parameters in code
Negative Logits
Token    Logit
stus     -0.83
ment     -0.78
5        -0.77
z        -0.73
Hel      -0.73
El       -0.72
Zie      -0.71
el       -0.70
Zel      -0.69
flag     -0.69
Positive Logits
Token       Logit
__":        1.09
}(),        1.07
()");       0.98
}))         0.96
()])        0.96
)()         0.94
())         0.93
().         0.93
Monfieur    0.93
()):        0.93
Activations Density 0.106%
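
For context on the density figure, here is a minimal sketch of how such a number could be computed. It assumes "activations density" means the fraction of token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: one active token out of 1000 positions.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3%}")  # prints 0.100%
```

Under that assumption, a density of 0.106% would mean the feature fires on roughly one token in a thousand.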