Explanations
code comments with following action
Negative Logits
Token                   Value
`                       1.20
[token not recovered]   1.13
`.                      1.07
`,                      1.04
↵↵                      0.98
”.                      0.98
:                       0.97
”                       0.95
`:                      0.93
`).                     0.92
Positive Logits
Token          Value
---------      1.68
------         1.66
----           1.64
-----          1.64
======         1.64
-------        1.60
----------     1.55
=====          1.53
--------       1.51
-----------    1.50
Activations Density: 0.167%