INDEX
Explanations
references to data structures and parameters in code
Negative Logits
']):      -1.24
'):       -1.19
']:       -1.17
'])->     -1.13
'],       -1.11
()){      -1.09
'){       -1.08
')):      -1.08
"]:       -1.07
'}>       -1.07
Positive Logits
↵                1.28
↵↵↵              1.04
↵↵               0.80
purpoſe          0.69
</blockquote>    0.68
↵↵↵↵↵            0.67
himſelf          0.67
↵↵↵↵             0.66
<eos>            0.63
Jefus            0.60
Activations Density: 0.248%