INDEX
    Explanations

    structural elements in programming code

    code blocks and syntax

    NEGATIVE LOGITS
    .       -0.43
     \      -0.41
            -0.37
    >       -0.36
    \       -0.35
    ↵↵      -0.33
    ↵↵↵     -0.33
     (      -0.31
    <bos>   -0.31
     {      -0.30
    POSITIVE LOGITS
    "):     1.38
    "]);    1.38
    "];     1.34
    '>      1.34
    '])){   1.34
    ]:      1.33
    '):     1.31
     ]      1.30
    "]      1.29
    ")]     1.27
    Act Density 0.004%

    No Known Activations