    Explanations

    instances of programming syntax and structure

    Negative Logits
    Token        Logit
    ␣*/          -1.30
    ")]          -1.23
    ␣");         -1.20
    ";           -1.17
    '];          -1.17
    ');          -1.16
    ␣";          -1.16
    ");          -1.15
    :");         -1.10
    "));         -1.10

    Positive Logits
    Token        Logit
    (missing)    3.81
    ↵↵↵          1.14
    ↵↵           0.93
    ↵↵↵↵         0.91
    ↵↵↵↵↵        0.88
    ↵↵↵↵↵↵↵      0.82
    ↵↵↵↵↵↵       0.80
    ↵↵↵↵↵↵↵↵     0.73
    ↵↵↵↵↵↵↵↵↵    0.71
    ↵↵↵↵↵↵↵↵↵↵↵  0.69

    Activation Density: 17.586%

    No Known Activations