INDEX
    Explanations

    Metadata and copyright-related text in code

    Negative Logits
    Token                    Logit
    .     (+ whitespace)     -2.91
    .");  (+ whitespace)     -2.61
    .";   (+ whitespace)     -2.56
    .]                       -2.52
    .");                     -2.50
    .}                       -2.45
    .');                     -2.45
    .";                      -2.44
    .")   (+ whitespace)     -2.36
    .</                      -2.34
    Positive Logits
    Token            Logit
    <bos>            0.74
    2                0.66
    S                0.66
    1                0.63
    (not shown)      0.61
    (whitespace)     0.60
    3                0.60
    e                0.59
    (whitespace)     0.58
    A                0.57
    Activation Density: 4.580%
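
A density figure like the one above is usually read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample values are assumptions for illustration and are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    (here: above-threshold) activation. The page itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 25.000%
```

Under this reading, a density of 4.580% would mean the feature is active on roughly 46 of every 1,000 tokens sampled.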

    No Known Activations