INDEX
    Explanations

    contractions: 's, 't, 're, 've

    Negative Logits
        
    0.70
    0.66
       
    0.64
          
    0.62
            
    0.61
           
    0.60
    0.56
    	
    0.55
             
    0.55
                
    0.54
    Positive Logits
    <unused657>
    0.79
    >∕
    0.78
    <unused1144>
    0.78
    <unused160>
    0.75
    <unused1172>
    0.73
    <unused674>
    0.72
    0.72
    0.71
    <unused2105>
    0.71
    <unused600>
    0.71
    Act Density 0.083%

    No Known Activations