INDEX
    Explanations
    Negative Logits
    Token             Logit
    "  "              -0.24
    "<eos>"           -0.22
    "    "            -0.20
    "L"               -0.18
    " mar"            -0.18
    " trans"          -0.16
    "</td>"           -0.16
    "            "    -0.16
    " contre"         -0.15
    " foreign"        -0.15
    Positive Logits
    Token             Logit
    "<unused8>"       1.08
    "<unused52>"      1.08
    "<unused28>"      1.08
    "<unused51>"      1.08
    "<unused14>"      1.08
    "<pad>"           1.08
    "<unused16>"      1.08
    "<unused17>"      1.07
    "<unused3>"       1.07
    "[@BOS@]"         1.07
    Act Density 0.005%

    No Known Activations