INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token     Logit
    "s"       0.54
    " DE"     0.47
    "́"       0.46
    " de"     0.46
    " e"      0.45
    " d"      0.44
    " per"    0.42
    " AL"     0.42
    " of"     0.41
    " mis"    0.41
    Positive Logits
    Token                 Logit
    "        "            0.88
    "    "                0.83
    "            "        0.77
    "<unused0>"           0.77
    "      "              0.76
    "                "    0.74
    "Moreover"            0.74
    "Pentru"              0.74
    "<unused1966>"        0.73
    "<unused1765>"        0.73
    Act Density 3.352%

    No Known Activations