INDEX
    Explanations

    patterns or structures within text data, particularly focusing on sequences or block elements

    Negative Logits
    featureID           -0.80
    UserScript          -0.61
    ThroughAttribute    -0.59
    moveToFirst         -0.58
    WebServlet          -0.56
     للاسماء            -0.56
    RUnlock             -0.55
    setWeight           -0.54
    RTLU                -0.54
    GEBURTS             -0.54
    Positive Logits
          
    1.17
           
    1.15
       
    1.15
              
    1.12
                      
    1.09
                          
    0.93
                       
    0.92
             
    0.92
                  
    0.88
                   
    0.85
    Act Density 0.334%

    No Known Activations