INDEX
    Explanations

    instances of formatting or structural markers in texts

    Negative Logits
     ProtoMessage
    -1.22
    ReusableCell
    -1.19
    __":
    
    -1.19
    bootstrapcdn
    -1.13
    تقاوى
    -1.09
    دانشنامهٔ
    -1.08
    IsContent
    -1.07
    __':
    
    -1.05
    GeoNames
    -1.05
    InjectAttribute
    -1.04
    Positive Logits
    0.91
      
    0.86
    [toxicity=0]
    0.85
    //
    0.73
       
    0.71
        
    0.70
    .
    0.68
         
    0.67
    ?
    0.67
    	
    0.66
    Activation Density 0.120%

    No Known Activations