    Explanations

    variations in formatting, particularly whitespace and formatting symbols

    Negative Logits
    Token             Logit
    NUMX              -1.61
     CreateTagHelper  -1.52
    AddTagHelper      -1.51
     ComVisible       -1.49
    __":↵             -1.45
    ftagPool          -1.41
    =$?               -1.38
    المناصب           -1.38
    __':↵             -1.37
    SharedDtor        -1.35
    Positive Logits
    Token             Logit
                      0.94
                      0.86
     is               0.75
     are              0.73
    ↵↵                0.66
    s                 0.66
     was              0.65
    The               0.65
     will             0.64
    <eos>             0.64
    Activation Density: 0.004%

    No Known Activations