INDEX
    Explanations

    Poor grammar or translations

    Negative Logits
     plagiarism
    -0.08
    urther
    -0.07
     spell
    -0.07
    unod
    -0.07
    -0.07
    postcode
    -0.07
     nona
    -0.07
    忘初心
    -0.07
    alwa
    -0.07
     $
    -0.07
    Positive Logits
        
    0.11
            
    0.11
     ​​
    0.10
      
    0.10
    ​​​​
    0.09
    0.09
      
    0.09
       
    0.09
       
    0.09
    ​​
    0.09
    Activation Density: 0.367%

    No Known Activations