    Negative Logits
    Token               Logit
    " бы"               -0.06
    "лений"             -0.06
    " aussi"            -0.06
    " haben"            -0.06
    " सत"               -0.06
    " Frauen"           -0.06
    " billionaires"     -0.06
    " arz"              -0.06
    " Interval"         -0.06
    " Pavilion"         -0.06

    Positive Logits
    Token               Logit
    " plagiarism"        0.12
    " plagiar"           0.10
    "kg"                 0.08
    "jr"                 0.08
    "Clinical"           0.07
    "\tcancel"           0.07
    "HomeController"     0.07
    " Diabetes"          0.07
    " pcm"               0.07
    "Shader"             0.06

    Activation Density: 0.001%

    No Known Activations