INDEX
    Explanations
    Negative Logits
    Token             Logit
    "Sch"             -0.08
    "Red"             -0.08
    " koron"          -0.07
    " feats"          -0.07
    " Sch"            -0.07
    " graphs"         -0.07
    "sch"             -0.07
    " outperform"     -0.07
    "rein"            -0.07
    " Scaffold"       -0.07
    Positive Logits
    Token             Logit
    " متابعة"         0.09
    ".substring"      0.08
    ".trim"           0.08
    " whitespace"     0.08
    " clay"           0.08
    " unserem"        0.08
    " ,,"             0.08
    " غلام"           0.08
    " zuiden"         0.08
    " separated"      0.08
    Activation Density: 0.007%

    No Known Activations