    NEGATIVE LOGITS
    " будто"                   -0.07
    "Labor"                    -0.07
    "_inverse"                 -0.07
    " Junior"                  -0.06
    (whitespace-only token)    -0.06
    ".application"             -0.06
    " أمر"                     -0.06
    " neuken"                  -0.06
    "ș"                        -0.06
    "EndElement"               -0.06
    POSITIVE LOGITS
    " spec"                    0.07
    " $↵"                      0.06
    "?↵↵"                      0.06
    "   ↵↵"                    0.06
    " reloc"                   0.06
    " Including"               0.06
    "_plots"                   0.06
    ")){↵"                     0.06
    ")?↵↵"                     0.06
    "())){↵"                   0.06
    Activation density: 0.009%

    No Known Activations