INDEX
    Explanations

    consistent phrases indicating similarity or uniformity across different contexts

    Negative Logits
     ligiloj
    -0.61
     يتيمه
    -0.54
    principalTable
    -0.50
    oredCriteria
    -0.49
    '){\n
    -0.49
    ThroughAttribute
    -0.48
     TestBed
    -0.47
    ulum
    -0.46
    )*/
    -0.46
    )';
    -0.46
    Positive Logits
     unmodified
    0.67
     iguales
    0.66
    forall
    0.64
    Iden
    0.64
     identical
    0.61
     scolaires
    0.61
     seragam
    0.60
    addContainerGap
    0.60
     unchanged
    0.60
     geblieben
    0.60
    Activation Density: 0.529%

    No Known Activations