INDEX
    Negative Logits
    Logit   Token
    -0.07   "ولي"
    -0.07   "ˉ"
    -0.07   " Wax"
    -0.07   " pitch"
    -0.07   " ראיתי"
    -0.07   " أحمد"
    -0.07   " Wallace"
    -0.06   " Walls"
    -0.06   " avere"
    -0.06   " Möglich"

    Positive Logits
    Logit   Token
    0.07    "_Private"
    0.07    "Ί"
    0.07    "ventional"
    0.07    "loor"
    0.06    " ------------------------------------------------"
    0.06    (token blank in source)
    0.06    " travelling"
    0.06    (token blank in source)
    0.06    " engagements"
    0.06    "agation"
    Activation Density: 0.025%

    No Known Activations