INDEX
    Explanations

    often treated, within contexts

    NEGATIVE LOGITS
    iaire  0.46
     پردا  0.45
    ille  0.45
     دارند  0.45
    ea  0.44
     exhibited  0.44
     تب  0.44
    ués  0.43
     ብዙውን  0.43
    èves  0.43
    POSITIVE LOGITS
    0.48
     pressing  0.46
     기본적인  0.41
     garantia  0.41
    친구  0.40
     compromising  0.40
    mannschaft  0.39
    Para  0.39
    football  0.39
    人を  0.38
    Act Density 0.002%

    No Known Activations