    Negative Logits
    Token         Logit
    " ruchu"      -0.80
    " recibe"     -0.77
    "ネート"       -0.76
    " előtt"      -0.75
    "ceptible"    -0.74
    "мови"        -0.73
    "msen"        -0.72
    " ਨੂੰ"         -0.71
    "Է"           -0.70
                  -0.70
    Positive Logits
    Token         Logit
    " behave"     4.88
    " behaving"   4.41
    " behaves"    4.41
    " acting"     4.16
    " behaved"    4.09
    " act"        4.00
    " acted"      3.80
    "acting"      3.38
    " acts"       3.25
    "Acting"      3.11
    Activation Density: 0.089%

    No Known Activations