    Explanations
    Negative Logits
    on            0.98
    יות           0.91
     veya         0.90
    mActivity     0.89
     nakon        0.86
     børn         0.84
    šanas         0.84
    Τ             0.83
    Οι            0.83
    zelfde        0.83
    Positive Logits
     doomed       1.03
     impunity     0.99
     ensured      0.99
     immediacy    0.97
     stout        0.95
     confounding  0.94
     watercolor   0.92
     drawn        0.92
                  0.92
     outlined     0.91
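One common way to produce ranked lists like the two above is to score every vocabulary token by the feature's direct effect on the output logits and then sort. The sketch below assumes those per-token effects are already available as a plain token-to-score mapping (for example, obtained by projecting the feature's decoder direction through the unembedding, which is an assumption, not something this page states). The function name `top_logits` and the toy scores are hypothetical and illustrative only.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (top positive, top negative) tokens by logit effect.

    `effects` is a hypothetical mapping from token string to the feature's
    effect on that token's logit; tokens may carry a leading space.
    """
    # Sort once, ascending by score: most negative effects come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # k most negative, strongest first
    positive = ranked[::-1][:k]    # k most positive, strongest first
    return positive, negative


# Toy usage with made-up tokens and scores.
pos, neg = top_logits({"a": 1.5, "b": -2.0, "c": 0.3, "d": -0.1}, k=2)
print(pos)  # [('a', 1.5), ('c', 0.3)]
print(neg)  # [('b', -2.0), ('d', -0.1)]
```

Sorting the whole mapping is simple and fine for a sketch; for a full vocabulary one would more likely use a partial top-k selection (e.g. `heapq.nlargest`) to avoid a full sort.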
    Activation density: 0.158%
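Activation density is the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the exact definition behind the 0.158% figure is not stated on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy usage: 2 active tokens out of 1,000 gives a density of 0.2%.
print(activation_density([0.0] * 998 + [1.2, 0.5]))
```

Under this definition, a density of 0.158% corresponds to roughly 158 active tokens per 100,000 tokens scored.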

    No Known Activations