INDEX
    Explanations
    Negative Logits
        " también"   -0.97
        " επίσης"    -0.88
        " também"    -0.87
        " also"      -0.86
        " पनि"       -0.85
        " Bad"       -0.85
        "Also"       -0.85
        " agree"     -0.84
        "oved"       -0.83
        " Le"        -0.82
    Positive Logits
        " will"       1.52
        " apologize"  1.19
        " apologies"  1.15
        " won"        1.10
        " wont"       1.09
        " opět"       1.07
        " mentioned"  1.05
        " hesitated"  1.05
        " nebude"     1.04
        " shall"      1.03
    Activation Density: 0.069%

    No Known Activations