INDEX
    Explanations

    assuming and stating assumptions

    NEGATIVE LOGITS
    "真正"  0.41
    " literalmente"  0.39
    "Nunca"  0.38
    "выше"  0.38
    " addirittura"  0.37
    " esattamente"  0.37
    "DoesNotExist"  0.37
    " autrefois"  0.37
    "早就"  0.36
    "まさに"  0.36
    POSITIVE LOGITS
    " assume"  2.03
    "assume"  1.95
    " assumes"  1.92
    " assumed"  1.91
    " Assume"  1.91
    " assuming"  1.90
    "Assume"  1.80
    " Assuming"  1.80
    "Assuming"  1.75
    "假设"  1.74
    Activation Density: 0.089%

    No Known Activations