INDEX
    Explanations

    expressions of uncertainty or subjective opinions

    preceding explanations or justifications

    explaining cause or reason

    Negative Logits
    Token             Logit
    "findpost"        -0.82
    "abestanden"      -0.81
    " itſelf"         -0.69
    " мәкал"          -0.69
    "Obrázky"         -0.67
    " Maho"           -0.66
    " Efq"            -0.64
    "OGND"            -0.64
    " Jefus"          -0.64
    " ویکی‌پدیای"      -0.63

    Positive Logits
    Token             Logit
    " because"        1.16
    " due"            1.00
    "because"         0.94
    " porque"         0.89
    " karena"         0.86
    "Because"         0.83
    "的原因"          0.80
    " потому"         0.79
    "是因為"          0.79
    " Because"        0.78

    Act Density 0.339%

    No Known Activations