INDEX
    Explanations

    phrases related to reasoning or explanation

    statements or phrases that indicate causation or reasons

    Negative Logits
    stad      -0.72
    ievers    -0.68
    mson      -0.67
    dar       -0.66
    aux       -0.66
    iard      -0.64
    ature     -0.64
    urch      -0.64
    aven      -0.63
    ults      -0.63
    Positive Logits
     undoubtedly     1.03
     doubtless       0.95
     obvious         0.94
     sheer           0.93
     attributable    0.83
     undeniable      0.82
     simplicity      0.81
     probably        0.80
     evident         0.79
     unavoidable     0.77
    Activation Density 0.103%

    No Known Activations