INDEX
    Explanations

    phrases that introduce information or present conclusions

    clauses or phrases that introduce defining characteristics or explanations

    Negative Logits
    ugu       -0.76
    aq        -0.73
    oug       -0.64
    iq        -0.63
    ahime     -0.63
    UG        -0.61
    ablish    -0.59
    MQ        -0.59
    roying    -0.58
    hent      -0.58
    Positive Logits
     horr           1.14
     extends        0.94
     accompanies    0.90
     ought          0.89
     encompasses    0.89
     haun           0.87
     culmin         0.86
     echoes         0.86
     coincides      0.85
     occurs         0.85
    Activation Density: 0.145%

    No Known Activations