INDEX
    Explanations

    statements of belief, opinion, or claims about events or attributes

    Negative Logits
    Token            Logit
    "898"            -0.16
    "yre"            -0.15
    "usta"           -0.14
    "ftware"         -0.14
    "ettle"          -0.14
    "alet"           -0.14
    "/start"         -0.14
    "δα"             -0.13
    "iki"            -0.13
    "-consuming"     -0.13
    Positive Logits
    Token            Logit
    " to"            0.22
    " capable"       0.20
    "ly"             0.19
    "anced"          0.18
    "edly"           0.17
    " responsible"   0.16
    "likely"         0.16
    "worthy"         0.16
    " likely"        0.16
    " Likely"        0.15
    Act Density 0.066%

    No Known Activations