INDEX
    Explanations

    phrases related to reasons or justifications

    references to individuals and their actions or situations

    Negative Logits
    Logit   Token
    -0.70   "----------------------------------------------------------------"
    -0.66   "ortun"
    -0.66   " nutshell"
    -0.65   "GGGGGGGG"
    -0.64   "UGE"
    -0.64   "venge"
    -0.63   " renaissance"
    -0.63   " unveiling"
    -0.62   "--------------------------------"
    -0.62   " toget"
    Positive Logits
    Logit   Token
    1.58    " lacked"
    1.55    " hadn"
    1.32    " disagreed"
    1.26    " objected"
    1.25    " refused"
    1.23    " wasn"
    1.23    " lacks"
    1.19    " feared"
    1.17    " doubted"
    1.16    " didn"
    Activation Density: 0.370%

    No Known Activations