INDEX
    Explanations

    phrases related to difficulty, urgency, and consequences

    phrases expressing difficulty and challenges

    Negative Logits
    " untouched"    -0.56
    " excav"        -0.53
    " distinctive"  -0.51
    " searched"     -0.51
    " existed"      -0.51
    " assimil"      -0.51
    " annot"        -0.50
    " extensively"  -0.50
    " outper"       -0.50
    " underrated"   -0.49
    Positive Logits
    " consolation"  0.66
    " coincidence"  0.64
    "ourt"          0.64
    "farious"       0.58
    " semantics"    0.57
    " inev"         0.57
    " Pyr"          0.56
    " hindsight"    0.56
    "ayers"         0.55
    "cakes"         0.53
    Act Density 0.466%

    No Known Activations