INDEX
    Explanations

    mentions of "most" related to quantitative evaluations

    Negative Logits
    Token             Logit
    "rompt"           -0.85
    "vest"            -0.83
    "icer"            -0.79
    "heid"            -0.79
    "pload"           -0.79
    " Mellon"         -0.76
    "instead"         -0.75
    "alid"            -0.72
    "adium"           -0.70
    "thur"            -0.70

    Positive Logits
    Token             Logit
    " importantly"     1.35
    "afa"              0.96
    " notably"         0.96
    "body"             0.93
    "rar"              0.92
    " important"       0.89
    "likely"           0.88
    " likely"          0.86
    " observers"       0.85
    " egreg"           0.83

    Act Density 13.629%

    No Known Activations