INDEX
    Explanations

    references to well-known controversies or challenges

    Negative Logits
    "olon"     -0.19
    "avra"     -0.18
    "tha"      -0.15
    "uges"     -0.15
    "adle"     -0.14
    "acic"     -0.14
    "ostel"    -0.14
    "erif"     -0.14
    "uktur"    -0.14
    "elic"     -0.14
    Positive Logits
    " appear"         0.20
    " stand"          0.19
    " seem"           0.18
    " ideal"          0.17
    " especially"     0.17
    " easier"         0.17
    " overall"        0.17
    "appear"          0.16
    " feel"           0.15
    " susceptible"    0.15
    Activation Density: 0.057%

    No Known Activations