INDEX
    Explanations

    phrases indicating difficulty or challenges

    NEGATIVE LOGITS
    ds        -0.17
    ature     -0.15
    £         -0.15
    hl        -0.14
    ayi       -0.14
    atures    -0.14
     sooner   -0.14
     armed    -0.14
    bout      -0.13
    dl        -0.13
    POSITIVE LOGITS
    idf       0.15
    è̶         0.15
    smith     0.14
    antan     0.14
    immers    0.14
    acre      0.14
    reich     0.14
    745       0.14
    ynn       0.14
    otime     0.14
    Activation Density: 0.043%

    No Known Activations