INDEX
    Explanations

    words related to difficulty or challenges

    Negative Logits
    <unused68>    -0.94
    <unused8>     -0.94
    <unused3>     -0.94
    [@BOS@]       -0.94
    <unused52>    -0.94
    <unused79>    -0.94
    <unused28>    -0.93
    <unused41>    -0.93
    <unused14>    -0.93
    <pad>         -0.93
    Positive Logits
    EventHandler    0.50
                    0.38
    <em>            0.36
     Water          0.35
    ↵↵              0.35
    util            0.34
     useState       0.34
     "[             0.34
    Phi             0.33
    Util            0.32
    Act Density 0.272%

    No Known Activations