    Explanations

    examples or instances of behaviors or characteristics

    instances of examples being cited in various contexts

    Negative Logits
    Token           Logit
    …               -0.74
    "hunt"          -0.72
    "ettes"         -0.70
    "emies"         -0.68
    "forts"         -0.67
    "EEP"           -0.65
    "task"          -0.65
    "querade"       -0.65
    "agues"         -0.65
    "culosis"       -0.64
    Positive Logits
    Token           Logit
    " how"           1.37
    " why"           1.29
    " what"          0.97
    " hypocrisy"     0.94
    " unintended"    0.89
    "why"            0.88
    "how"            0.87
    " lazy"          0.87
    " WHY"           0.83
    " misplaced"     0.82
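    The logit lists above can be read as the feature's direct effect on next-token prediction. A minimal sketch of how such lists might be produced, assuming the effect is the feature's decoder vector multiplied by the model's unembedding matrix with the final layer norm ignored. The function name, matrices, and toy vocabulary are hypothetical and are not the dashboard's actual code:

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Assumption: effect = decoder direction @ unembedding matrix,
    ignoring the final layer norm. All names here are hypothetical.
    """
    effects = w_dec_row @ w_unembed           # shape: (vocab_size,)
    order = np.argsort(effects)               # indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: hypothetical 2-dim residual stream, 4-token vocabulary.
vocab = ["how", "why", "hunt", "task"]
w_dec_row = np.array([1.0, -0.5])
w_unembed = np.array([[1.0, 0.8, -0.6, -0.2],
                      [0.0, 0.2,  0.4,  0.6]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print("positive:", [t for t, _ in pos])   # ['how', 'why']
print("negative:", [t for t, _ in neg])   # ['hunt', 'task']
```

    Sorting once and slicing from both ends yields the negative and positive lists in a single pass over the vocabulary.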
    Activation Density: 0.115%
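    Activation density is commonly defined as the fraction of sampled tokens on which the feature fires (activation above zero). Under that assumption, 0.115% means the feature is active on roughly 1 in 870 tokens. A minimal sketch, with a hypothetical function name and toy activations:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where one feature's activation exceeds threshold.

    acts: 1-D sequence of the feature's activations over a token sample.
    The threshold of 0.0 is an assumption about how "firing" is defined.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Hypothetical activations: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3%}")   # 25.000%
```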

    No Known Activations