INDEX
    Explanations

    sentences indicating insights, unraveling mysteries, or achieving deep understanding

    Negative Logits
    amon         -0.69
    sequently    -0.68
    )",          -0.68
    tones        -0.67
    igree        -0.65
    Others       -0.65
    wick         -0.64
     occasion    -0.63
    Scroll       -0.62
    )"           -0.62
    Positive Logits
     goddamn     0.88
    enegger      0.75
    /(           0.73
     fucking     0.73
     overest     0.72
    BILITY       0.72
     damn        0.69
     willfully   0.68
     reinvent    0.68
     fucked      0.67
    Act Density 0.726%

    No Known Activations