INDEX
    Explanations

    dialogue turns or conversational openings

    NEGATIVE LOGITS
     [](
    -0.09
     lav
    -0.09
    Cfg
    -0.09
     shl
    -0.09
     shit
    -0.09
    åijĢ
    -0.09
    eus
    -0.09
    åĻ
    -0.08
    .aws
    -0.08
     impl
    -0.08
    POSITIVE LOGITS
     fine
    0.13
    fine
    0.12
     Fine
    0.12
    Fine
    0.11
    FINE
    0.10
    You
    0.10
     bunch
    0.10
     daring
    0.09
    "You
    0.09
     Hey
    0.09
    Act Density 0.067%

    No Known Activations