    Explanations

    phrases indicating responses to various situations

    Negative Logits
    Token        Logit
    "LOCKS"      -0.17
    "ernet"      -0.15
    "icago"      -0.15
    "ÅĻet"       -0.14
    "ecture"     -0.14
    "vet"        -0.14
    "WISE"       -0.14
    " Pulse"     -0.14
    "loth"       -0.14
    "icious"     -0.14
    Positive Logits
    Token                  Logit
    "/response"            0.21
    "ivate"                0.19
    "(Response"            0.18
    "=response"            0.18
    "ToSelector"           0.18
    "<|begin_of_text|>"    0.17
    "-response"            0.17
    ".Response"            0.17
    " Drag"                0.16
    " Response"            0.16
    Activation Density: 0.067%

    No Known Activations