INDEX
    Explanations

    phrases indicating preference or habitual choices

    Negative Logits
        " Opport"     -0.17
        "imbus"       -0.16
        "curring"     -0.15
        "omik"        -0.15
        "pls"         -0.14
        " Geg"        -0.14
        " Plzeň"      -0.14
        "267"         -0.14
        "onian"       -0.14
        "istically"   -0.14

    Positive Logits
        "-to"          0.28
        " go"          0.27
        "-go"          0.25
        "-To"          0.23
        "(go"          0.19
        "thic"         0.18
        "go"           0.18
        "oose"         0.17
        "Go"           0.17
        ".go"          0.16

    Activation Density: 0.031%

    No Known Activations