INDEX
    Explanations

    mentions of preference and related terms indicating choices

    Negative Logits
    angelo        -0.18
    inea          -0.18
    quee          -0.17
    romo          -0.15
    ermo          -0.15
    strap         -0.15
    umin          -0.15
    ish           -0.14
    ê             -0.14
     hen          -0.14

    Positive Logits
    entially       0.44
    ential         0.40
    ably           0.24
    renc           0.20
    encing         0.19
    ENTIAL         0.18
    .Preference    0.18
    ensi           0.17
    atory          0.17
    enced          0.17

    Activation Density: 0.024%

    No Known Activations