INDEX
    Explanations

    expressions of preference

    Negative Logits
    eval     -0.74
    idem     -0.73
    Article  -0.73
    ammy     -0.73
    pack     -0.71
    Chapter  -0.70
    Americ   -0.70
    Impl     -0.69
    angers   -0.68
    chapter  -0.67
    Positive Logits
    yip          0.80
    ably         0.79
    lihood       0.76
     preferring  0.75
    swer         0.72
     prefers     0.70
     favoured    0.70
    ancy         0.69
     pse         0.69
    itism        0.68
    Activation Density: 0.009%

    No Known Activations