INDEX
    Explanations

    the word "preferred" in a sentence

    references to choices or preferences

    Negative Logits
    ????????      -0.75
     Tours        -0.67
    circle        -0.67
    humans        -0.67
    ences         -0.67
    lifting       -0.66
    iry           -0.65
     Revival      -0.63
     Unlock       -0.62
    corruption    -0.61
    Positive Logits
     preferred    3.93
     favoured     2.15
     desired      1.95
     Preferred    1.95
     favored      1.94
     preference   1.93
     preferable   1.90
     prefers      1.77
     prefer       1.74
     disliked     1.64
    Activation Density 0.008%

    No Known Activations