    Explanations

    questions or comparisons asking to choose between different options

    questions that ask for preferences or choices across various contexts

    Negative Logits
    krit          -0.80
    zar           -0.77
    ruce          -0.76
    fty           -0.75
    comings       -0.73
    anyahu        -0.70
     Corona       -0.69
    vity          -0.67
    staking       -0.67
    figure        -0.67

    Positive Logits
    ?",           0.79
     ident        0.76
     appropri     0.71
     desired      0.70
     domin        0.69
    opter         0.68
     accordingly  0.68
    ?'"           0.68
    ?),           0.67
     deserving    0.66
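    Logit lists like the two above are commonly produced by taking the dot product of the feature's decoder direction with each token's unembedding vector, then listing the most negative and most positive tokens. The sketch below assumes that convention and uses raw dot products; whether this dashboard normalizes the direction is not stated here. The names `decoder_direction`, `unembedding`, `logit_effects` and `top_and_bottom` are hypothetical, and the toy vectors are made up for illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
) -> Dict[str, float]:
    """Dot product of the (hypothetical) decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (most positive first, most negative first), k tokens each."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]        # smallest values come first
    positive = ranked[::-1][:k]  # largest values come first
    return positive, negative


# Toy example with a 2-dimensional residual stream (illustrative numbers only).
effects = logit_effects(
    [1.0, 0.0],
    {"?": [0.8, 0.1], " desired": [0.7, 0.0], "krit": [-0.8, 0.3], " Corona": [-0.7, 0.0]},
)
positive, negative = top_and_bottom(effects, 2)
print(positive)  # [('?', 0.8), (' desired', 0.7)]
print(negative)  # [('krit', -0.8), (' Corona', -0.7)]
```

    Leading spaces are kept in the token strings because tokenizers treat " desired" and "desired" as different tokens, as in the lists above.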
    Activation density: 0.284%
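    Assuming activation density is the percentage of sampled tokens on which the feature's activation is above zero, 0.284% corresponds to roughly 284 firing tokens per 100,000. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical names, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold` (assumed definition)."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in values if a > threshold)
    return 100.0 * firing / len(values)


# One of four tokens fires, so the density is 25%.
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```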

    No Known Activations