INDEX
    Explanations

    phrases expressing uncertainty or conditionality

    Negative Logits
    " voted"          -0.79
    " vote"           -0.76
    " Vote"           -0.72
    " فريبيس"         -0.71
    " Voting"         -0.66
    "Vote"            -0.66
    " voting"         -0.65
    " VOTE"           -0.64
    "vote"            -0.64
    "Voting"          -0.61

    Positive Logits
    " Literally"      1.84
    "literally"       1.78
    " literally"      1.76
    "Literally"       1.64
    " literalmente"   1.55
    " literal"        1.55
    " figur"          1.33
    " pun"            1.30
    "literal"         1.27
    " metaphor"       1.26

    Act Density 0.161%

    No Known Activations