INDEX
    Explanations

    questions and expressions of uncertainty

    Negative Logits
    " brook"       -0.67
    "Brooks"       -0.66
    "Recu"         -0.62
    " BROOK"       -0.62
    " castor"      -0.61
    " Pollack"     -0.60
    " Verge"       -0.60
    " reminder"    -0.59
    "COLS"         -0.59
    " tráiler"     -0.58
    Positive Logits
    " what"         1.71
    "what"          1.54
    " WHAT"         1.54
    " What"         1.53
    "What"          1.50
    "WHAT"          1.47
    " quelles"      0.93
    " wat"          0.91
    "Τι"            0.89
    " وما"          0.85
    Act Density 0.148%

    No Known Activations