INDEX
    Explanations

    questions ending with a question mark

    rhetorical questions

    Negative Logits
    Token               Logit
    " celebr"           -0.68
    " hust"             -0.66
    " proud"            -0.62
    " inactive"         -0.62
    " cross"            -0.60
    "andi"              -0.59
    " migration"        -0.59
    " contagious"       -0.58
    " happy"            -0.57
    " upstream"         -0.56
    Positive Logits
    Token               Logit
    "Answer"            1.51
    "Well"              1.05
    "³³³³"              0.96
    "Yes"               0.96
    "Probably"          0.95
    "Solution"          0.94
    "YES"               0.91
    " Answer"           0.89
    "³³³³³³³³³³³³³³³³"  0.89
    "Correct"           0.87
    Activation Density: 0.141%

    No Known Activations