    Explanations

    rhetorical questions that express curiosity or challenge beliefs

    Negative Logits
     Haller           -0.75
    $​                -0.71
    intahan           -0.70
     Cortes           -0.70
    Hauptartikel      -0.69
     ")[              -0.67
    INSTALLED         -0.67
     .\               -0.66
    PEZ               -0.66
     Schot            -0.66
    Positive Logits
     Whyte            1.60
    why               1.53
     why              1.50
    Why               1.48
     Why              1.43
     WHY              1.37
    WHY               1.35
     Warum            1.27
     Waarom           1.23
     pourquoi         1.17
    Activation Density: 0.041%

    No Known Activations