INDEX
    Explanations

    Questions, particularly those starting with the word "How"

    Negative Logits
    Token          Logit
    "room"         -0.62
    "piece"        -0.61
    "odder"        -0.60
    "article"      -0.59
    "oubted"       -0.57
    " hereafter"   -0.57
    "iculture"     -0.56
    " Issue"       -0.56
    "agonists"     -0.56
    "goers"        -0.56
    Positive Logits
    Token          Logit
    "soever"       1.10
    "beit"         0.97
    "ever"         0.95
    "ells"         0.91
    "itzer"        0.90
    "ling"         0.88
    " much"        0.87
    "ls"           0.86
    " exactly"     0.75
    "much"         0.75
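Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the bottom-k and top-k tokens. The sketch below is not necessarily how this dashboard computed its values. It assumes that convention, ignores the final LayerNorm and any later layers, and uses hypothetical names (`top_logits`, `decoder_dir`, `W_U`, `vocab`) along with toy weights chosen to echo a few rows of the tables, not real model weights.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: the effect is approximated as decoder_dir @ W_U, i.e. the
    feature's decoder direction projected through the unembedding matrix,
    ignoring the final LayerNorm and any later layers.
    """
    logits = decoder_dir @ W_U                 # shape: (vocab_size,)
    order = np.argsort(logits)                 # indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocab of 4 tokens (hypothetical weights).
vocab = ["soever", "room", " much", "piece"]
W_U = np.array([[1.10, -0.62, 0.87, -0.61],
                [0.00,  0.00, 0.00,  0.00]])
decoder_dir = np.array([1.0, 0.0])
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(neg)  # the two most negative tokens
print(pos)  # the two most positive tokens
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other, because both come from the same ranking.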
    Activation Density: 1.164%
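Activation density is usually reported as the percentage of token positions on which a feature fires. The minimal sketch below assumes "fires" means an activation strictly greater than zero. The function name `activation_density` and its `threshold` parameter are hypothetical, and the input is synthetic rather than taken from this feature.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: density = 100 * (# positions with activation > threshold)
    / (# positions).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic example: the feature fires on 12 out of 1000 token positions.
acts = np.zeros(1000)
acts[:12] = 0.5
print(activation_density(acts))
```

Under this definition, a density of 1.164% means the feature is active on roughly 12 of every 1,000 tokens.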

    No Known Activations