INDEX
    Explanations

    specific trigger words like "When"

    occurrences of the word "When."

    Negative Logits
    "kaya"            -0.88
    "whatever"        -0.74
    "SPONSORED"       -0.72
    "\\\\\\\\"        -0.67
    "oof"             -0.67
    "Í"               -0.67
    "OTHER"           -0.66
    "bart"            -0.66
    "ding"            -0.66
    "aking"           -0.66

    Positive Logits
    " asked"           1.28
    " confronted"      1.17
    "soever"           1.10
    " pressed"         1.05
    " faced"           1.03
    " contacted"       0.96
    " discussing"      0.95
    " questioned"      0.95
    " comparing"       0.91
    "ce"               0.89
    Act Density 0.080%

    No Known Activations