INDEX
    Explanations

    phrases starting with "wh" followed by a space

    occurrences of the substring "wh"

    Negative Logits
    Token             Logit
    " advance"        -0.71
    " Stra"           -0.70
    "interstitial"    -0.68
    "bed"             -0.68
    " Grande"         -0.66
    "atory"           -0.64
    " Erdogan"        -0.63
    " Bach"           -0.62
    " Manual"         -0.61
    " Kub"            -0.61
    Positive Logits
    Token             Logit
    "ilst"            1.22
    "soever"          1.12
    "istle"           1.07
    "ispers"          1.06
    "olly"            1.04
    "orf"             0.97
    "ocom"            0.95
    "urst"            0.94
    "atson"           0.93
    "irlwind"         0.91
    Activation Density 0.005%

    No Known Activations