INDEX
    Explanations

    phrases starting with "What" followed by a question

    the word "What" as a questioning prompt

    Negative Logits
        "heter"         -0.64
        "atory"         -0.55
        "stretched"     -0.55
        "mun"           -0.54
        "lique"         -0.54
        "MER"           -0.53
        "println"       -0.53
        "general"       -0.52
        "udi"           -0.52
        "lined"         -0.52
    Positive Logits
        "soever"        1.31
        " happens"      1.23
        " Happ"         1.14
        " Makes"        1.12
        " constitutes"  1.05
        " happened"     1.03
        " Causes"       1.01
        " Exactly"      0.99
        " Does"         0.98
        " Lies"         0.98
    Activation Density: 0.051%

    No Known Activations