INDEX
    Explanations

    phrases emphasizing a specific point or action

    the word "what" in various contexts

    Negative Logits
    Token             Logit
    "robe"            -0.88
    "ardless"         -0.74
    "lich"            -0.67
    "trop"            -0.64
    "eer"             -0.63
    "say"             -0.63
    "ubs"             -0.62
    "astical"         -0.62
    "ways"            -0.61
    "UNE"             -0.60
    Positive Logits
    Token             Logit
    " happens"        1.25
    " happened"       1.21
    " separates"      1.03
    "soever"          1.02
    " transpired"     0.96
    " distinguishes"  0.89
    " happ"           0.87
    " motiv"          0.84
    " Happ"           0.83
    " bothers"        0.82
    Activation Density: 0.060%

    No Known Activations