INDEX
    Explanations

    phrases that introduce a statement or argument

    occurrences of the word "that" in various contexts

    Negative Logits
    aukee           -0.71
    en              -0.69
    backer          -0.68
    gallery         -0.68
    ien             -0.65
    raq             -0.64
    Guard           -0.63
    wn              -0.62
    anie            -0.62
    AMY             -0.61

    Positive Logits
     they            0.75
     contradicts     0.75
     '[              0.73
     justifies       0.70
     witches         0.69
     "#              0.69
     "[              0.69
     someday         0.67
     we              0.65
     somehow         0.64
    Activation Density: 0.269%

    No Known Activations